1.5. Handson-Tutorial(v2.0.3)
This tutorial will introduce you to the basic usage of the DeePMD-kit, taking a gas phase methane molecule as an example. Typically the DeePMD-kit workflow contains three parts:
Data preparation
Training/Freezing/Compressing/Testing
Molecular dynamics
The DP model is generated using the DeePMD-kit package (v2.0.3). The training data is converted into the format of DeePMD-kit using a tool named dpdata (v0.2.5). It needs to be noted that dpdata only works with Python 3.5 and later versions. The MD simulations are carried out using LAMMPS (29 Sep 2021) integrated with DeePMD-kit. Details of dpdata and DeePMD-kit installation and execution of can be found in the DeepModeling official GitHub site. OVITO is used for the visualization of the MD trajectory.
The files needed for this tutorial are available.
$ wget https://dp-public.oss-cn-beijing.aliyuncs.com/community/CH4.tar
$ tar xvf CH4.tar
The folder structure of this tutorial is like this:
$ cd CH4
$ ls
00.data 01.train 02.lmp
There are 3 folders here:
The folder 00.data contains the data
The folder 01.train contains an example input script to train a model with DeePMD-kit
The folder 02.lmp contains LAMMPS example script for molecular dynamics simulation
1.5.1. Data preparation
The training data of the DeePMD-kit contains the atom type, the simulation box, the atom coordinate, the atom force, the system energy, and the virial. A snapshot of a molecular system that has this information is called a frame. A system of data includes many frames that share the same number of atoms and atom types. For example, a molecular dynamics trajectory can be converted into a system of data, with each time step corresponding to a frame in the system.
The DeePMD-kit adopts a compressed data format. All training data should first be converted into this format and can then be used by DeePMD-kit. The data format is explained in detail in the DeePMD-kit manual that can be found in the DeePMD-kit official Github site .
We provide a convenient tool named dpdata for converting the data produced by VASP, Gaussian, Quantum-Espresso, ABACUS, and LAMMPS into the compressed format of DeePMD-kit.
As an example, go to the data folder:
$ cd 00.data
$ ls
OUTCAR
The OUTCAR was produced by an ab-initio molecular dynamics (AIMD) simulation of a gas phase methane molecule using VASP. Now start an interactive python environment, for example
$ python
then execute the following commands:
import dpdata
import numpy as np
data = dpdata.LabeledSystem('OUTCAR', fmt = 'vasp/outcar')
print('# the data contains %d frames' % len(data))
On the screen, you can see that the OUTCAR file contains 200 frames of data. We randomly pick 40 frames as validation data and the rest as training data.
index_validation = np.random.choice(200,size=40,replace=False) # random choose 40 index for validation_data
index_training = list(set(range(200))-set(index_validation)) # other indexes are training_data
data_training = data.sub_system(index_training)
data_validation = data.sub_system(index_validation)
data_training.to_deepmd_npy('training_data') # all training data put into directory:"training_data"
data_validation.to_deepmd_npy('validation_data') # all validation data put into directory:"validation_data"
print('# the training data contains %d frames' % len(data_training))
print('# the validation data contains %d frames' % len(data_validation))
The commands import a system of data from the OUTCAR (with format vasp/outcar ), and then dump it into the compressed format (numpy compressed arrays). The data in DeePMD-kit format is stored in the folder 00.data. Lets have a look:
$ ls
OUTCAR training_data validation_data
The directories “training_data” and “validation_data” have similar structure, so we just explain “training_data”:
$ ls training_data
set.000 type.raw type_map.raw
set.000: is a directory, contains data in compressed format (numpy compressed arrays).
type.raw: is a file, contains types of atoms(Represented in integer)
type_map.raw: is a file, contains type name of atoms.
Lets have a look at type.raw
:
$ cat training_data/type.raw
0 0 0 0 1
This tells us there are 5 atoms in this example, 4 atoms represented by type “0”, and 1 atom represented by type “1”.
Sometimes one needs to map the integer types to atom name. The mapping can be given by the file type_map.raw
$ cat training_data/type_map.raw
H C
This tells us the type “0” is named by “H”, and the type “1” is named by “C”.
More detailed doc about Data conversion can be found here
1.5.2. Training
1.5.2.1. Prepare input script
Once the data preparation is done, we can go on with training. Now go to the training directory
$ cd ../01.train
$ ls
input.json
where input.json gives you an example training script. The options are explained in detail in the DeePMD-kit manual, so they are not comprehensively explained.
In the model section, the parameters of embedding and fitting networks are specified.
"model":{
"type_map": ["H", "C"], # the name of each type of atom
"descriptor":{
"type": "se_e2_a", # full relative coordinates are used
"rcut": 6.00, # cut-off radius
"rcut_smth": 0.50, # where the smoothing starts
"sel": [4, 1], # the maximum number of type i atoms in the cut-off radius
"neuron": [10, 20, 40], # size of the embedding neural network
"resnet_dt": false,
"axis_neuron": 4, # the size of the submatrix of G (embedding matrix)
"seed": 1,
"_comment": "that's all"
},
"fitting_net":{
"neuron": [100, 100, 100], # size of the fitting neural network
"resnet_dt": true,
"seed": 1,
"_comment": "that's all"
},
"_comment": "that's all"'
},
The se\_e2\_a
descriptor is used to train the DP model. The item neurons set the size of the embedding and fitting network to [10, 20, 40] and [100, 100, 100], respectively. The components in to smoothly go to zero from 0.5 to 6 Å.
The following are the parameters that specify the learning rate and loss function.
"learning_rate" :{
"type": "exp",
"decay_steps": 5000,
"start_lr": 0.001,
"stop_lr": 3.51e-8,
"_comment": "that's all"
},
"loss" :{
"type": "ener",
"start_pref_e": 0.02,
"limit_pref_e": 1,
"start_pref_f": 1000,
"limit_pref_f": 1,
"start_pref_v": 0,
"limit_pref_v": 0,
"_comment": "that's all"
},
In the loss function, pref\_e
increases from 0.02 to 1 , and pref\_f
decreases from 1000 to 1 progressively, which means that the force term dominates at the beginning, while energy and virial terms become important at the end. This strategy is very effective and reduces the total training time. pref_v
is set to 0 , indicating that no virial data are included in the training process. The starting learning rate, stop learning rate, and decay steps are set to 0.001, 3.51e-8, and 5000, respectively. The model is trained for steps.
The training parameters are given in the following
"training" : {
"training_data": {
"systems": ["../00.data/training_data"],
"batch_size": "auto",
"_comment": "that's all"
},
"validation_data":{
"systems": ["../00.data/validation_data/"],
"batch_size": "auto",
"numb_btch": 1,
"_comment": "that's all"
},
"numb_steps": 100000,
"seed": 10,
"disp_file": "lcurve.out",
"disp_freq": 1000,
"save_freq": 10000,
},
1.5.2.2. Train a model
After the training script is prepared, we can start the training with DeePMD-kit by simply running
$ dp train input.json
On the screen, you see the information of the data system(s)
DEEPMD INFO ----------------------------------------------------------------------------------------------------
DEEPMD INFO ---Summary of DataSystem: training -------------------------------------------------------------
DEEPMD INFO found 1 system(s):
DEEPMD INFO system natoms bch_sz n_bch prob pbc
DEEPMD INFO ../00.data/training_data/ 5 7 22 1.000 T
DEEPMD INFO -----------------------------------------------------------------------------------------------------
DEEPMD INFO ---Summary of DataSystem: validation --------------------------------------------------------------
DEEPMD INFO found 1 system(s):
DEEPMD INFO system natoms bch_sz n_bch prob pbc
DEEPMD INFO ../00.data/validation_data/ 5 7 5 1.000 T
and the starting and final learning rate of this training
DEEPMD INFO start training at lr 1.00e-03 (== 1.00e-03), decay_step 5000, decay_rate 0.950006, final lr will be 3.51e-08
If everything works fine, you will see, on the screen, information printed every 1000 steps, like
DEEPMD INFO batch 1000 training time 7.61 s, testing time 0.01 s
DEEPMD INFO batch 2000 training time 6.46 s, testing time 0.01 s
DEEPMD INFO batch 3000 training time 6.50 s, testing time 0.01 s
DEEPMD INFO batch 4000 training time 6.44 s, testing time 0.01 s
DEEPMD INFO batch 5000 training time 6.49 s, testing time 0.01 s
DEEPMD INFO batch 6000 training time 6.46 s, testing time 0.01 s
DEEPMD INFO batch 7000 training time 6.24 s, testing time 0.01 s
DEEPMD INFO batch 8000 training time 6.39 s, testing time 0.01 s
DEEPMD INFO batch 9000 training time 6.72 s, testing time 0.01 s
DEEPMD INFO batch 10000 training time 6.41 s, testing time 0.01 s
DEEPMD INFO saved checkpoint model.ckpt
They present the training and testing time counts. At the end of the 10000th batch, the model is saved in Tensorflow’s checkpoint file model.ckpt
. At the same time, the training and testing errors are presented in file lcurve.out
.
The file contains 8 columns, form left to right, are the training step, the validation loss, training loss, root mean square (RMS) validation error of energy, RMS training error of energy, RMS validation error of force, RMS training error of force and the learning rate. The RMS error (RMSE) of the energy is normalized by number of atoms in the system.
$ head -n 2 lcurve.out
#step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr
0 1.34e+01 1.47e+01 7.05e-01 7.05e-01 4.22e-01 4.65e-01 1.00e-03
and
$ tail -n 2 lcurve.out
999000 1.24e-01 1.12e-01 5.93e-04 8.15e-04 1.22e-01 1.10e-01 3.7e-08
1000000 1.31e-01 1.04e-01 3.52e-04 7.74e-04 1.29e-01 1.02e-01 3.5e-08
Volumes 4, 5 and 6, 7 present energy and force training and testing errors, respectively. It is demonstrated that after 140,000 steps of training, the energy testing error is less than 1 meV and the force testing error is around 120 meV/Å. It is also observed that the force testing error is systematically (but slightly) larger than the training error, which implies a slight over-fitting to the rather small dataset.
One can visualize this file by a simple Python script:
import numpy as np
import matplotlib.pyplot as plt
data = np.genfromtxt("lcurve.out", names=True)
for name in data.dtype.names[1:-1]:
plt.plot(data['step'], data[name], label=name)
plt.legend()
plt.xlabel('Step')
plt.ylabel('Loss')
plt.xscale('symlog')
plt.yscale('log')
plt.grid()
plt.show()
When the training process is stopped abnormally, we can restart the training from the provided checkpoint by simply running
$ dp train --restart model.ckpt input.json
In the lcurve.out
, you can see the training and testing errors, like
538000 3.12e-01 2.16e-01 6.84e-04 7.52e-04 1.38e-01 9.52e-02 4.1e-06
538000 3.12e-01 2.16e-01 6.84e-04 7.52e-04 1.38e-01 9.52e-02 4.1e-06
539000 3.37e-01 2.61e-01 7.08e-04 3.38e-04 1.49e-01 1.15e-01 4.1e-06
#step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr
530000 2.89e-01 2.15e-01 6.36e-04 5.18e-04 1.25e-01 9.31e-02 4.4e-06
531000 3.46e-01 3.26e-01 4.62e-04 6.73e-04 1.49e-01 1.41e-01 4.4e-06
Note that input.json
needs to be consistent with the previous one.
1.5.2.3. Freeze and Compress a model
At the end of the training, the model parameters saved in TensorFlow’s checkpoint file should be frozen as a model file that is usually ended with extension .pb. Simply execute
$ dp freeze -o graph.pb
DEEPMD INFO Restoring parameters from ./model.ckpt-1000000
DEEPMD INFO 1264 ops in the final graph
and it will output a model file named graph.pb
in the current directory.
The compressed DP model typically speed up DP-based calculations by an order of magnitude faster, and consume an order of magnitude less memory. The graph.pb
can be compressed in the following way:
$ dp compress -i graph.pb -o graph-compress.pb
DEEPMD INFO stage 1: compress the model
DEEPMD INFO built lr
DEEPMD INFO built network
DEEPMD INFO built training
DEEPMD INFO initialize model from scratch
DEEPMD INFO finished compressing
DEEPMD INFO
DEEPMD INFO stage 2: freeze the model
DEEPMD INFO Restoring parameters from model-compression/model.ckpt
DEEPMD INFO 840 ops in the final graph
and it will output a model file named graph-compress.pb
.
1.5.2.4. Test a model
We can check the quality of the trained model by running
$ dp test -m graph-compress.pb -s ../00.data/validation_data -n 40 -d results
On the screen you see the information of the prediction errors of validation data
DEEPMD INFO # number of test data : 40
DEEPMD INFO Energy RMSE : 3.168050e-03 eV
DEEPMD INFO Energy RMSE/Natoms : 6.336099e-04 eV
DEEPMD INFO Force RMSE : 1.267645e-01 eV/A
DEEPMD INFO Virial RMSE : 2.494163e-01 eV
DEEPMD INFO Virial RMSE/Natoms : 4.988326e-02 eV
DEEPMD INFO # -----------------------------------------------
and it will output files named results.e.out and results.f.out in the current directory.
1.5.3. Run MD with LAMMPS
Now let’s switch to the lammps directory to check the necessary input files for running DeePMD with LAMMPS.
$ cd ../02.lmp
Firstly, we soft-link the output model in the training directory to the current directory
$ ln -s ../01.train/graph-compress.pb
Then we have three files
$ ls
conf.lmp graph-compress.pb in.lammps
where conf.lmp
gives the initial configuration of a gas phase methane MD simulation, and the file in.lammps
is the lammps input script. One may check in.lammps and finds that it is a rather standard LAMMPS input file for a MD simulation, with only two exception lines:
pair_style graph-compress.pb
pair_coeff * *
where the pair style deepmd is invoked and the model file graph-compress.pb
is provided, which means the atomic interaction will be computed by the DP model that is stored in the file graph-compress.pb.
One may execute lammps in the standard way
$ lmp -i in.lammps
After waiting for a while, the MD simulation finishes, and the log.lammps and ch4.dump files are generated. They store thermodynamic information and the trajectory of the molecule, respectively. One may want to visualize the trajectory by, e.g. OVITO
$ ovito ch4.dump
to check the evolution of the molecular configuration.