1.3. Theory

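Before introducing the DP method, we define the coordinate matrix $\mathcal{R} \in \mathbb{R}^{N \times 3}$ of a system containing $N$ atoms,

$$\mathcal{R} = \{\boldsymbol{r}_1^T, \cdots, \boldsymbol{r}_i^T, \cdots, \boldsymbol{r}_N^T\}^T, \quad \boldsymbol{r}_i = (x_i, y_i, z_i). \tag{1}$$

$\boldsymbol{r}_i$ contains the 3 Cartesian coordinates of atom $i$, and $\mathcal{R}$ can be transformed into local environment matrices $\{\mathcal{R}^i\}_{i=1}^{N}$,

$$\mathcal{R}^i = \{\boldsymbol{r}_{1i}^T, \cdots, \boldsymbol{r}_{ji}^T, \cdots, \boldsymbol{r}_{N_i,i}^T\}^T, \quad \boldsymbol{r}_{ji} = (x_{ji}, y_{ji}, z_{ji}), \tag{2}$$

where $j$ indexes the neighbors of atom $i$ within the cut-off radius $r_c$, $N_i$ is the number of such neighbors, and $\boldsymbol{r}_{ji} = \boldsymbol{r}_j - \boldsymbol{r}_i$ is defined as the relative coordinate.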
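The construction of a local environment matrix can be sketched in a few lines of Python. This is a minimal illustration, not the DeePMD-kit implementation: the function name `local_environment` and the sample coordinates are hypothetical, and the sketch assumes an isolated (non-periodic) system and a brute-force search over all atoms.

```python
import math

def local_environment(coords, i, r_c):
    """Return the relative coordinates r_ji = r_j - r_i of every
    neighbor j of atom i whose distance to atom i is below r_c."""
    xi, yi, zi = coords[i]
    env = []
    for j, (xj, yj, zj) in enumerate(coords):
        if j == i:
            continue  # an atom is not its own neighbor
        dx, dy, dz = xj - xi, yj - yi, zj - zi
        if math.sqrt(dx * dx + dy * dy + dz * dz) < r_c:
            env.append((dx, dy, dz))
    return env

# Hypothetical 4-atom system; the last atom lies outside the cut-off of atom 0.
coords = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 2.0, 0.0),
    (5.0, 5.0, 5.0),
]
env_0 = local_environment(coords, i=0, r_c=3.0)

# Shifting every atom by the same vector leaves the relative coordinates unchanged.
shifted = [(x + 10.0, y - 3.0, z + 2.5) for x, y, z in coords]
env_0_shifted = local_environment(shifted, i=0, r_c=3.0)
```

Because only differences of coordinates enter the local environment, it is unaffected by a rigid translation of the whole system.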

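In the DP method, the total energy $E$ of a system is constructed as a sum of atomic energies,

$$E = \sum_i E_i, \tag{3}$$

with $E_i$ being the local atomic energy of atom $i$. $E_i$ depends on the local environment of atom $i$:

$$E_i = E_s(\mathcal{R}^i). \tag{4}$$

The mapping of $\mathcal{R}^i$ to $E_i$ is constructed in two steps. As shown in the figure, $\mathcal{R}^i$ is first mapped to a feature matrix $\mathcal{D}^i$, also called the descriptor, which preserves the translational, rotational, and permutational symmetries of the system. To build the descriptor, $\mathcal{R}^i \in \mathbb{R}^{N_i \times 3}$ is transformed into the generalized coordinates $\tilde{\mathcal{R}}^i \in \mathbb{R}^{N_i \times 4}$:

$$\{x_{ji}, y_{ji}, z_{ji}\} \mapsto \{s(r_{ji}), \hat{x}_{ji}, \hat{y}_{ji}, \hat{z}_{ji}\}, \tag{5}$$

where $\hat{x}_{ji} = \frac{s(r_{ji})\, x_{ji}}{r_{ji}}$, $\hat{y}_{ji} = \frac{s(r_{ji})\, y_{ji}}{r_{ji}}$, and $\hat{z}_{ji} = \frac{s(r_{ji})\, z_{ji}}{r_{ji}}$. $s(r_{ji})$ is a weighting function that reduces the weight of particles farther from atom $i$, defined as

$$s(r_{ji}) = \begin{cases} \dfrac{1}{r_{ji}}, & r_{ji} < r_{cs}, \\[2ex] \dfrac{1}{r_{ji}} \left\{ \dfrac{1}{2} \cos\left[ \pi \dfrac{r_{ji} - r_{cs}}{r_c - r_{cs}} \right] + \dfrac{1}{2} \right\}, & r_{cs} \le r_{ji} < r_c, \\[2ex] 0, & r_{ji} \ge r_c. \end{cases} \tag{6}$$

Here $r_{ji}$ is the Euclidean distance between atoms $i$ and $j$, and $r_{cs}$ is the smooth cutoff parameter. With $s(r_{ji})$, the components of $\tilde{\mathcal{R}}^i$ go smoothly to zero as $r_{ji}$ increases from $r_{cs}$ to $r_c$. Then $s(r_{ji})$, i.e. the first column of $\tilde{\mathcal{R}}^i$, is mapped to an embedding matrix $\mathcal{G}^{i1} \in \mathbb{R}^{N_i \times M_1}$ through an embedding neural network. By taking the first $M_2$ ($< M_1$) columns of $\mathcal{G}^{i1}$, we obtain another embedding matrix $\mathcal{G}^{i2} \in \mathbb{R}^{N_i \times M_2}$. Finally, we define the feature matrix $\mathcal{D}^i \in \mathbb{R}^{M_1 \times M_2}$ of atom $i$:

$$\mathcal{D}^i = (\mathcal{G}^{i1})^T \tilde{\mathcal{R}}^i (\tilde{\mathcal{R}}^i)^T \mathcal{G}^{i2}. \tag{7}$$

In Eq. (7), translational and rotational symmetries are preserved by the matrix product $\tilde{\mathcal{R}}^i (\tilde{\mathcal{R}}^i)^T$, and permutational symmetry is preserved by the matrix product $(\mathcal{G}^{i1})^T \tilde{\mathcal{R}}^i$. Next, each $\mathcal{D}^i$ is mapped to a local atomic energy $E_i$ through a fitting network.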
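The descriptor construction can be checked numerically with a small sketch. It is an illustration under stated assumptions, not the DeePMD-kit code: all function names are hypothetical, `toy_embedding` is an untrained stand-in for the embedding network, and the cut-off values and neighbor coordinates are arbitrary. The sketch evaluates the weighting function, builds the generalized coordinates, forms the feature matrix, and then recomputes it after rotating and reordering the neighbors.

```python
import math

def smooth_weight(r, r_cs, r_c):
    """Weighting function s(r): 1/r inside r_cs, cosine-damped up to r_c, zero beyond."""
    if r < r_cs:
        return 1.0 / r
    if r < r_c:
        u = (r - r_cs) / (r_c - r_cs)
        return (0.5 * math.cos(math.pi * u) + 0.5) / r
    return 0.0

def generalized_coordinates(env, r_cs, r_c):
    """Map each relative coordinate (x, y, z) to the row (s, s*x/r, s*y/r, s*z/r)."""
    rows = []
    for x, y, z in env:
        r = math.sqrt(x * x + y * y + z * z)
        s = smooth_weight(r, r_cs, r_c)
        rows.append([s, s * x / r, s * y / r, s * z / r])
    return rows

def toy_embedding(s, m1):
    """Hypothetical stand-in for the trained embedding network: scalar s -> M1 features."""
    return [math.tanh((k + 1) * s) for k in range(m1)]

def descriptor(env, r_cs, r_c, m1=4, m2=2):
    """Feature matrix D = G1^T R R^T G2 (M1 x M2), with G2 the first M2 columns of G1."""
    R = generalized_coordinates(env, r_cs, r_c)
    G1 = [toy_embedding(row[0], m1) for row in R]
    # A = G1^T R has shape M1 x 4; summing over neighbors n removes their ordering.
    A = [[sum(G1[n][a] * R[n][c] for n in range(len(R))) for c in range(4)]
         for a in range(m1)]
    # G2^T R is the first M2 rows of A, so D[a][b] is the dot product of rows a and b of A.
    return [[sum(A[a][c] * A[b][c] for c in range(4)) for b in range(m2)]
            for a in range(m1)]

def rotate_z(env, angle):
    """Rotate every relative coordinate about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in env]

# Arbitrary neighbor coordinates relative to the central atom.
env = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.5, 0.5, 1.5)]
D = descriptor(env, r_cs=1.5, r_c=3.0)

# Rotate the environment and reorder the neighbors, then recompute.
rotated = rotate_z(env, 0.7)
shuffled = [rotated[2], rotated[0], rotated[1]]
D_check = descriptor(shuffled, r_cs=1.5, r_c=3.0)
max_diff = max(abs(D[a][b] - D_check[a][b]) for a in range(4) for b in range(2))
print(max_diff)  # expected to be at the level of floating-point round-off
```

The sum over neighbors makes the result independent of neighbor order, and the dot product over the four generalized-coordinate columns is unchanged by rotations, which only mix the last three columns.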

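Both the embedding network $\mathcal{N}^e$ and the fitting network $\mathcal{N}^f$ are feed-forward neural networks containing several hidden layers. The mapping from the input data $d_l^{\mathrm{in}}$ of the previous layer to the output data $d_k^{\mathrm{out}}$ of the next layer is composed of a linear and a non-linear transformation:

$$d_k^{\mathrm{out}} = \varphi\left( \sum_l w_{kl}\, d_l^{\mathrm{in}} + b_k \right). \tag{8}$$

In Eq. (8), $w_{kl}$ is the connecting weight, $b_k$ the bias weight, and $\varphi$ a non-linear activation function. Note that only linear transformations are applied at the output nodes. The parameters of the embedding and fitting networks are obtained by minimizing the loss function $L$:

$$L(p_\epsilon, p_f, p_\xi) = \frac{p_\epsilon}{N} \Delta\epsilon^2 + \frac{p_f}{3N} \sum_i |\Delta \boldsymbol{F}_i|^2 + \frac{p_\xi}{9N} \|\Delta\xi\|^2, \tag{9}$$

where $\Delta\epsilon$, $\Delta \boldsymbol{F}_i$, and $\Delta\xi$ denote the root mean square error (RMSE) in energy, force, and virial, respectively. During training, the prefactors $p_\epsilon$, $p_f$, and $p_\xi$ are determined by

$$p(t) = p^{\mathrm{limit}} \left[ 1 - \frac{r_l(t)}{r_l^0} \right] + p^{\mathrm{start}} \left[ \frac{r_l(t)}{r_l^0} \right], \tag{10}$$

where $r_l(t)$ and $r_l^0$ are the learning rates at training step $t$ and training step 0, respectively. $r_l(t)$ is defined as

$$r_l(t) = r_l^0 \times d_r^{\,t/d_s}, \tag{11}$$

where $d_r$ and $d_s$ are the decay rate and decay steps, respectively. The decay rate $d_r$ is required to be less than 1. The reader is referred to the original papers on the DeepPot-SE (DP) method for details.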
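The interplay between the learning-rate decay and the loss prefactors is easy to see numerically. The sketch below is a direct transcription of the exponential decay and the prefactor interpolation described above. The function names and every numerical value (initial learning rate, decay rate, decay steps, start and limit prefactors) are illustrative assumptions, not recommended settings.

```python
def learning_rate(step, lr0, decay_rate, decay_steps):
    """Exponentially decayed learning rate r_l(t) = r_l^0 * d_r ** (t / d_s)."""
    return lr0 * decay_rate ** (step / decay_steps)

def prefactor(step, p_start, p_limit, lr0, decay_rate, decay_steps):
    """Interpolate a loss prefactor from p_start to p_limit as the learning rate decays."""
    ratio = learning_rate(step, lr0, decay_rate, decay_steps) / lr0
    return p_limit * (1.0 - ratio) + p_start * ratio

# Illustrative (assumed) hyperparameters.
LR0, DECAY_RATE, DECAY_STEPS = 1e-3, 0.95, 5000

for step in (0, 100_000, 1_000_000):
    lr = learning_rate(step, LR0, DECAY_RATE, DECAY_STEPS)
    # Energy prefactor grows from 0.02 toward 1; force prefactor shrinks from 1000 toward 1.
    p_e = prefactor(step, 0.02, 1.0, LR0, DECAY_RATE, DECAY_STEPS)
    p_f = prefactor(step, 1000.0, 1.0, LR0, DECAY_RATE, DECAY_STEPS)
    print(step, lr, p_e, p_f)
```

At step 0 the ratio of learning rates equals 1, so each prefactor equals its start value; as the learning rate decays toward zero, each prefactor approaches its limit value. With start and limit values like those assumed here, forces dominate the loss early in training and the energy term gains weight later.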