Machine Learning for Chemical Elements and Crystal Structures

The goal of crystal structure prediction is to predict and discover the stable or metastable crystal state of a material with a given chemical composition. This prediction task is a critical and unavoidable part of the entire computer-aided design workflow for inorganic solids.

Introduction

[12] calculated the lattice thermal conductivities of 101 inorganic compounds using first-principles calculations and built a model that combines Bayesian optimization [13] with Gaussian process regression [14]. Using this model, they screened 54,779 compounds in the Materials Project [15] and identified 221 compounds with low thermal conductivity.

Crystal structure databases

[18] synthesized a new polymer with high thermal conductivity using a thermophysical property prediction model derived from the PoLyInfo polymer property database [19]. A virtual library was generated using a Bayesian molecular generation algorithm, and three polymers predicted to have high thermal conductivity were selected for experimental validation.

Representation of inorganic materials

Property prediction

This machine learning forward mapping corresponds to the forward arrow labeled “ML (Machine Learning)” in the figure. Property prediction using ML is achieved by combining representation methods (as described in the previous section) with supervised ML models such as kernel ridge regression, random forests, boosting methods, and neural networks [50].
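The combination of a representation and a supervised model can be sketched as follows. This is a minimal illustration, not the models used in the cited work: the two-feature element table, the fraction-weighted mean descriptor, the kernel settings, and the target values are all assumptions chosen for brevity, and the targets are synthetic.

```python
import numpy as np

# Atomic number and Pauling electronegativity; a deliberately tiny feature set.
ELEMENT_FEATURES = {
    "Na": np.array([11.0, 0.93]),
    "Mg": np.array([12.0, 1.31]),
    "O": np.array([8.0, 3.44]),
    "Cl": np.array([17.0, 3.16]),
}

def composition_descriptor(composition):
    """Fraction-weighted mean of element features, e.g. for {"Na": 1, "Cl": 1}."""
    total = sum(composition.values())
    return sum(n / total * ELEMENT_FEATURES[el] for el, n in composition.items())

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

def fit_kernel_ridge(X, y, gamma=0.1, alpha=1e-3):
    """Solve (K + alpha I) w = y and return a prediction function."""
    K = rbf_kernel(X, X, gamma)
    weights = np.linalg.solve(K + alpha * np.eye(len(X)), y)
    return lambda X_new: rbf_kernel(X_new, X, gamma) @ weights

compounds = [{"Na": 1, "Cl": 1}, {"Mg": 1, "O": 1}, {"Na": 2, "O": 1}, {"Mg": 1, "Cl": 2}]
X = np.array([composition_descriptor(c) for c in compounds])
y = np.array([0.2, 1.0, 0.5, 0.7])  # synthetic targets, not real property values
predict = fit_kernel_ridge(X, y)
y_hat = predict(X)
```

Swapping `fit_kernel_ridge` for a random forest or a neural network leaves the descriptor step unchanged, which is the point of separating representation from model.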

Inverse design strategies

Property prediction involves predicting the properties of materials using their compositional or structural features as input. Such an ML-assisted method provides rapid prediction of materials properties, enabling large-scale computational screening of materials data.

Crystal structure prediction

In Chapter 4, we propose a new CSP framework that performs the prediction task by selecting crystal structures that are predicted to be similar to the stable structure of a given chemical compound from existing crystal structures in a database. The great potential of the present method is demonstrated through the prediction of a wide variety of crystal systems in Chapter 4.

Dimension reduction and visualization of materials data

The method is classified under element substitution-based methods, where the interchangeability of two chemical elements is statistically estimated based on the observed frequency of their occurrence in two similar crystal structures.
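A frequency-based estimate of interchangeability can be sketched as below. This is a simplified illustration under stated assumptions: the input is a hypothetical list of compound pairs already judged structurally similar, the elements of each pair are assumed to be listed in site-matched order, and the relative-frequency score is one possible estimator rather than the statistic used by any particular method.

```python
from collections import Counter

def substitution_counts(similar_pairs):
    """Count element substitutions observed between structurally similar compounds.

    Each item is (elements_1, elements_2), with elements in site-matched order.
    """
    counts = Counter()
    for elems_1, elems_2 in similar_pairs:
        for a, b in zip(elems_1, elems_2):
            if a != b:
                counts[frozenset((a, b))] += 1
    return counts

def interchangeability(counts, a, b):
    """Relative frequency of the (a, b) substitution among all observed substitutions."""
    total = sum(counts.values())
    return counts[frozenset((a, b))] / total if total else 0.0

# Toy input: three pairs of compounds assumed to share a structure type.
pairs = [
    (("Na", "Cl"), ("K", "Cl")),
    (("Na", "Cl"), ("K", "Br")),
    (("Mg", "O"), ("Ca", "O")),
]
counts = substitution_counts(pairs)
score_na_k = interchangeability(counts, "Na", "K")
```

Using an unordered `frozenset` key treats the substitution of Na by K and of K by Na as the same event.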

Visualization of chemical elements as a periodic table

However, the machine learning algorithms never achieved Mendeleev's performance, as they missed important features such as between-group and between-family similarities. The periodic table generator (PTG) is an unsupervised machine learning algorithm based on generative topographic mapping (GTM) that can automate the translation of high-dimensional data into a tabular form with different layouts on demand.

GTM

PTG can autonomously generate different arrangements of chemical symbols that organize a two-dimensional array, such as Mendeleev's periodic table or a three-dimensional spiral table, according to the underlying periodicity in the given data. Since PTG can be considered as an extension of GTM, the details of GTM are discussed in the next section.

GTM-LDLV

C+ as diagonal elements, 𝚲2 is a diagonal matrix containing 𝑔(𝒖B) - as diagonal elements, and 𝒙(𝑑) is the vector of the 𝑑th entries in the matrix 𝑿i. An overview of the learning algorithm for parameter estimation in GTM-LDLV is shown in Algorithm 2.1.

Introduction

As briefly explained earlier, we aim here to answer whether machine learning can reproduce or recreate the periodic table using observed physico-chemical properties of the elements. Visualization results of the element data on the two-dimensional space using PCA (upper left), t-SNE (upper right), ISOMAP with neighbors = 3 (middle left), LLE with neighbors = 9 (middle right), kernel PCA with ANOVA kernel and sigma = 0.2 (bottom left), and GTM with 𝐾 = 9 × 9 grid points and 16 basis functions (bottom right).
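Of the methods listed, PCA is the simplest to write out. The sketch below projects a feature matrix onto its first two principal components using a singular value decomposition. The matrix here is random placeholder data with an arbitrary number of rows; in practice each row would hold the standardized features of one element.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature
    _, _, Vt = np.linalg.svd(X_std, full_matrices=False)
    return X_std @ Vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 39))  # placeholder: 50 rows, 39 features
coords = pca_project(X)        # one (x, y) coordinate pair per row
```

The nonlinear methods in the comparison (t-SNE, ISOMAP, LLE, kernel PCA, GTM) replace this linear projection with a neighborhood-based or latent-variable mapping, but they consume the same kind of standardized input.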

Methods

Computational workflow
Interpretation
Periodic table as an element descriptor
Data: element features
Analysis procedure

The prior distribution of the 𝑑th component ℎ_𝑑(𝒖) of 𝒉(𝒖) is given as a Gaussian process (GP) with mean 0 and the covariance function 𝑐(𝒖_𝑖, 𝒖_𝑗) of equation (3.4). In equation (3.3), the hyperparameter 𝝃2 consists of 𝜈2 and 𝑙2, also called the variance and the length scale. By taking the ensemble average over the samples from 𝑝(𝜽|𝑿), the parameters of the GTM-LDLV are estimated. Using the parameters currently given, we can derive the values of 𝑟(𝒖) of the covariance function in equation (3.4) at each of the extended nodes.
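To make the roles of the variance and the length scale concrete, the sketch below draws one sample from a zero-mean GP prior over a grid of latent nodes. The squared-exponential form of the covariance is an assumption made for illustration, since equations (3.3) and (3.4) are not reproduced here, and the grid size and hyperparameter values are arbitrary.

```python
import numpy as np

def sq_exp_cov(U1, U2, variance, length_scale):
    """Squared-exponential covariance between latent nodes (assumed functional form)."""
    sq_dist = ((U1[:, None, :] - U2[None, :, :]) ** 2).sum(axis=2)
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

# Latent nodes on a regular 5 x 5 grid in [-1, 1]^2 (arbitrary illustrative size).
g = np.linspace(-1.0, 1.0, 5)
U = np.array([(a, b) for a in g for b in g])

C = sq_exp_cov(U, U, variance=1.0, length_scale=0.5)

# One draw of h_d(u) at the nodes: h = L z with C = L L^T and z ~ N(0, I).
rng = np.random.default_rng(0)
L = np.linalg.cholesky(C + 1e-6 * np.eye(len(U)))  # small jitter for numerical stability
h_d = L @ rng.standard_normal(len(U))
```

A larger variance scales the amplitude of the sampled function, while a larger length scale makes neighboring nodes more strongly correlated and the sample smoother.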

The periodic table was then quantitatively evaluated based on the predictive performance of the descriptor.

Results

Results of PTG

In the second step, we increased the number of nodes to 9 × 9 by placing new nodes at the midpoints of the line segments connecting neighboring nodes. The PTG algorithm was run on the element data set using the R code available at [105]. The elements are color-coded according to the five periods and numbered according to their atomic number.
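The midpoint refinement can be sketched as follows, assuming a 5 × 5 starting grid, which midpoint insertion turns into 9 × 9. The function name and the array layout are illustrative and are not taken from the R code; placing a node at the center of each cell (the midpoint of its diagonals) is also an assumption needed to obtain a full square grid.

```python
import numpy as np

def refine_grid(nodes):
    """Insert midpoints into an (n, n, d) node grid, giving (2n-1, 2n-1, d)."""
    n, _, d = nodes.shape
    fine = np.zeros((2 * n - 1, 2 * n - 1, d))
    fine[::2, ::2] = nodes                                    # original nodes
    fine[1::2, ::2] = 0.5 * (nodes[:-1] + nodes[1:])          # midpoints along axis 0
    fine[::2, 1::2] = 0.5 * (nodes[:, :-1] + nodes[:, 1:])    # midpoints along axis 1
    fine[1::2, 1::2] = 0.25 * (nodes[:-1, :-1] + nodes[1:, :-1]
                               + nodes[:-1, 1:] + nodes[1:, 1:])  # cell centers
    return fine

g = np.linspace(-1.0, 1.0, 5)
coarse = np.stack(np.meshgrid(g, g, indexing="ij"), axis=-1)  # shape (5, 5, 2)
fine = refine_grid(coarse)                                     # shape (9, 9, 2)
```

The same function applies when the nodes carry learned positions rather than a regular lattice, because it only averages neighboring entries.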

The elements are color-coded according to three blocks in the standard periodic table indicated in the right panel.

Interpretation

The red line on the left indicates the boundary between transition elements and typical elements. Property landscapes of atomic radius (Rahm et al. [106]), electronegativity, melting point, and thermal conductivity at 25 °C embedded in the latent spaces.

Quantitative comparison of periodic tables

Per-atom formation energy prediction performance for the models using six different descriptors. The vertical axis shows the cross-validated MAE and RMSE of RF regressors trained with six different descriptors: the element coordinates in the representations made by t-SNE and PCA (top-right and top-left in Fig. 3.2, respectively), the standard periodic table, the square PTG table, the conical PTG table, and the full 39-dimensional feature set that was used to construct the PTG table. Error bars indicate standard deviations over five independent cross-validation trials (they are too small to be visible).
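The evaluation protocol (cross-validated MAE and RMSE, repeated over independent trials to obtain error bars) can be sketched as below. To keep the example dependency-free, an ordinary least-squares fit stands in for the random forest regressor, and the data are synthetic; both are assumptions of this sketch, as are the function names.

```python
import numpy as np

def kfold_mae_rmse(X, y, fit, n_splits=5, seed=0):
    """Return (MAE, RMSE) pooled over the held-out folds of one k-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    errors = []
    for k in range(n_splits):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        predict = fit(X[train], y[train])
        errors.append(y[test] - predict(X[test]))
    errors = np.concatenate(errors)
    return np.abs(errors).mean(), np.sqrt((errors ** 2).mean())

def fit_linear(X, y):
    """Least-squares stand-in for the RF regressor (not the model used in the text)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([X_new, np.ones(len(X_new))]) @ coef

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                              # synthetic descriptors
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

# Five independent trials with different fold assignments give the error bars.
trials = np.array([kfold_mae_rmse(X, y, fit_linear, seed=s) for s in range(5)])
mae_mean, rmse_mean = trials.mean(axis=0)
mae_std, rmse_std = trials.std(axis=0)
```

Comparing descriptors then amounts to calling the same routine once per descriptor matrix with an identical model and fold scheme.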

Comparison of frequencies of chemical elements in 𝐷ÙÚQ (top: black bar graph) and 𝐷ÛÜ=QÝ=EÝ (bottom: black bar graph).

Estimation of the intrinsic dimension of element data

We then estimated the local intrinsic dimension of each data point (i.e., each element) using the optimally topology preserving maps (OTPM) method [111]. In OTPM, the intrinsic dimension is estimated from a local PCA of the indicators of each node and its immediate neighbors.
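The idea of estimating dimension from a local PCA can be illustrated with a simplified stand-in. The sketch below is not OTPM: it uses the k nearest neighbors of each data point instead of the neighbors of a learned topology-preserving map, and it counts the principal components needed to reach an assumed explained-variance threshold.

```python
import numpy as np

def local_intrinsic_dimension(X, n_neighbors=10, var_threshold=0.95):
    """Per-point dimension estimate from a PCA of each point's nearest neighbors."""
    dims = []
    for x in X:
        dist = np.linalg.norm(X - x, axis=1)
        idx = np.argsort(dist)[1:n_neighbors + 1]        # skip the point itself
        diffs = X[idx] - x                               # local difference vectors
        eigvals = np.linalg.svd(diffs, compute_uv=False) ** 2
        ratio = np.cumsum(eigvals) / eigvals.sum()
        dims.append(int(np.searchsorted(ratio, var_threshold)) + 1)
    return np.array(dims)

# Synthetic check: a two-dimensional plane embedded linearly in five dimensions.
rng = np.random.default_rng(0)
plane = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 5))
dims = local_intrinsic_dimension(plane)
```

On this synthetic plane the estimate never exceeds two, since all local difference vectors lie in a two-dimensional subspace.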

Distribution of the dimension estimator 𝑚̂ across (a) the standard periodic table, (b) the square PTG table shown in Fig.

Notes on the PTG Algorithm

The lower left and lower right figures show the same conical PTG table viewed from the top and side respectively.

Details of analysis procedure

Therefore, this table was chosen as the final visualization result of the square PTG table and corresponds to the one shown in the figure. In Figure 3.16, the 4th conical periodic table gives the lowest MAE (0.464 eV/atom); it was therefore chosen as the final visualization result of the conical PTG table and corresponds to the one shown in the figure. Mean absolute errors (MAE) of per-atom formation energy predictions for the 10 square periodic tables used as element descriptors.

Mean absolute errors (MAE) of formation energy prediction per atom for the 10 conical periodic tables used as element descriptors.

Other examples

The vertical axis indicates the cross-validated MAE of random forest (RF) regressors trained with the 10 descriptors obtained from the coordinates of elements in the conical periodic tables produced by PTG in different trials with the same hyperparameters. Examples of the PTG tables with three different layouts: rectangular grids (top), three-dimensional cylinder type (middle), and cubic (bottom). The elements are color-coded according to the five periods and numbered with atomic numbers.

For the cylinder and the cubic tables, the left and right figures show the top and side views, respectively.

Concluding remarks

The prediction of energetically stable crystal structures formed by a given chemical composition is a central problem in solid state physics. Here we present a unique methodology for crystal structure prediction (CSP) that relies on a machine learning algorithm called metric learning. It is shown that a binary classifier trained on a large number of already identified crystal structures can determine the isomorphism of crystal structures formed by two given chemical compounds with an accuracy of approximately 96.4%.

For a given query compound with an unknown crystal structure, the model is used to automatically select from a crystal structure database a set of template crystals with nearly identical stable structures to which element replacement should be applied.
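The template selection step can be sketched as a ranking problem. In the sketch below, `similarity_model` stands for the trained binary classifier and is replaced by a hypothetical distance-based score; the descriptors are random placeholders and `top_k=5` is an illustrative choice.

```python
import numpy as np

def select_templates(query_desc, template_descs, similarity_model, top_k=5):
    """Rank database entries by predicted structural similarity to the query."""
    scores = np.array([similarity_model(query_desc, t) for t in template_descs])
    order = np.argsort(-scores)[:top_k]     # highest predicted similarity first
    return order, scores[order]

def toy_similarity(a, b):
    """Hypothetical stand-in for the trained classifier: a score in (0, 1]."""
    return float(np.exp(-np.linalg.norm(a - b)))

rng = np.random.default_rng(0)
templates = rng.normal(size=(20, 4))        # placeholder composition descriptors
query = templates[7].copy()                 # a query identical to entry 7
top_idx, top_scores = select_templates(query, templates, toy_similarity)
```

Each selected template would then have its elements replaced by those of the query compound to produce a candidate structure.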

Introduction

In principle, the crystalline state of the assembled atoms can be determined by optimizing the energy surface, which in turn can be estimated using first-principles calculations. However, iterative gradient descent on the potential energy surface using first-principles calculations is prohibitively expensive for complex systems, such as those with many atoms per unit cell. The potential of this substitution-based CSP has been demonstrated for a wide variety of crystal systems.

Method

Outline
Learning to predict structural identity from compositional features
Overall prediction scheme of the CSP method
Chemical composition descriptor
Preparation of structural similarity labels
Experimental procedure

Specifically, element similarity is defined as the Euclidean distance of the 19 elemental descriptors (Table 4.1). Then, a crystal structure descriptor was calculated by taking the summary statistics of the site fingerprint across all atomic sites in the crystal structure. The third column indicates the number of similar pairs in all pairs of the 33,153 stable compounds for each τ.
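The two quantities described above can be sketched as follows. The choice of summary statistics (mean, standard deviation, minimum, maximum) and the threshold-based labeling function are assumptions for illustration; the text specifies only that summary statistics of the site fingerprints are taken and that similar pairs are counted for each τ.

```python
import numpy as np

def element_distance(desc_a, desc_b):
    """Euclidean distance between two elemental descriptor vectors."""
    return float(np.linalg.norm(np.asarray(desc_a) - np.asarray(desc_b)))

def structure_descriptor(site_fingerprints):
    """Summary statistics of site fingerprints over all atomic sites (assumed set)."""
    F = np.asarray(site_fingerprints, dtype=float)
    return np.concatenate([F.mean(axis=0), F.std(axis=0), F.min(axis=0), F.max(axis=0)])

def is_similar(desc_1, desc_2, tau):
    """Label a pair as similar when the descriptor distance does not exceed tau."""
    return bool(np.linalg.norm(desc_1 - desc_2) <= tau)

# Toy inputs: two 19-dimensional element vectors and a two-site fingerprint matrix.
d_elem = element_distance(np.zeros(19), np.ones(19))
s_desc = structure_descriptor([[0.0, 1.0], [2.0, 3.0]])
```

Raising τ labels more pairs as similar, which is why the number of similar pairs is reported separately for each τ.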

To eliminate the imbalance between the number of positive and negative cases in classifier training, 421,000 negative cases were randomly selected from the different sets.
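Random subsampling of the majority class can be sketched as below. The index arrays and counts are small placeholders rather than the 421,000 negative cases mentioned above, and the function name is illustrative.

```python
import numpy as np

def balance_by_subsampling(pos_idx, neg_idx, n_negative, seed=0):
    """Keep all positive pairs and a random subset of the negative pairs."""
    rng = np.random.default_rng(seed)
    sampled_neg = rng.choice(neg_idx, size=n_negative, replace=False)
    return np.concatenate([pos_idx, sampled_neg]), sampled_neg

pos_idx = np.arange(0, 100)          # placeholder indices of similar pairs
neg_idx = np.arange(100, 10100)      # placeholder indices of dissimilar pairs
train_idx, sampled_neg = balance_by_subsampling(pos_idx, neg_idx, n_negative=100)
```

Sampling without replacement ensures that no negative pair is duplicated in the training set.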

Result

The fourth column lists the rank, among all candidates, of the minimum dissimilarity attained by the top five candidates. We examined the dissimilarity value of the closest structure in 𝒞à to the known stable structure (the third column) and compared it to the minimum dissimilarity value of the best template among all candidates in 𝒞 (the second column). Therefore, we estimate that approximately the entire crystal system can be predicted using the proposed method.
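One reading of this evaluation can be sketched as follows: take the smallest dissimilarity to the true structure among the top five proposed candidates, then find its rank among the dissimilarities of all candidates. The function and the numbers are illustrative, and the toy call uses `k=3` only to keep the example short.

```python
def best_rank_in_top_k(dissimilarities, proposed_order, k=5):
    """Return the best dissimilarity within the top-k proposals and its overall rank."""
    best_in_top = min(dissimilarities[i] for i in proposed_order[:k])
    rank = 1 + sum(d < best_in_top for d in dissimilarities)
    best_overall = min(dissimilarities)
    return best_in_top, rank, best_overall

# Toy values: dissimilarity of each candidate to the true structure.
dissimilarities = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
proposed_order = [2, 3, 0, 4, 5, 1]   # candidates sorted by predicted similarity
best_in_top, rank, best_overall = best_rank_in_top_k(dissimilarities, proposed_order, k=3)
```

A rank of 1 means that the top-k proposals already contain the best template available in the whole candidate set.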

Crystal structure diagrams of the true structures and the predicted structures (the one closest to the true structure among the top five) for each query are shown here (drawn with VESTA [121]).

Analysis procedure for model comparison

Structure prediction results for the 35 query compounds (in the same order as in the first column of Table 4.4). The dissimilarity values between the true structure and the predicted structure are shown, which were calculated using local structure order parameters [37]. Also shown are the formulas and material IDs (assigned to compounds in the Materials Project database [15, 26]) of the template structures used for element-replacement structure prediction.

Because the Material Design DFT calculation environment and the environment described in step 3 (Section 4.2) are not completely identical, the true structures shown in this figure are those locally optimized by DFT calculations in the environment described in step 3.

Detail of the models

The structure, hyperparameters, and training procedure of the binary-classification neural network (NN) models are identical to those used for model comparison and those used as final models in the CSP method.

Concluding remarks

Furthermore, the crystal structure prediction task was formulated as a metric learning problem. In another study, the goal was to answer whether the crystal structure can be predicted from the chemical composition of the material. If the crystal structure database continues to expand monotonically in the future, the scope of application of the substitution-based method will also expand.

Cenzual Pearson's Crystal Data: Crystal Structure Database for Inorganic Compounds, 1st ed., ASM International: Materials Park, Ohio.