However, it yields weak signal responses, i.e., a low signal-to-noise ratio (SNR), due to the intrinsically low sensitivity and resolution of the NMR spectrometer. Quantification of spectral regions is also difficult because noise and signal are intertwined. Finally, since quantification of NMR spectra is estimated from integrated peak areas, the normalized spectra are multiplied by the mean integration values of the NMR spectra to reintroduce the intensity information lost during normalization.
Moreover, since NMR spectrometers often suffer from low sensitivity and resolution [11], a low-field NMR spectrometer produces a final spectrum with a low signal-to-noise ratio (SNR), meaning that the intensity of the desired signal is weak relative to the noise. To measure the relative number of differently bonded atoms, the corresponding peaks are integrated. Furthermore, there is no consistent noise pattern across the different intensities of the desired signal.
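As a rough illustration of the SNR notion used above, a spectrum's SNR can be estimated by comparing the peak intensity with the standard deviation of a signal-free baseline region. The sketch below is a minimal example; the assumption that the first 50 points contain only noise is illustrative, not a convention from this work.

```python
import numpy as np

def estimate_snr(spectrum: np.ndarray, noise_region: slice = slice(0, 50)) -> float:
    """Rough SNR estimate: maximum peak height divided by baseline noise std.

    Assumes `noise_region` covers a signal-free part of the spectrum;
    this is an illustrative convention, not the paper's exact definition.
    """
    noise_std = spectrum[noise_region].std()
    return float(spectrum.max() / noise_std)
```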
Quantitative analysis of NMR spectra relies on manual peak picking and integration of the signal, which may introduce human error into the analysis results [15].
Neural Network
With a convex objective function, gradient-based optimization converges to the global minimum from any initial parameters. If the optimizer is applied to a non-convex objective function, convergence of the network parameters is not guaranteed. It is therefore important to choose the cost function carefully when training a neural network.
However, the surface of the cost function is often very flat, which weakens its ability to guide neural network training. The logarithm in the negative log-likelihood cost function (1) acts on the exponential inside the sigmoid of equation (3) or the softmax of equation (4), so the log-likelihood protects gradient-based learning from saturation.
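To make this concrete, consider the softmax case; the following derivation is a standard sketch (the document's equations (1), (3), and (4) are not reproduced here, so the notation for the logits $z$ is assumed):

$$-\log p(y = i \mid x) \;=\; -\log \frac{e^{z_i}}{\sum_j e^{z_j}} \;=\; -z_i + \log \sum_j e^{z_j}.$$

The logarithm undoes the exponential, leaving a term that is approximately linear in the logits, so the gradient does not vanish even when the softmax output itself is close to 0 or 1.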
An autoencoder, as shown in Figure 3, is a type of neural network that learns representations in a semi-supervised manner [24]. It compresses the input data from its original space, $\mathbb{R}^n$, into a latent space, $Z : \mathbb{R}^m$, using an encoder $g_\theta : \mathbb{R}^n \to \mathbb{R}^m$, and a decoder $f_\phi : \mathbb{R}^m \to \mathbb{R}^n$ regenerates the encoded vectors into the reconstructed output.
Representation learning with the autoencoder starts from the manifold hypothesis, which states that high-dimensional data mostly lie on a low-dimensional manifold. The encoder, the first half of the autoencoder, reduces the input data $x$ from its original dimension to the latent vector $z$, and the decoder then tries to recreate the input in its original dimension as $\hat{x}$. To train the autoencoder, the mean-squared-error (MSE) loss in (2) is used, with $L(x, \hat{x})$ as the reconstruction error.
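A minimal PyTorch sketch of such an autoencoder follows. The input dimension of 400 mirrors the spectra used later in this work, but the hidden width and latent size are illustrative choices, not the exact L-SNRNet architecture.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Encoder g_theta: R^n -> R^m and decoder f_phi: R^m -> R^n.

    n = 400 matches the spectral dimension in this work; the hidden
    width and latent size m are illustrative, not the paper's exact design.
    """
    def __init__(self, n: int = 400, m: int = 2, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n, hidden), nn.ReLU(),
            nn.Linear(hidden, m),
        )
        self.decoder = nn.Sequential(
            nn.Linear(m, hidden), nn.ReLU(),
            nn.Linear(hidden, n),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)      # compress R^n -> R^m
        return self.decoder(z)   # reconstruct R^m -> R^n

# Reconstruction (MSE) loss, as in equation (2): L(x, x_hat)
criterion = nn.MSELoss()
```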
Clustering
From the unknown i.i.d. distribution $p(x)$, we need to find an approximation using a GMM with $K$ mixture components, where the GMM is parameterized by the means $\mu_k$, the covariance matrices $\Sigma_k$, and the mixture weights $\pi_k$:

$$p(x \mid \theta) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k). \quad (7)$$

Then we obtain the log-likelihood as

$$\log p(X \mid \theta) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k).$$

From the optimized parameters $\theta^*$, we can assign a label to each data point: a latent one-hot vector $z$ with $z_k \in \{0, 1\}$, which indicates the Gaussian component that generated the data point.
Up to this point, we have derived the general form of the GMM and its latent variable for a single data point $x$. The quantity $p(z_k = 1 \mid x_n)$ is the probability that mixture component $k$ generated the data point $x_n$.

Agglomerative clustering [28, 29] is a type of hierarchical clustering that groups data points based on their similarity.
The algorithm iteratively merges data points into one large tree-shaped structure based on distance, called a dendrogram. To make the final cluster decision, the dendrogram is cut according to a predefined number of clusters, as sketched below.
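As an illustration of this procedure, scikit-learn's AgglomerativeClustering merges points bottom-up and cuts the dendrogram at a predefined number of clusters. The two-cluster setting mirrors the ND/Positive split used later; the toy data points are purely illustrative.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy 2-D points standing in for latent vectors (illustrative only).
points = np.array([[0.0, 0.0], [0.1, 0.2], [3.0, 3.1], [2.9, 3.0]])

# Merge points bottom-up by Ward distance, then cut the resulting
# dendrogram into the predefined number of clusters.
labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(points)
print(labels)  # e.g. [0 0 1 1]
```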
Nuclear Magnetic Resonance Spectroscopy
In contrast, when the nucleus has an even number of both protons and neutrons, it has zero spin and no magnetic moment. If the direction of the spin magnetization is parallel to the magnetic field in Figure 2(A), the vector product in equation (13) is zero, meaning the magnetization is static. However, if the magnetization is tilted away from the field direction, the vector product is non-zero, and the spin magnetization precesses around the direction of $B_0$ in Figure 2(B).
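For reference, the vector product referred to as equation (13) is reproduced here in its standard textbook (Bloch) form, on the assumption that the document follows the usual convention:

$$\frac{d\mathbf{M}}{dt} = \gamma\, \mathbf{M} \times \mathbf{B}_0,$$

where $\gamma$ is the gyromagnetic ratio. The cross product vanishes when $\mathbf{M}$ is collinear with $\mathbf{B}_0$ and otherwise drives precession about $\mathbf{B}_0$.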
To measure the precession frequency, a strong magnetic field is first applied to the molecule. The sampled points of the precessing signal are then converted to the frequency domain using the Fourier transform. The precession frequency varies depending on the proximity of electronegative atoms (O, N, and halogens) and unsaturated groups (C=C, C=O). When the electron density around the bound nucleus is high, the nucleus is shielded and precesses at a lower frequency; if the electron density is low, the nucleus is deshielded and precesses at a higher frequency. Because the NMR signal thus depends on the order of the chemical bonds, the resulting change in resonance frequency is called the chemical shift. Therefore, based on the NMR chemical shift, we can infer the chemical bond order of the analyte molecules in Figure 7.
Deep Learning for NMR research
Framework and training L-SNRNet
Dataset, pre-processing and data augmentation
To address the lack of spectral data, the following augmentation parameters were taken from a previous study [49] and used to synthesize spectra: Gaussian noise and peak scaling (Table 1). First, the spectra were normalized so that all data lie in the same intensity range, 0 to 1. Then, values randomly sampled from normal distributions with different standard deviations were added to the NMR spectra.
Finally, to increase the expressiveness of the data under the semi-supervised framework, each data set was multiplied by the relative mean integration values of the ND and Positive data, as sketched below.
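A minimal NumPy sketch of this three-step pre-processing (normalization, Gaussian-noise augmentation, integration-value scaling) might look as follows. The noise standard deviations and the integration-value computation are illustrative placeholders for the parameters in Table 1, not the paper's exact values.

```python
import numpy as np

def preprocess(spectra: np.ndarray, noise_stds=(0.01, 0.05, 0.1)) -> np.ndarray:
    """spectra: (n_samples, 400) array of raw NMR spectra."""
    # 1) Normalize each spectrum into the intensity range 0..1.
    mins = spectra.min(axis=1, keepdims=True)
    maxs = spectra.max(axis=1, keepdims=True)
    normalized = (spectra - mins) / (maxs - mins)

    # 2) Augment: add Gaussian noise sampled with several standard deviations.
    augmented = np.concatenate(
        [normalized + np.random.normal(0.0, s, normalized.shape) for s in noise_stds]
    )

    # 3) Scale by a mean integration value (sum over the spectrum as a
    #    stand-in for the integrated peak area; illustrative definition).
    mean_integration = augmented.sum(axis=1).mean()
    return augmented * mean_integration
```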
Figure 10: NMR spectra with normalized intensity, multiplied by the relative mean integration values.
The ND spectra have a noisy shape (Figure 10c), and the two normalized spectral data sets show different distributions in the original dimension ($\in \mathbb{R}^{400}$), as shown in Figure 10d. After training the neural network, the spectra were reduced to latent dimensions so that their impurity state could be interpreted using clustering methods. The resulting accuracy is 95%, and the results are unstable under different initial random-seed conditions in Python (Table 2).
The latent-space plots in Figure 11a–c show that the two spectral data sets are not well separated. The ND data points (black dots) and Positive data points (red dots) are not discretely distributed: the distributions of some data points overlap.
Adding batch normalization layers and multiplying mean integration value
For a fair evaluation, we do not multiply the test spectra by the average integration values. In Figure 12d–f, only the latent vectors of the Positive training spectra, which were multiplied by the integration value, are distributed separately. We can therefore conclude that the autoencoder alone still does not distinguish features between ND and Positive.
In semi-supervised identification of NMR spectra, the key is to make the two data sets, ND and Positive, separable in latent space, and BN preserves the difference between the spectra well. BN can scale the latent representation, so the latent space becomes wider and has sufficient representational capacity. A previous study [52] added BN layers to an autoencoder, which mitigates covariate shift and yielded better semi-supervised learning results than an autoencoder without BN.
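Concretely, inserting BN into the encoder of the earlier sketch only requires nn.BatchNorm1d layers after each linear map. This is again an illustrative sketch rather than the exact L-SNRNet layout; the decoder would be modified symmetrically.

```python
import torch.nn as nn

# Encoder with batch normalization after each linear layer
# (illustrative widths; the decoder is modified symmetrically).
encoder_bn = nn.Sequential(
    nn.Linear(400, 128),
    nn.BatchNorm1d(128),   # normalize activations over the batch
    nn.ReLU(),
    nn.Linear(128, 2),
    nn.BatchNorm1d(2),     # keeps latent vectors roughly zero-mean, unit-variance
)
```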
When group normalization is applied to the autoencoder with a small group size, the normalized groups cannot reflect the entire data distribution, and the results are poor and unstable. We empirically found that when the group size is 1/3 of the total input data, each group reflects the distribution of the whole data set and the autoencoder trains well. It is well known that most spectra follow the Voigt profile [54], whose shape is a combination of Lorentzian and Gaussian components.
Since the spectra contain a Gaussian component and BN drives the latent vectors toward a normal distribution, as in (3), we can expect the latent vectors to also have Gaussian properties, so a GMM fits them best. On the autoencoder side, batch normalization layers operate at the batch level during training. Finally, after these three steps all spectra are well organized, and a GMM fitted with the expectation-maximization (EM) algorithm [27] is well suited to our goal of categorizing low-SNR spectra into two groups: ND and Positive.
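Fitting the GMM by EM to the latent vectors, and reading off the responsibilities $p(z_k = 1 \mid x_n)$, can be done with scikit-learn. Two components correspond to the ND and Positive groups; the latent array below is a random placeholder standing in for the output of the trained encoder.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# latent: (n_samples, m) array produced by the trained encoder
# (random placeholder here for real latent vectors).
latent = np.random.randn(200, 2)

# Two mixture components (ND and Positive), fitted via the EM algorithm.
gmm = GaussianMixture(n_components=2, covariance_type="full").fit(latent)
labels = gmm.predict(latent)        # hard assignment: index k where z_k = 1
resp = gmm.predict_proba(latent)    # responsibilities p(z_k = 1 | x_n)
```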
Effects of the multiplied integration value
In Figure 14, the accuracy is compared under different conditions of the integration value multiplied into the Positive data during training.

References

Napolitano, "The importance of purity evaluation and the potential of quantitative 1H NMR as a purity analysis: mini perspective," Journal of Medicinal Chemistry.
Waibel, "Quantitative NMR spectroscopy - applications in drug analysis," Journal of Pharmaceutical and Biomedical Analysis.
Kalita, "A survey of the application of deep learning to natural language processing," IEEE Transactions on Neural Networks and Learning Systems.
Qu, "Review and outlook: deep learning in nuclear magnetic resonance spectroscopy," Chemistry - A European Journal.
Hansen, "FID-Net: a versatile deep neural network architecture for NMR spectral reconstruction and virtual separation," Journal of Biomolecular NMR.
Hansen, "Virtual homonuclear decoupling in direct detection nuclear magnetic resonance experiments using deep neural networks," Journal of the American Chemical Society.
Liao, "Deep learning for single-image super-resolution: a brief review," IEEE Transactions on Multimedia.
Hoi, "Deep learning for image super-resolution: a survey," IEEE Transactions on Pattern Analysis and Machine Intelligence.
Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
Kim, "Intact metabolite spectrum mining by deep learning in brain proton magnetic resonance spectroscopy," Magnetic Resonance in Medicine.
Kang, "Neural message passing for NMR chemical shift prediction," Journal of Chemical Information and Modeling.
Bax, "Protein backbone and sidechain torsion angles predicted from NMR chemical shifts using artificial neural networks," Journal of Biomolecular NMR.
Chintala, "PyTorch: an imperative-style deep learning library," in Advances in Neural Information Processing Systems 32.