
Adding batch normalization layers and multiplying by the mean integration value

To discriminate Positive spectra more accurately, batch normalization (BN) [50] layers, which normalize the distribution of each batch so that it follows a normal distribution, were added to L-SNRNet. BN is calculated as

\[
\hat{h}(x) = \gamma \cdot \frac{h(x) - \mu(h(x))}{\sigma(h(x))} + \beta, \tag{18}
\]

where $h(x) = w^{T}x$, and $\beta$ and $\gamma$ are learnable parameters of BN. The results are shown in Figure 12a∼c. Obviously, the two datasets, ND and Positive, are distributed farther from each other than they were before the addition of BN. If the integration value is multiplied into all Positive spectra during training, the resulting test accuracy is still poor. Therefore, both the merely normalized Positive spectra and the Inte-Positive spectra should be used for training. The autoencoder can then be trained on three groups of features, the normalized ND, Positive, and Inte-Positive spectra, so that it reduces the dimensions of the original spectra into a latent space in which the 'peak' shape is distinguishable. In Figure 12a∼c, the autoencoder reduces the three groups of input spectra into the latent space in a distinguishable manner. Finally, as shown in Table 4, L-SNRNet gives accurate results that are stable across Python's numerical random states.
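As an illustration of where the BN layers of Eq. (18) sit in the network, the sketch below builds a small fully connected autoencoder with a BatchNorm1d layer after each hidden linear layer. This is a minimal sketch assuming PyTorch; the layer widths, input length, and latent dimension are illustrative placeholders, not the exact L-SNRNet configuration.

```python
# Minimal sketch (PyTorch assumed): fully connected autoencoder with a
# BatchNorm1d layer, i.e. Eq. (18), after each hidden linear layer.
# Layer widths, input length and latent dimension are placeholders.
import torch
import torch.nn as nn

class BNAutoencoder(nn.Module):
    def __init__(self, n_points=4096, latent_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_points, 512), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Linear(512, 64), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Linear(64, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Linear(64, 512), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Linear(512, n_points),
        )

    def forward(self, x):
        z = self.encoder(x)  # latent vector later used for clustering
        return self.decoder(z), z
```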

Since the autoencoder does not use data labels for back propagation, each data point has a low representation capacity. If the amount of impurity is too small and the shape of its spectrum is noisy, then, when the dimensions of the weak NMR spectra are reduced into the latent space, some spectra will be mixed with spectra of different labels and end up being indistinguishable. This motivates us to combine prior NMR knowledge with a neural network to make the data recognizable. In NMR spectra, the intensity of a peak signal is proportional to the number of distinct hydrogen atoms. The ratio of different protons contributing to a given signal is calculated by integrating the area under its corresponding peak. The relative mean integration values of the ND and Positive NMR spectra are therefore multiplied into each spectrum. After integrating all ND spectra, the mean integration value of the ND set is taken as 1; the mean integration value of the Positive set is then measured as approximately 5.28.
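As a rough sketch of how these relative mean integration values could be computed (NumPy assumed; the spectra arrays, their sizes, and the ppm axis below are hypothetical placeholders):

```python
import numpy as np

def mean_integration(spectra, ppm_axis):
    """Mean area under a set of spectra (trapezoidal rule over the ppm axis)."""
    areas = np.trapz(spectra, ppm_axis, axis=1)  # one integral per spectrum
    return areas.mean()

# Placeholder data: (n_samples, n_points) intensity arrays on a shared ppm axis.
rng = np.random.default_rng(0)
ppm_axis = np.linspace(0.0, 10.0, 4096)
nd_spectra = rng.random((100, 4096))
positive_spectra = rng.random((80, 4096))

nd_mean = mean_integration(nd_spectra, ppm_axis)
pos_mean = mean_integration(positive_spectra, ppm_axis)

# Normalize so that the ND set has a mean integration value of 1; on the
# paper's data the Positive set then comes out to roughly 5.28.
relative_pos = pos_mean / nd_mean

# "Inte-Positive": Positive spectra scaled by the relative mean integration value.
inte_positive = positive_spectra * relative_pos
```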

Multiplying by the integration value can lead to a large distance between the ND and Positive spectra, which can make the two sets of data more distinguishable.

Figure 12: Clustering result maps of latent vectors by UMAP with the addition of batch normalization layers, at (a) Latent dimension 3, (b) Latent dimension 4, (c) Latent dimension 5, and with multiplication by the integration value, at (d) Latent dimension 3, (e) Latent dimension 4, (f) Latent dimension 5

Latent Dimensions             2             3             4             5
k-means clustering (train)    82.38±16.09   89.39±11.62   92.57±2.99    86.03±15.87
k-means clustering (test)     84.00±11.62   89.29±11.87   92.36±4.99    85.67±15.81
Gaussian mixture (train)      98.74±2.14    99.49±1.21    99.76±0.23    99.49±0.44
Gaussian mixture (test)       96.54±2.75    97.63±1.42    99.09±0.91    98.73±1.42
Agglomerative clustering      88.64±4.02    91.03±3.73    91.87±4.09    90.56±5.41

Table 4: Accuracy (%) using input spectra multiplied by the relative mean integration value
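For context, the three clustering methods compared in Table 4 can be run on the latent vectors with scikit-learn roughly as follows. The majority-vote mapping from cluster indices to the ND/Positive labels is our assumption about how accuracy is scored, and the latent vectors and labels below are random placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.mixture import GaussianMixture
from sklearn.metrics import accuracy_score

def cluster_accuracy(pred, true):
    """Map each cluster index to its majority true label, then score."""
    mapped = np.zeros_like(pred)
    for c in np.unique(pred):
        mask = pred == c
        mapped[mask] = np.bincount(true[mask]).argmax()
    return accuracy_score(true, mapped)

# Placeholder latent vectors (e.g. latent dimension 4) and labels: 0 = ND, 1 = Positive.
rng = np.random.default_rng(0)
z = rng.normal(size=(180, 4))
y = rng.integers(0, 2, size=180)

preds = {
    "k-means clustering": KMeans(n_clusters=2, n_init=10).fit_predict(z),
    "Gaussian mixture": GaussianMixture(n_components=2).fit_predict(z),
    "Agglomerative clustering": AgglomerativeClustering(n_clusters=2).fit_predict(z),
}
for name, pred in preds.items():
    print(f"{name}: {100 * cluster_accuracy(pred, y):.2f}%")
```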


The clustering results are shown in Figure 12d∼f, and the training and test accuracies are listed in Table 4. For fair inference, we do not multiply the mean integration values into the test spectra. This gives exact results for the training accuracy, but not for the test accuracy. In Figure 12d∼f, the latent vectors of the integration-value-multiplied Positive spectra used in training are distributed separately. However, the latent vectors of the merely normalized Positive spectra overlap with those of the ND spectra. We can conclude that the autoencoder still does not differentiate the features of ND and Positive.

In semi-supervised identification of NMR spectra, the key is to make the two datasets, ND and Positive, differentiable in the latent space, and BN can preserve the difference between the spectra well.

Without BN, the latent vectors shrink and the latent representation is indistinguishable [51]. BN can scale the latent representation, so that it becomes wider and has enough representation capacity. Finally, through BN, we can suppose that the latent vectors follow a normal distribution.

Further, Wang et al. [52] added BN layers to the autoencoder; this resolves the covariate shift and showed better semi-supervised learning results than the autoencoder without BN. When batch normalization is applied to the autoencoder, if the batch size is small, a normalized batch cannot reflect the whole data distribution, and the results are poor and unstable. To improve accuracy and obtain consistent results, it is better to use a large batch size than a small one. We empirically found that when the batch size is 1/3 of the total input data, each batch can reflect the distribution of the total dataset and the autoencoder is trained well. Keskar et al. [53] argue that large-batch training leads to sharp minima.

In training the autoencoder, it can be thought that, with a large batch, the network converges to a similar minimum point, so it shows similar latent space representations.
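A minimal sketch of this batch-size choice, assuming a PyTorch DataLoader over the training spectra (the tensor shapes are placeholders):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder training set: normalized ND, Positive and Inte-Positive spectra.
train_x = torch.randn(300, 4096)
dataset = TensorDataset(train_x)

# Use roughly one third of the whole training set per batch so that each
# batch reflects the distribution of the full dataset.
batch_size = max(1, len(dataset) // 3)
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
```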

It is well known that most spectra follow the Voigt profile [54], whose shape is a combination of Lorentzian and Gaussian distribution components. The spectra therefore contain a Gaussian component, and BN makes the latent vectors follow a normal distribution as in (3), so we can expect that the latent vectors also have Gaussian properties and that GMM fits the latent vectors best.
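For reference, the Voigt profile (the convolution of Gaussian and Lorentzian components) is available directly in SciPy; a brief sketch with arbitrary width parameters:

```python
import numpy as np
from scipy.special import voigt_profile

x = np.linspace(-5.0, 5.0, 1001)

# Voigt profile: convolution of a Gaussian (std sigma) with a Lorentzian
# (half-width gamma); the widths here are arbitrary illustrative values.
sigma, gamma = 0.8, 0.3
v = voigt_profile(x, sigma, gamma)

# Limiting cases: gamma = 0 recovers a pure Gaussian, sigma = 0 a pure Lorentzian.
pure_gaussian = voigt_profile(x, sigma, 0.0)
pure_lorentzian = voigt_profile(x, 0.0, gamma)
```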

The three steps described above regularize the low SNR data into ordered sets at different levels. First, normalizing each spectrum works at the level of individual data points. Subsequently, multiplying by the mean integration values regularizes between data labels. From the perspective of the autoencoder, batch normalization layers are added at the batch level while training the autoencoder. Through this process, we change 11.4% of the NMR experts' decisions on the impurity state made from low SNR NMR spectra. Finally, all spectra are well organized through the three steps, and GMM, which implements the Expectation Maximization (EM) algorithm [27], is well suited to our purpose of categorizing low SNR spectra into two groups: ND and Positive.


Figure 13: Latent space mapping by UMAP when the network is trained using only the Positive spectra without multiplying by the integration value, with the addition of batch normalization layers, at different latent dimensions: (a) Latent dimension 3, (b) Latent dimension 4, (c) Latent dimension 5
