
Effects of the Multiplied Integration Value

Figure 13: Latent space mapping by UMAP when the network is trained using only the Positive spectra, without multiplying the integration value and with batch normalization layers added, at different latent dimensions: (a) latent dimension 3, (b) latent dimension 4, (c) latent dimension 5

Figure 14: Clustering accuracy for different ratios of multiplied Positive spectra at different latent dimensions

p(y_i \mid \mu_i, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{(y_i - \mu_i)^T \Sigma^{-1} (y_i - \mu_i)}{2} \right), \quad (19)

\log p(y_i \mid \mu_i, \Sigma) = \log \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} - \frac{(y_i - \mu_i)^T \Sigma^{-1} (y_i - \mu_i)}{2}. \quad (20)

Finally, the negative log-likelihood reduces to the form below,

-\log p(y_i \mid \mu_i, \Sigma) \propto \| y_i - \mu_i \|_2^2 = \| \mu_i - f_\theta(x_i) \|_2^2. \quad (21)

It can therefore be concluded that, when a neural network is trained with the MSE loss, it is trained toward the mean value of its dataset.
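This conclusion can be checked numerically: the MSE-optimal constant predictor is exactly the sample mean. A minimal sketch with hypothetical NumPy data, using gradient descent on a single constant parameter (the data and learning rate are illustrative assumptions, not values from this work):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.5, scale=0.7, size=1000)  # hypothetical targets

# A constant predictor c trained with MSE loss: L(c) = mean((y - c)^2).
c = 0.0
lr = 0.1
for _ in range(500):
    grad = -2.0 * np.mean(y - c)  # dL/dc
    c -= lr * grad

# The minimizer of the MSE is the sample mean of the targets.
print(c, y.mean())
```

The same argument carries over to a network f_theta(x) with enough capacity: minimizing the MSE drives its output toward the conditional mean of the targets.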

In Figure 14, the accuracy is compared for different ratios of multiplied Positive spectra in the training dataset. In the NMR spectra dataset, there are three different mean values, \mu_{ND}, \mu_{mul-Pos}, and \mu_{Pos}. Since \mu_{mul-Pos} is formed by multiplying \mu_{Pos} by 5.28, \mu_{mul-Pos} = 5.28 \times \mu_{Pos}. If the ratio of the multiplied Positive spectra is 0.5, the autoencoder is trained on

0.5 \times (\mu_{Pos} + \mu_{mul-Pos}) = 0.5 \times (\mu_{Pos} + 5.28 \times \mu_{Pos}) = 3.14 \times \mu_{Pos}. \quad (22)

Then, the multiplied Positive spectra are reduced in dimension between 5.28 \times \mu_{Pos} and 3.14 \times \mu_{Pos}, and the Positive spectra are reduced between 3.14 \times \mu_{Pos} and \mu_{Pos}. The GMM fits two multivariate normal distributions, N(\mu_{r-ND}, \Sigma_{r-ND}) and N(\mu_{r-mul-Pos}, \Sigma_{r-mul-Pos}). In the test dataset, there are two multivariate normal distributions, N(\mu_{ND}, \Sigma_{ND}) and N(\mu_{Pos}, \Sigma_{Pos}). If the Positive spectra are multiplied by a value smaller than 5.28, the two normal distributions from the GMM cannot distinguish the ND and Positive spectra. Conversely, as shown in Figure 15, if a value bigger than 5.28 is used, the distance \mu_{Pos} - \mu_{mul-Pos} becomes about 25 times larger than \mu_{ND} - \mu_{Pos}, so the autoencoder treats the ND and Positive spectra as one distribution, which defeats the purpose of differentiating them.
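The mean the autoencoder is trained on as a function of the mixing ratio in Eq. (22) can be sketched numerically. Here \mu_{Pos} is treated as a unit scalar purely for illustration (the actual means are spectra vectors):

```python
# Mean seen by the autoencoder when a fraction r of the Positive spectra
# are multiplied by 5.28 (mu_pos = 1.0 is a unit placeholder for mu_Pos).
def mixed_mean(r, mu_pos=1.0, factor=5.28):
    # (1 - r) of the spectra keep mu_pos; r of them carry factor * mu_pos.
    return (1 - r) * mu_pos + r * factor * mu_pos

for r in (0.0, 0.5, 1.0):
    print(r, mixed_mean(r))
# r = 0.5 reproduces Eq. (22): 0.5 * (1 + 5.28) = 3.14 * mu_Pos
```

At r = 0.5 the training mean sits exactly midway between \mu_{Pos} and 5.28 \times \mu_{Pos}, which is the geometry that lets the two reduced clusters stay separable.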

In Figures 16, 17, and 18, each figure shows the averaged latent plot for a different ratio of multiplied Positive spectra in the training dataset. As shown in Figure 14, if the ratio is not 0.5, the ND, Positive, and multiplied Positive spectra are not aligned sequentially, and the accuracy is not as high as with a ratio of 0.5.


Figure 15: Distances between the mean vectors \mu_{mul-Pos}, \mu_{Pos}, and \mu_{ND} when multiplying different values, at different latent dimensions: (a) latent dimension 2, (b) latent dimension 3, (c) latent dimension 4, (d) latent dimension 5


Figure 16: The averaged latent space mapping when the ratio of multiplied spectra is zero, at different latent dimensions: (a) latent dimension 3, (b) latent dimension 4, (c) latent dimension 5

Figure 17: The averaged latent space mapping when the ratio of multiplied spectra is 0.5, at different latent dimensions: (a) latent dimension 3, (b) latent dimension 4, (c) latent dimension 5

Figure 18: The averaged latent space mapping when the ratio of multiplied spectra is one, at different latent dimensions: (a) latent dimension 3, (b) latent dimension 4, (c) latent dimension 5


VI Conclusion

In this study, we devise L-SNRNet to separate low-SNR NMR spectra into impurity-included signals and pure-noise signals. We supposed that even determinations made by NMR experts may be inaccurate, so an autoencoder was used for dimensionality reduction and clustering methods were chosen for semi-supervised learning. We developed L-SNRNet using three regularization steps, normalizing the low-SNR spectra in multiple aspects. Since each spectrum has a different intensity, every spectrum was normalized by its own maximum value. Then, BN layers were added to increase the representational capacity of the latent space. Finally, to enhance the expressivity of each spectrum, the mean integration values were multiplied. The effect of the multiplied integration value is that it moves the ND and Positive spectra linearly farther apart, so the GMM can form the two normal distributions more distinctly. L-SNRNet distinguishes low-SNR NMR spectra well into the two categories, and it can be applied to analyze weak, low-SNR spectra from any measurement.



Acknowledgements

I would like to begin by expressing my genuine appreciation to my advisor, Prof. Hongsik Jeong, who gave me the precious opportunity to complete this master's course at UNIST. I would also like to show my deepest gratitude to Dr. Dong-hyeok Lim. His essential insight in many fields and his thoughtful advice have been indispensable in helping me finish my master's study.

I am sincerely thankful to my colleagues. With their assistance, I have learned countless things, and they have always encouraged me to keep up my research pace. I hope they achieve all of their goals in the future.

