Multiomics Analyses for Phenotype Prediction

The advent of omics studies has allowed researchers to identify biomarkers as measurable indicators of disease risk. This study presents the principles and methods of multiomics analyses that aim to overcome the current limitations of disease risk prediction based on single omics markers, by compiling the results of two separate studies: stressomics and cardiomics. In the stressomics study, combined methylome and transcriptome data were used to develop machine learning models to predict the risk of depression and suicide.

The second study, cardiomics, used genomic data to predict the risk of acute myocardial infarction among young people. A genome-wide association study found no variant that reached genome-wide significance. However, the polygenic risk score, determined from the cumulative effect of genome-wide variants, could distinguish patients with early-onset acute myocardial infarction and predict their subsequent cardiovascular events.

Finally, this study highlights the principles, methods, applications, and guidelines for multiomics analyses to help accelerate the use of omics data in future studies.

Abbreviations: DSM-IV, Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition; GWAS, genome-wide association study.

Introduction

Several previous omics studies have identified markers for psychiatric disorders and ischemic heart disease.15-20 Although omics data generation and biomarker discovery are expanding,20-22 omics data alone do not guarantee the discovery of highly accurate markers, and the biomarkers discovered do not necessarily show sufficient reproducibility and sensitivity.23-27 Furthermore, marker-based risk prediction cannot explain the pathogenesis in patients who do not carry the markers associated with the disease. These limitations can be addressed by combining multiple markers, either by applying machine learning or by deriving a polygenic risk score (PRS) from whole-genome sequencing data. Machine learning is the study of computer algorithms that improve automatically through experience.28 Applying machine learning systematically to multiomics data improves prediction accuracy.29-31 Machine learning can also aid biomarker discovery: depending on the algorithm used, a model can identify the significant patterns or markers in a data set.

The PRS is a quantitative score determined from the cumulative effect of genome-wide variants.26, 32 Although monogenic variants can predict disease with high accuracy, such variants are rare, and the vast majority of common, complex diseases are not explained by monogenic variants.32 The PRS was therefore developed to incorporate polygenic variants into disease prediction.33-35 Here, I report on two studies that used multiple markers to predict the risk of psychiatric and physical illness. In the stressomics study, machine learning-based models combining methylome and transcriptome markers were constructed to predict depression and suicide risk.
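As a sketch, assuming the common additive model (the source does not state the exact scoring formula), the PRS of individual j sums effect-allele dosages weighted by per-allele effect sizes over the M scored variants:

```latex
\mathrm{PRS}_j = \sum_{i=1}^{M} \beta_i \, x_{ij}, \qquad x_{ij} \in \{0, 1, 2\}
```

Here β_i is the effect size of variant i (for example, a log odds ratio from a GWAS) and x_ij is the number of effect alleles that individual j carries. Because no single term has to reach genome-wide significance, many small effects can add up to a discriminative score.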

In the cardiomics study, whole-genome-based PRSs were derived to predict cardiovascular disease risk. Based on these two studies, I present guidelines on the methodological considerations for combining multiple markers and on the practical use of these combined markers.

Chapter 1. Stressomics

Introduction

Method

Sample preparation
Methylome and transcriptome data generation
Methylation and transcriptome marker discovery
Feature selection and model construction

The quality of these cDNA libraries was assessed with the Agilent 2100 Bioanalyzer, and the samples were then quantified with the KAPA Library Quantification Kit. For methylation marker detection, Methyl-seq and RNA-seq sequencing reads were … To identify transcriptome markers, filtered RNA-seq reads were mapped to the human reference genome hg19.

Three binary classification models were constructed: SA versus major depressive disorder (MDD), MDD versus control, and SA versus control.
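The three comparisons can be set up as separate binary tasks on a combined feature matrix. The sketch below is a minimal, hypothetical illustration: it assumes each data set is a dict from sample ID to a list of feature values, concatenates methylome and transcriptome features for samples present in both, and builds one labeled data set per comparison. The function names and data layout are illustrative, not the study's pipeline.

```python
# The three comparisons described in the text: (positive group, negative group).
PAIRS = [("SA", "MDD"), ("MDD", "control"), ("SA", "control")]


def combine_omics(methyl, expr):
    """Concatenate per-sample methylome and transcriptome feature vectors.

    methyl, expr: dict mapping sample ID -> list of floats (hypothetical layout).
    Only samples present in both data sets are kept, in sorted ID order.
    """
    shared = sorted(set(methyl) & set(expr))
    return {s: methyl[s] + expr[s] for s in shared}


def pairwise_tasks(features, labels):
    """Build one binary data set per pair of groups.

    Returns {(pos, neg): (X, y)} where y is 1 for `pos` samples and 0 for `neg`;
    samples belonging to neither group are left out of that task.
    """
    tasks = {}
    for pos, neg in PAIRS:
        X, y = [], []
        for sample, vec in features.items():
            group = labels.get(sample)
            if group in (pos, neg):
                X.append(vec)
                y.append(1 if group == pos else 0)
        tasks[(pos, neg)] = (X, y)
    return tasks
```

Each (X, y) pair can then be passed to any binary classifier; keeping the comparisons separate lets each model use its own selected features.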

Results and Discussion

The baseline sample characteristics
Marker discovery
Classifier and regression model construction using machine-learning

To build classification models of diagnostic labels and regression models of the psychiatric scales, methylation markers (methylation β-value difference > 1% and Benjamini–Hochberg adjusted P < 0.05) and transcriptome markers … were used. After feature selection, 810 and 467 methylation markers (and 48 and 51 transcriptomic markers) remained for the HAM17 and SSI regression models, respectively. ROC curves for the classification of MDD and controls using measured and estimated HAM17 (a) and …
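The methylation marker filter can be sketched in plain Python. This assumes β values on a 0 to 1 scale (so a 1% difference is 0.01) and applies the standard Benjamini–Hochberg step-up adjustment; the function names and inputs are hypothetical, and the transcriptome thresholds, which are not stated here, are not modeled.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_methylation_markers(sites, delta_beta, pvalues, min_delta=0.01, alpha=0.05):
    """Keep sites with |delta beta| > min_delta and BH-adjusted P < alpha.

    sites: site IDs; delta_beta: group differences in mean beta (0-1 scale, assumed);
    pvalues: raw per-site P values from whatever test was used upstream.
    """
    adjusted = benjamini_hochberg(pvalues)
    return [
        site
        for site, d, q in zip(sites, delta_beta, adjusted)
        if abs(d) > min_delta and q < alpha
    ]
```

Both criteria must hold, so a site with a tiny but highly significant difference is dropped, as is a large difference that does not survive multiple-testing correction.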

The protocadherin (PCDH) gene family was among the enriched biological terms in the feature sets. The psychiatric scales, such as HAM17 and SSI, were also successfully predicted by our regression models.

Figure 1. Performance of the stressomics models

Chapter 2. Cardiomics

Introduction

Method

Sample preparation
Genome data generation
Genomic marker discovery
PRS ascertainment and application

… 3.7).77 Common variants were then genotyped using HaplotypeCaller in GATK with the “-stand_call_conf 30” option. Finally, variants in callable genomic regions were filtered and then strictly extracted based on the "accessible" regions defined by the 1000 Genomes Project. To identify genomic markers, all pairs of individuals with an identity-by-descent (IBD) value > 0.125 (corresponding to third-degree relatives) were extracted and grouped into family groups, and each family group was reduced as follows until no related pairs remained:
i) The sample with the largest number of related pairs in the family group was eliminated.
ii) The sample with the highest number of missing calls among the linkage-disequilibrium-pruned SNVs was eliminated.
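The relatedness filter can be sketched as a greedy removal loop. This is a minimal sketch under assumptions: pairwise IBD estimates are supplied as a dict keyed by sample pairs, criterion ii) is treated as a tie-breaker for criterion i), and sample IDs break any remaining ties for determinism. Family groups are handled implicitly: removing a sample only changes pair counts within its own group, so one loop over all pairs gives the same result as reducing each group separately.

```python
def prune_related(pairs, missing, threshold=0.125):
    """Greedily remove samples until no pair exceeds the IBD threshold.

    pairs: dict mapping (sample_a, sample_b) -> IBD estimate (hypothetical format).
    missing: dict mapping sample -> number of missing calls among LD-pruned SNVs.
    Returns the set of removed samples.
    """
    related = {frozenset(p) for p, ibd in pairs.items() if ibd > threshold}
    removed = set()
    while related:
        # Count how many related pairs each remaining sample belongs to.
        degree = {}
        for pair in related:
            for s in pair:
                degree[s] = degree.get(s, 0) + 1
        # i) most related pairs first; ii) then most missing calls; then sample ID.
        drop = max(degree, key=lambda s: (degree[s], missing.get(s, 0), s))
        removed.add(drop)
        related = {pair for pair in related if drop not in pair}
    return removed
```

Removing the most-connected sample first tends to keep more individuals than removing samples at random, since one removal can resolve several relationships at once.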

The PRS was calculated for patients and controls based on the reported list of allele variants and …
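A minimal sketch of scoring individuals from a reported variant list, assuming the additive model in which each variant contributes its effect-allele dosage times a reported per-allele weight. The variant IDs, the skip-if-missing rule, and the standardization against the control distribution are illustrative assumptions, not details given in the source.

```python
import statistics


def polygenic_risk_score(genotypes, weights):
    """Additive PRS for one individual (assumed model).

    genotypes: dict variant ID -> effect-allele dosage (0, 1 or 2).
    weights: dict variant ID -> reported per-allele effect size.
    Variants not genotyped in this individual are skipped (illustrative choice).
    """
    return sum(w * genotypes[v] for v, w in weights.items() if v in genotypes)


def standardize(scores, reference):
    """Express scores as z-scores relative to a reference group, e.g. controls."""
    mu = statistics.mean(reference)
    sd = statistics.stdev(reference)
    return [(s - mu) / sd for s in scores]
```

Standardizing against controls puts patient scores on a common scale, so a score can be read as standard deviations above or below the control mean.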

Results and Discussion

The baseline sample characteristics
Marker discovery
PRS ascertainment to distinguish patients and predict subsequent cardiovascular events

Furthermore, the additional contribution of the PRS beyond the six conventional risk factors was significant (P …). The AUC of current smoking status was higher than that of the other predictors. (PRS indicates polygenic risk score; AUC, area under the curve; conventional risk factors, combined …)
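For reference, the AUC of a single predictor equals the probability that a randomly chosen patient scores higher than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition (quadratic in sample size, so suited only to small examples). It does not implement the test of the PRS's added contribution, which is not specified here. A binary predictor such as current smoking status produces many ties, which this definition handles explicitly.

```python
def roc_auc(scores, labels):
    """AUC from its probabilistic definition; labels are 1 (patient) or 0 (control)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one patient and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0  # patient ranked above control
            elif c == k:
                wins += 0.5  # tie counts as half
    return wins / (len(cases) * len(controls))
```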

As the number of cardiovascular risk factors increases, the severity of asymptomatic coronary … also increases.

Figure 4. Results from the cardiomics genome-wide association study

Conclusions

Additionally, an early genomic risk assessment becomes more useful because it can provide a …