Searching bioinformatics databases
The spectrum and hierarchy of health informatics
• Bioinformatics: molecular and cellular level
• Imaging informatics: tissues and organs
• Clinical informatics: individual patients
• Public health informatics: populations
Adapted from Shortliffe
What is bioinformatics?
• A discipline concerned with acquiring, processing, storing, distributing, and analyzing biological and medical information
• In the broad sense:
– Any research on biological processes that uses computers
• In the narrow sense:
– Computer-based analysis of the sequence data of macromolecular structures
[Figure labels: statistics, genomics, transcriptomics, metabolomics, mathematics, molecular modeling, proteomics, ontology, infotech, evolution, biotech, cellomics, physiomics, biology]
Bioinformatics combines biology, medicine, chemistry, mathematics, statistics, and computer science to understand the biological processes of life
Bioinformatics bridges many disciplines
From sequence to physiology (and beyond):
• DNA sequence
• Gene & genome
• Molecular evolution
• Protein structure, folding, function & interaction
• Metabolic pathways, regulation, signaling, networks
• Physiology & cell biology
• Interspecies interaction
• Ecology & environment
Methodology & expertise:
• Genome sequencing
• Genomic data analysis
• Statistical genetics
• Proteomics
• Protein structure prediction, dynamics, folding & design
• Functional genomics (microarrays, 2D-PAGE, etc.)
• Data standards, representations & analytical tools for complex biological data
• Dynamical systems modeling
• High-tech field ecology
• Computational ecology
• Experimental
• Hardware & instrumentation
• Information technology
• Computation
• Mathematical & physical models
Biological research in the 21st century
“The new paradigm, now emerging, is that all the 'genes' will be known (in the sense of being resident in databases available electronically), and that the starting point of a biological investigation will be theoretical.”
The Pyramid of Life
• Genomics: 30,000 genes
• Proteomics: 10,000 proteins
• Metabolomics: 1,400 chemicals
• Bioinformatics (label running across all levels)
Wishart (2004)
Entrez: Neighbors and Hard Links
• Linked databases: genomes, taxonomy, PubMed abstracts, nucleotide sequences, protein sequences, 3-D structures
• Neighbor relationships shown in the figure: word weight, VAST, BLAST, phylogeny
Source: NCBI
Sequence data
• Drug discovery
• Diagnosis
• Vaccine development
Hypothesis-driven research (traditional research)
• One gene per experiment
• Time-consuming
• Laborious
• Limited results
• In vitro, in vivo, ex vivo
NGS, microarray
• Fast tracking
• Thousands of genes per experiment
• Gene function, both alone and in interaction with other genes
Example databases
• Nucleotide database (GenBank)
– BLAST (Basic Local Alignment Search Tool)
• Protein sequence database
• Protein structure database (PDB)
• Genome database
• Microarray database
• Metabolic pathway and protein function database
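To make the "local alignment" in BLAST's name concrete, here is a minimal sketch of exact local alignment by dynamic programming (the Smith-Waterman method). It is not BLAST itself, which relies on faster heuristics; the function name and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen only for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment.

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Hypothetical query and database sequences sharing the region ACGTACG
score, top, bottom = smith_waterman("TTTACGTACGTTT", "GGGACGTACGGG")
print(score)
print(top)
print(bottom)
```

The table fill costs time proportional to the product of the two sequence lengths, which is why database-scale searches use heuristic tools instead.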
Example data types
Nucleotide/protein sequence
Gene expression level
GenBank
• A sequence database
• An annotated collection of DNA sequences
• 171,744,486 sequences (April 2014)
• Sequence data come from direct submissions by scientists/authors
• The GenBank database is designed to provide sequence information that ...
GenBank data sources
• Direct submissions from individual researchers through forms (BankIt, Sequin)
• Batch submissions by email (EST, GSS, STS)
• Submissions through FTP (File Transfer Protocol) accounts
• Data from three collaborating databases:
– GenBank
– DNA Database of Japan (DDBJ)
– European Molecular Biology Laboratory Database (EMBL)
Primary vs. secondary databases
• Primary Databases
– Original submissions by experimentalists
– Database staff organize but don’t add additional
information
• Example: GenBank
• Derivative Databases (Secondary)
– Human curated
• compilation and correction of data
• Example: SWISS-PROT, NCBI RefSeq mRNA
– Computationally Derived
• Example: UniGene
File formats
• GenBank flat file (GBFF)
– Header
– Features
– Sequence
• FASTA format
– The description line begins with the ">" character
– It is followed by the sequence data
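The FASTA layout described above (a description line starting with ">", followed by sequence lines) can be read with a few lines of Python. This is a minimal sketch: the function name `parse_fasta` and the two example records are hypothetical, and real files may need extra handling such as comment lines or very large inputs.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (description, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence data may be wrapped over several lines
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical records, invented for illustration only
example = """>seq1 hypothetical example record
ACGTACGT
ACGT
>seq2 another hypothetical record
TTGACA
"""

records = parse_fasta(example)
for description, sequence in records:
    print(description, len(sequence))
```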
Example analysis
Kaohsiung J Med Sci. 2008 Feb;24(2):55-62. doi: 10.1016/S1607-551X(08)70098-6.
Phylogenetic study of dengue-3 virus in Taiwan with sequence analysis of the core gene.
Tung YC, Lin KH, Chang K, Ke LY, Ke GM, Lu PL, Lin CY, Chen YH, Chiang HC.
URL: http://www.sciencedirect.com/science/article/pii/S1607551X08700986
• Similarity analysis (BLAST)
• Primer design
• Sequence comparison
• Multiple alignment
• Phylogenetic analysis
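Once sequences have been multiply aligned, a common first step toward a phylogenetic tree is a pairwise distance matrix. The sketch below computes the simple p-distance (the fraction of differing sites, skipping gap positions). The strain names and sequences are hypothetical and are not taken from the dengue-3 study; the function names are also assumptions.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of differing sites between two aligned sequences (gaps skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue  # ignore positions with a gap in either sequence
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


def distance_matrix(alignment):
    """Return a nested dict: matrix[name_a][name_b] = p-distance."""
    names = list(alignment)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        matrix[a][b] = matrix[b][a] = d
    return matrix


# Hypothetical aligned sequences, invented for illustration only
alignment = {
    "strain_A": "ATGAACAACC",
    "strain_B": "ATGAACAACC",
    "strain_C": "ATGAATAACT",
    "strain_D": "ATG-ATAGCT",
}

matrix = distance_matrix(alignment)
for name, row in matrix.items():
    print(name, {other: round(d, 3) for other, d in row.items()})
```

A tree-building method such as neighbor joining would take a matrix like this as input. The p-distance ignores multiple substitutions at the same site, so published analyses usually apply a substitution model instead.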
• High-density oligonucleotide human genome array: GeneChip U133 Plus 2.0 (Affymetrix)
• This chip comprises more than 54,000 probe sets and analyzes the expression level of over 47,000 transcripts and variants, including 38,500 well-characterized human genes
Microarray assay life cycle
Biological question → Sample preparation → Microarray hybridization → Microarray detection → Data analysis
Microarray data processing
Microarray chips → Images scanned by laser → Datasets → Data mining and analysis → Prediction for a new sample
Datasets:
Class  Sno  D26528  D63874  D63880  …
ALL    2    193     4157    556
ALL    3    129     11557   476
ALL    4    44      12125   498
ALL    5    218     8484    1211
AML    51   109     3537    131
AML    52   106     4578    94
AML    53   211     2431    209
…
New sample (prediction):
Gene            Value
D26528_at       193
D26561_cds1_at  -70
D26561_cds2_at  144
D26561_cds3_at  33
D26579_at       318
D26598_at       1764
D26599_at       1537
D26600_at       1204
D28114_at       707
Example microarray data analysis workflow
Stages: raw data → preprocessing → high-level analysis
• Microarray
• Normalization
• Statistical test: t-test/ANOVA (analysis of variance); filtering steps
• PTM (Pavlidis template matching)
• Cluster analysis
• Gene expression profiles
• Genes of interest; validation
• Biological process/function/pathway
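The statistical-test step can be illustrated with a per-gene t-test between two classes. The sketch below uses the seven ALL/AML samples listed in the microarray data-processing example earlier in this section, applies a log2 transform as a simple stand-in for preprocessing (an assumption, not the pipeline used in the slides), and ranks genes by the absolute Welch t statistic. With so few samples and no multiple-testing correction, the ranking is only illustrative.

```python
import math
from statistics import mean, variance

# Expression values copied from the ALL/AML example table in this section
expression = {
    "D26528": {"ALL": [193, 129, 44, 218], "AML": [109, 106, 211]},
    "D63874": {"ALL": [4157, 11557, 12125, 8484], "AML": [3537, 4578, 2431]},
    "D63880": {"ALL": [556, 476, 498, 1211], "AML": [131, 94, 209]},
}


def welch_t(x, y):
    """Welch's t statistic for two samples with possibly unequal variances."""
    standard_error = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / standard_error


def rank_genes(data):
    """Return (gene, t) pairs sorted by |t|, largest first."""
    scores = {}
    for gene, groups in data.items():
        # log2 transform; assumes all values are positive
        all_log = [math.log2(v) for v in groups["ALL"]]
        aml_log = [math.log2(v) for v in groups["AML"]]
        scores[gene] = welch_t(all_log, aml_log)
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


ranking = rank_genes(expression)
for gene, t in ranking:
    print(f"{gene}: t = {t:.2f}")
```

A positive t means higher log expression in ALL than in AML. Genes near the top of such a ranking would be the candidates passed on to filtering, clustering, and validation.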
Example analysis
Biogerontology. 2009 Apr;10(2):191-202. doi: 10.1007/s10522-008-9167-1. Epub 2008 Aug 27.
Microarray analysis reveals similarity between CD8+CD28- T cells from young and elderly persons, but not of CD8+CD28+ T cells.
Lazuardi L, Herndler-Brandstetter D, Brunner S, Laschober GT, Lepperdinger G, Grubeck-Loebenstein B.
URL: http://link.springer.com/article/10.1007%2Fs10522-008-9167-1
Example hierarchical cluster analysis
[Figure: panel A shows gene clusters 1-21 across the samples Y1_28P, Y2_28P, O1_28P, O2_28P, Y1_28N, Y2_28N, O1_28N, O2_28N; panel B shows the same samples with a linkage distance axis]
Y1_28P & Y2_28P : CD8+CD28+ T cells from young persons
O1_28P & O2_28P : CD8+CD28+ T cells from elderly persons
Y1_28N & Y2_28N : CD8+CD28– T cells from young persons
O1_28N & O2_28N : CD8+CD28– T cells from elderly persons
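How a dendrogram with a linkage-distance axis is built can be sketched with a small agglomerative clustering routine. The example below uses average linkage with Euclidean distance, which is an assumption (the linkage and distance actually used for the figure are not stated here). The expression profiles are made-up numbers attached to the sample labels from the legend, not data from the study.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [(name,) for name in profiles]
    merges = []

    def linkage(c1, c2):
        # Average of all pairwise distances between members of the two clusters
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(euclidean(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters and record the linkage distance
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical expression profiles (three genes per sample), for illustration only
profiles = {
    "Y1_28P": [1.0, 5.0, 2.0],
    "Y2_28P": [1.2, 5.1, 2.1],
    "O1_28P": [3.0, 3.0, 4.0],
    "O2_28P": [3.1, 2.8, 4.2],
    "Y1_28N": [8.0, 1.0, 7.0],
    "Y2_28N": [8.1, 1.1, 7.0],
    "O1_28N": [8.0, 1.2, 6.9],
    "O2_28N": [7.9, 0.9, 7.1],
}

merges = average_linkage(profiles)
for left, right, distance in merges:
    print(f"{'+'.join(left)} | {'+'.join(right)} at distance {distance:.3f}")
```

Each recorded merge corresponds to one junction in the dendrogram, drawn at the height given by its linkage distance.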
Gene cluster
[Figure: cluster 13; axis: expression level]