METAGENOMICS: AN EFFECTIVE TOOL FOR EXPLOITING GENETIC RESOURCES
Trương Nam Hải*, Đỗ Thị Huyền
Institute of Biotechnology, Vietnam Academy of Science and Technology
* Author for correspondence: Tel: +84-4-37562790; Email: [email protected]
Introduction
In biology, and in biotechnology in particular, the past ten or more years have seen remarkable leaps in research techniques, following the publication of the "draft" of the human genome in 2000.
A new era has emerged in biological research, the post-genomic era, in which the biological processes taking place in cells, organisms and microbial communities can be studied as a whole. Where previously genes could only be studied one at a time, it is now possible to study an entire genome or metagenome, a proteome, or a set of secondary metabolites from any biological subject or system at once, and in far less time than before. A series of new tools and research fields (the -omics) has arisen in biology, such as genomics, metagenomics, transcriptomics, proteomics and metabolomics. With these tools we can study any genome or biological process in the cell and the organism in order to clarify its mechanism and properties at the molecular level. Building on a thorough understanding of the biological processes and genomes of different organisms, we can recreate those processes in the laboratory deliberately, accurately and as efficiently as nature does.
Metagenomics is a research approach that allows microorganisms to be identified and genes to be isolated directly from the total DNA of a microbial community, without culturing to isolate each microorganism in it. The approach has been used widely and successfully around the world for more than 20 years, and many new genes and new microbial species have been discovered with it. For example, in 2004 Venter and colleagues (Venter et al., 2004) identified 148 new bacterial species among 2,000 microbial species, and 69,901 new genes, from the microbial community of Sargasso Sea water. Today, thanks to the development of the most modern high-throughput sequencing systems, metagenomics is being applied ever more widely. Such systems can sequence the entire DNA of the metagenome of the microbial community living in any environmental sample in a short time, which facilitates the search for new genes and the study of microbial diversity (UEB, 2011). With next-generation sequencers able to decode a genome in an extremely short time, combined with bioinformatics tools, metagenomics has become a very effective method for identifying microorganisms and isolating their genes directly without culturing. As a result, the number of studies that apply metagenomics to microbial community metagenomes, based on whole-metagenome sequencing with next-generation sequencers, has grown steadily. According to ProQuest Central statistics, the number of published scientific works related to metagenomics has risen sharply and continuously, especially over the last 5 years or so, since next-generation sequencing systems appeared (Figure 1).
In this report we give an overview of our initial results in applying metagenomics to the study and mining of genes encoding protease, amylase and cellulase (lignocellulase and hemicellulase) from the gut microbiota of termites. Termites convert lignocellulose very efficiently in nature thanks to the diverse microbiota in their digestive tract.

NATIONAL BIOTECHNOLOGY CONFERENCE 2013

In lower termites, lignocellulose is hydrolysed by enzymes secreted by the termite gut itself and by enzymes produced by the bacteria, archaea and protozoa present in the gut (Tokuda and Watanabe, 2007). Vietnam has more than 120 termite species, of which Coptotermes termites are the most common and the easiest to sample.
Figure 1. Number of published papers on applications of metagenomics from 2000 to the end of July 2013 (according to ProQuest Central).

Figure 2. Workflow for analysing metagenomic DNA data with bioinformatics tools.
INVITED PAPER - BIOTECHNOLOGY CONFERENCE

To mine genes encoding lignocellulose-hydrolysing enzymes from bacteria in the gut of Coptotermes termites, metagenomic DNA of the gut bacteria was extracted using a special method designed to minimise contamination by DNA from the termite gut itself and from eukaryotic microorganisms such as fungi. The metagenomic DNA was then sequenced on an Illumina HiSeq system. The raw data from the sequencer were processed with software that removes sequences homologous to termite DNA and sequences containing "N", which are low-quality sequences, to give clean data. Next, using SOAPdenovo, the clean sequences were aligned on the basis of their similarity and joined into contiguous sequences (contigs). The contigs were then used as templates against which the clean data were re-aligned, so that each contig could be corrected on a statistical basis and the contigs used for gene analysis would be reliable.
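The read-cleaning step described above can be illustrated with a minimal Python sketch. It is not the software used in this work, which the text does not name for this step. It assumes that reads arrive as (identifier, sequence) pairs and that host-derived reads have already been flagged by an upstream alignment, represented here by a hypothetical set `host_read_ids`.

```python
from typing import Iterable, Iterator, Set, Tuple

Read = Tuple[str, str]  # (read identifier, nucleotide sequence)


def clean_reads(reads: Iterable[Read], host_read_ids: Set[str]) -> Iterator[Read]:
    """Yield reads that contain no ambiguous base and are not host-derived.

    host_read_ids is a hypothetical input: identifiers of reads that an
    upstream alignment step matched to termite DNA.
    """
    for read_id, seq in reads:
        if "N" in seq.upper():
            continue  # ambiguous base call: treated as a low-quality read
        if read_id in host_read_ids:
            continue  # homologous to termite DNA: treated as host contamination
        yield read_id, seq


# Invented example reads, for illustration only.
raw = [
    ("r1", "ACGTACGT"),
    ("r2", "ACGTNCGT"),  # contains N
    ("r3", "TTGACCAA"),  # flagged as host-derived
    ("r4", "GGCCTTAA"),
]
kept = list(clean_reads(raw, host_read_ids={"r3"}))
print(kept)
```

A generator is used so that a large read file could be streamed rather than held in memory; a real pipeline would also apply base-quality thresholds, which the text does not specify.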
Next, using MetaGeneAnnotator, functional genes were screened to build a gene dataset. These genes were compared with the international gene bank using Blastall against the NR database to reveal the microbial diversity in the sequenced sample; in parallel, using Blastall against the KEGG and eggNOG databases, genes encoding functional proteins were identified for gene mining. The steps of the metagenomic DNA data analysis are shown in Figure 2.
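As an illustration of how similarity-search results of this kind are commonly reduced to one assignment per gene, the sketch below keeps the highest-scoring hit for each query. It assumes the 12-column tab-separated BLAST tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score). Whether this output format and this selection rule were used in the present work is not stated in the text, and the example lines are invented.

```python
from typing import Dict, Iterable, Tuple


def best_hits(lines: Iterable[str]) -> Dict[str, Tuple[str, float, float]]:
    """Map each query to (subject, percent identity, bit score) of its top hit."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])   # column 3: percent identity
        bitscore = float(fields[11])  # column 12: bit score
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, identity, bitscore)
    return best


# Invented hits, for illustration only.
example = [
    "\t".join(["orf1", "subjA", "92.5", "200", "15", "0", "1", "200", "5", "204", "1e-80", "300"]),
    "\t".join(["orf1", "subjB", "88.0", "200", "24", "0", "1", "200", "5", "204", "1e-70", "260"]),
    "\t".join(["orf2", "subjC", "61.3", "150", "58", "0", "1", "150", "10", "159", "1e-20", "95.1"]),
]
hits = best_hits(example)
for query, (subject, identity, bitscore) in hits.items():
    print(query, subject, identity, bitscore)
```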
Assessment of microbial diversity in the termite gut
The raw data obtained from sequencing totalled 5,618.21 Mbp. Low-quality reads were then removed from this raw dataset to give a clean, higher-quality dataset of 5,431.60 Mbp, covering 96.68% of the raw data. By alignment, the sequences were assembled into 79,262 contigs with a total size of 90,150,894 bp. Then, using SOAPaligner, the sequences in the library were compared with their own contigs to find high-quality reads carrying either a whole gene or only part of a functional gene region. This analysis assembled 18,709,714 correct sequences out of the 54,316,028 reads originally obtained.
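The figures reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers given in the text. The mean contig length and the share of reads matched to contigs are derived here for convenience and are not values stated by the authors.

```python
# Values taken from the text.
raw_mbp = 5618.21             # raw data, Mbp
clean_mbp = 5431.60           # clean data, Mbp
n_contigs = 79_262
total_contig_bp = 90_150_894
reads_total = 54_316_028
reads_matched = 18_709_714

# Derived quantities.
retained_pct = 100 * clean_mbp / raw_mbp
mean_contig_bp = total_contig_bp / n_contigs
matched_pct = 100 * reads_matched / reads_total

print(f"clean data retained: {retained_pct:.2f}% of raw")
print(f"mean contig length: {mean_contig_bp:.0f} bp")
print(f"reads matched to contigs: {matched_pct:.2f}%")
```

The first ratio reproduces the 96.68% coverage quoted in the text.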
These sequences were compared with bacterial genome sequences in the NCBI gene bank and RDP, and with other available genome sequences. The results show that bacteria account for 34.07% of the termite gut microbiota, while fungi are present at a low level, 0.032%.
Rank      Archaea  Bacteria  Eukaryota  Viruses  Total
Phylum          4        22         20        -     46
Class          10         ?         14        -     65
Order          15        97         25        1    138
Family         21       217         33       11    282
Genus          47       528          ?       13    731
Species         ?      1368         24        7   1460
(? = value illegible in the source; - = no value given)

Figure 3. Diversity of microorganisms in the gut of Coptotermes termites.
Deeper analysis identified a total of 1,460 microbial species living symbiotically in the termite gut, belonging to 46 phyla. Bacteria dominate (80%) with 1,368 species; the remainder belong to other groups: archaea (0.42%), Eukaryota (0.58%) and viruses (0.2%) (Figure 3).
Prediction of ORFs involved in metabolic pathways
From the contigs obtained, MetaGeneAnnotator found 125,431 ORFs (open reading frames), of which 37,545 are complete ORFs (the average ORF length is 624.02 bp). When the open reading frames were compared with the NR database (the file containing the classification results), 77.22% of ORFs were classified, 4.67% were unclassified and 18.12% could not be determined.
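The distinction between complete and incomplete ORFs can be made concrete with a small sketch. This is not MetaGeneAnnotator's algorithm. It is a simplified check that assumes an ORF is given as a coding-strand nucleotide string, that ATG, GTG and TTG count as start codons, and that TAA, TAG and TGA count as stop codons. It does not verify that the sequence length is a multiple of three.

```python
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_completeness(seq: str) -> str:
    """Classify an ORF by whether its start and stop codons are present."""
    seq = seq.upper()
    has_start = seq[:3] in START_CODONS
    has_stop = len(seq) >= 6 and seq[-3:] in STOP_CODONS
    if has_start and has_stop:
        return "complete"
    if has_start:
        return "lacking 3' end"
    if has_stop:
        return "lacking 5' end"
    return "lacking both ends"


# Invented sequences, for illustration only.
for s in ["ATGAAATAA", "ATGAAACCC", "AAACCCTAA", "AAACCCGGG"]:
    print(s, "->", orf_completeness(s))

# Share of complete ORFs, using the counts given in the text.
complete_pct = 100 * 37_545 / 125_431
print(f"complete ORFs: {complete_pct:.2f}%")
```

ORFs that run off the end of a contig are the usual source of the incomplete categories, which is why short contigs tend to give many partial genes.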
Based on the KEGG (Kyoto Encyclopedia of Genes and Genomes) and eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups) databases, 36,477 of the 125,431 ORFs are predicted to encode enzymes and 65,536 to encode functional proteins. Among these, ORFs encoding proteins involved in membrane transport pathways make up the largest share, followed by proteins and enzymes involved in carbohydrate metabolism, in transcription and repair, and in amino acid metabolism (Figure 4). This shows that the microorganisms in the termite gut contribute very actively to the conversion of biomass.
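A ranking of pathway categories such as the one described above is, in essence, a tally of annotations. The sketch below shows that tally on invented data. The ORF identifiers and their category assignments are hypothetical, and it simplifies by giving each ORF exactly one category, whereas a real KEGG annotation may place one gene in several pathways.

```python
from collections import Counter

# Invented ORF-to-category assignments, for illustration only.
orf_category = {
    "orf_001": "Membrane Transport",
    "orf_002": "Carbohydrate Metabolism",
    "orf_003": "Membrane Transport",
    "orf_004": "Amino Acid Metabolism",
    "orf_005": "Carbohydrate Metabolism",
    "orf_006": "Membrane Transport",
}

counts = Counter(orf_category.values())
ranked = counts.most_common()  # categories sorted by number of matched ORFs

for category, n in ranked:
    print(f"{category}: {n}")
```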
Figure 4. Predicted functions of the polypeptide chains encoded by the ORFs, based on KEGG data (panel title: KEGG pathway classification; axis: number of matched genes).
Mining genes encoding lignocellulose-hydrolysing enzymes
Of the 125,431 ORFs, 5,058 are predicted to encode enzymes involved in carbohydrate metabolism. The origin could be determined for 2,672 of these sequences, of which 2,515 (94%) were identified as bacterial. Among these, bacteria of the phylum Firmicutes are the most numerous (651 ORFs), most of them Lactobacillales (528 ORFs). Deeper analysis of the sequences encoding cellulases and hemicellulases also indicates that the genes for these enzymes are predicted to originate from Bacteroidetes, Paenibacillus, Lactococcus garvieae, Lactococcus lactis, Streptococcus, Enterobacter cloacae and Klebsiella oxytoca.
Among the sequences involved in carbohydrate metabolism, only about 6.88% (575 sequences) encode cellulases or hemicellulases, and 441 ORFs (5.18%) have no clearly determined function. The cellulases encoded by the ORFs comprise 9 types: 6-phospho-beta-glucosidase, beta-glucosidase, cellobiose phosphorylase, cellulose 1,4-beta-cellobiosidase, endoglucanase, glucan 1,3-beta-glucosidase, glucan 1,6-alpha-glucosidase, glucan endo-1,3-beta-D-glucosidase and licheninase. The hemicellulases comprise 7 types: alpha-galactosidase, alpha-glucuronidase, alpha-N-arabinofuranosidase, arabinan endo-1,5-alpha-L-arabinosidase, endo-1,4-beta-xylanase, mannan endo-1,4-beta-mannosidase and xylan 1,4-beta-xylosidase (Figure 5). ORFs encoding beta-glucosidase account for the largest share (36.13%), followed by alpha-galactosidase (13.18%) and alpha-N-arabinofuranosidase (11.82%). Of the 575 sequences encoding cellulases and hemicellulases, only 98 (17%) are complete; the rest lack the 3' end, the 5' end, or both. Among the cellulases, only 4 enzyme groups have complete sequences: beta-glucosidase, glucan 1,3-beta-glucosidase, endoglucanase and 6-phospho-beta-glucosidase. Of these, 13 beta-glucosidase sequences, 2 glucan 1,3-beta-glucosidase sequences, 1 endoglucanase sequence and 5 6-phospho-beta-glucosidase sequences show 86-100% similarity.
Similarly, among the 6 hemicellulase enzyme groups with complete sequences, 4 xylan 1,4-beta-xylosidase sequences, 1 alpha-glucuronidase sequence, 4 alpha-N-arabinofuranosidase sequences and 1 endo-1,4-beta-xylanase sequence are 86-100% similar to the sequences of the corresponding enzymes in the gene bank. Six xylan 1,4-beta-xylosidase sequences, 5 alpha-galactosidase sequences and 6 alpha-N-arabinofuranosidase sequences are 60-85% similar to the corresponding enzyme sequences in the NCBI gene bank.
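Grouping sequences by their similarity to database entries, as in the 86-100% and 60-85% ranges used above, amounts to a simple binning rule. The sketch below is one way to write it. The treatment of values between 85 and 86 (assigned here to the lower bin) and the "below 60%" label are assumptions, since the text does not define them, and the identity values in the example are invented.

```python
from collections import Counter


def identity_bin(percent_identity: float) -> str:
    """Assign a percent identity to one of the similarity ranges."""
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent identity must lie between 0 and 100")
    if percent_identity >= 86:
        return "86-100%"
    if percent_identity >= 60:
        return "60-85%"  # assumption: values in [85, 86) fall in this bin
    return "below 60%"


# Invented identities for a handful of sequences, for illustration only.
identities = [99.1, 86.0, 85.4, 72.3, 60.0, 41.7]
bin_counts = Counter(identity_bin(x) for x in identities)
print(dict(bin_counts))
```

Sequences in the lower range are often the more interesting candidates for novel enzymes, since they differ more from anything already deposited.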
Figure 5. Sequences encoding cellulases and hemicellulases mined from the metagenome data of microorganisms in the termite gut (panel title: Lignocellulolytic enzymes).

Mining genes encoding enzymes of the protease family
From the 125,431 ORFs in the metagenomic DNA data of the symbiotic microorganisms in the termite gut, more than 1,000 sequences encoding enzymes of the protease family were mined. These sequences fall into many enzyme groups, of which about 40 show a high level of similarity, 90% (Figure 6). As with the sequences encoding lignocellulose-hydrolysing enzymes, about 20% are complete sequences and most are incomplete genes. Also like those sequences, they originate mainly from Lactobacillus.
Figure 6. Groups of protein-degrading enzymes encoded by sequences in the metagenomic DNA sequencing data of microorganisms in the termite gut (panel title: number of sequences per group according to the KEGG annotation).
Mining genes encoding enzymes of the amylase family
Figure 7. Sequences predicted to encode starch-hydrolysing enzymes (panel title: number of genes in each family).
From the metagenomic DNA of the termite gut, 153 sequences predicted to encode starch-hydrolysing enzymes were mined. Enzymes of the amylase family (AmyAc family) make up the largest share, just over 50% (77 sequences), as shown in Figure 7. Of these sequences, 20% are complete and encode the intact enzyme. Figure 8 shows that the 153 genes were found in about 40 kinds of microorganisms, chiefly Lactococcus (28 sequences).
Figure 8. Bacterial genera thought to produce starch-hydrolysing enzymes (panel title: organisms carrying amylase genes).
CONCLUSION
Metagenomics is a very effective tool for studying and mining genetic resources from endemic microbial communities. By extracting metagenomic DNA from the termite gut microbiota, sequencing all of the DNA and processing the resulting sequences with bioinformatics tools, we obtained a very rich body of information on the genes encoding important enzyme groups such as amylase, protease and cellulase from the gut microorganisms of Coptotermes termites in Vietnam. This result lays the foundation for further work, such as synthesising or isolating genes of interest from this database in order to study their properties.
This opens up the prospect of discovering enzymes with special properties and of finding the most effective ways to apply them in the future.
REFERENCES
Amann, R.I., Ludwig, W., and Schleifer, K.H. (1995). Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol. Rev. 59, 143-169.
Applications, N.R.C. (US) C. on M.C. and F. (2007). Committee on metagenomics: challenges and functional applications.
Hultman, J., and Auvinen, P. (2010). [Metagenomics opens up new frontier in microbiology]. Duodecim 126, 1278-1285.
Husseneder, C., Wise, B.R., and Higashiguchi, D.T. (2005). Microbial diversity in the termite gut: a complementary approach combining culture and culture-independent techniques. Proceedings of the Fifth International Conference on Urban Pests.
[Reference entry illegible in the source.]
UEB (2011). Introduction to next generation sequencing.
Venter, J.C., [...], Nelson, W., et al. (2004). Environmental genome shotgun sequencing of the Sargasso Sea. Science 304, 66-74.