• Tidak ada hasil yang ditemukan

Metagenomic analysis reveals a dynamic microbiome with diversified adaptive functions to utilize high lignocellulosic forages in the cattle rumen

N/A
N/A
Protected

Academic year: 2024

Membagikan "Metagenomic analysis reveals a dynamic microbiome with diversified adaptive functions to utilize high lignocellulosic forages in the cattle rumen"

Copied!
15
0
0

Teks penuh

(1)

Metagenomic analysis reveals a dynamic microbiome with diversified

adaptive functions to utilize high lignocellulosic forages in the cattle rumen

Article  in  The ISME Journal · March 2021

DOI: 10.1038/s41396-020-00837-2

CITATION

1

READS

139 6 authors, including:

Some of the authors of this publication are also working on these related projects:

Analysis of Hematological Traits in Polled Yak by Genome-Wide Association Studies Using Individual SNPs and HaplotypesView project

Genomic diversityView project Javad Gharechahi

Baqiyatallah University of Medical Sciences 42PUBLICATIONS   600CITATIONS   

SEE PROFILE

S.M. Farhad Vahidi

Agricultural Biotechnology Research Institute of Iran, North branch, Rasht 30PUBLICATIONS   356CITATIONS   

SEE PROFILE

Han Jianlin

Consultative Group on International Agricultural Research 237PUBLICATIONS   2,790CITATIONS   

SEE PROFILE

All content following this page was uploaded by Han Jianlin on 23 November 2020.

The user has requested enhancement of the downloaded file.

(2)

UNCORRECTED PROOF

1 A R T I C L E

2

Metagenomic analysis reveals a dynamic microbiome with

3

diversi fi ed adaptive functions to utilize high lignocellulosic forages

4

in the cattle rumen

5 Javad Gharechahi1Mohammad Farhad Vahidi2Mohammad Bahram3,4Jian-Lin Han 5,6Xue-Zhi Ding 7

6 Ghasem Hosseini Salekdeh 2,8

7 Received: 23 May 2020 / Revised: 3 November 2020 / Accepted: 11 November 2020 8 © The Author(s), under exclusive licence to International Society for Microbial Ecology 2020

9 Abstract

10 Rumen microbiota play a key role in the digestion and utilization of plant materials by the ruminant species, which have

11 important implications for greenhouse gas emission. Yet, little is known about the key taxa and potential gene functions

12 involved in the digestion process. Here, we performed a genome-centric analysis of rumen microbiota attached to six

13 different lignocellulosic biomasses in rumen-fistulated cattle. Our metagenome sequencing provided novel genomic insights

14 into functional potential of 523 uncultured bacteria and 15 mostly uncultured archaea in the rumen. The assembled genomes

15 belonged mainly to Bacteroidota, Firmicutes, Verrucomicrobiota, and Fibrobacterota and were enriched for genes related to

16 the degradation of lignocellulosic polymers and the fermentation of degraded products into short chain volatile fatty acids.

17 We also found a shift from copiotrophic to oligotrophic taxa during the course of rumen fermentation, potentially important

18 for the digestion of recalcitrant lignocellulosic substrates in the physiochemically complex and varying environment of the

19 rumen. Differential colonization of forages (the incubated lignocellulosic materials) by rumen microbiota suggests that

20 taxonomic and metabolic diversification is an evolutionary adaptation to diverse lignocellulosic substrates constituting a

21 major component of the cattle’s diet. Our data also provide novel insights into the key role of unique microbial diversity and

22 associated gene functions in the degradation of recalcitrant lignocellulosic materials in the rumen.

23

Introduction

24 As the main sources of energy for their growth and lacta-

25 tion, ruminants depend largely on the end products of the

forage digestion, i.e., energy accessible short chain fatty 26

acids (SCFAs) [1]. These animals lack enzymes that are 27

required to break down the complex lignocellulosic carbo- 28

hydrates as the major component of the forage [2,3]. For 29

this, they rely on microbial communities inhabiting their 30

gastrointestinal tract particularly the rumen. Rumen 31

microbes are thus equipped with diverse carbohydrate 32

active enzymes (CAZymes) for breaking down the complex 33

* Xue-Zhi Ding [email protected]

* Ghasem Hosseini Salekdeh [email protected]

1 Human Genetics Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran

2 Department of Systems Biology, Agricultural Biotechnology Research Institute of Iran, Agricultural Research, Education, and Extension Organization, Karaj, Iran

3 Department of Ecology, Swedish University of Agricultural Sciences, Ulls väg 16, 756 51 Uppsala, Sweden

4 Institute of Ecology and Earth Sciences, University of Tartu, 40 Lai St, 51005 Tartu, Estonia

5 Livestock Genetics Program, International Livestock Research Institute (ILRI), 00100 Nairobi, Kenya

6 CAAS-ILRI Joint Laboratory on Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), 100193 Beijing, China

7 Key Laboratory of Yak Breeding Engineering, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences (CAAS), 730050 Lanzhou, China

8 Department of Molecular Sciences, Macquarie University, North Ryde, NSW, Australia

Supplementary informationThe online version of this article (https://

doi.org/10.1038/s41396-020-00837-2) contains supplementary material, which is available to authorized users.

1234567890();,: 1234567890();,:

(3)

UNCORRECTED PROOF

34 structure of plant cell-wall materials. They produce a

35 complex cocktail of enzymes for the efficient degradation of

36 plant polysaccharides, including diverse glycoside hydro-

37 lases (GHs), carbohydrate esterases (CEs), and poly-

38 saccharide lyases (PLs) together with their accessory non-

39 catalytic carbohydrate-binding modules (CBMs), enabling

40 their targeted attachment to lignocellulosic substrates.

41 However, little is known about specific adaptations of

42 microbes to the complex and dynamic rumen environment.

Q1 Q2Q3Q4

43 Rumen microbes utilize different mechanisms to invade

44 complex plant polysaccharide substrates and thus there may

45 be some specialization in substrate types utilized and

46 functional capabilities across different members of the

47 rumen microbial community. Some rumen bacteria includ-

48 ing species of Ruminococcus have evolved multienzyme

49 assemblies, the cellulosomes, in which multiple CAZymes

50 bind via dockerin domains to cell surface localized scaf-

51 folding proteins containing repeated cohesin domains [4].

52 These cohesin–dockerin interactions will enable a concerted

53 action of multiple different CAZymes with varying sub-

54 strate specificities on lignocellulosic substrates and thus

55 enhance the degradation of recalcitrant polysaccharides,

56 e.g., cellulose and hemicellulose, by orders of magnitude

57 [5]. However, Bacteroidetes as abundant members of the

58 rumen microbiome, lack the cellulosome. In these bacteria,

59 all genes required

Q5 for utilizing specific glycan substrate are

60 physically localized on the same portion of their genomes

61 featured by the presence of a susC/D gene pair. These

62 special arrangements of CAZymes are known as poly-

63 saccharide utilization loci (PULs) [6,7]. Since each PUL is

64 dedicated to the break down of specific complex glycan

65 substrate, the combination of enzymes present in PULs

66 determines the carbohydrate degrading capacity of the

67 bacteria [8].

68 In addition, to degrade plant materials efficiently, rumen

69 microbes must be able to colonize and to attach feed par-

70 ticulates. Particle-associated microbes that are embedded in

71 a biofilm matrix play central role in rumen digestion. The

72 attachment of rumen microbes to feed particles promotes

73 their prolonged residence in the rumen and enables their

74 enzymes to be in contact with the substrates [9–11]. Bac-

75 terial attachment to feed particles is initiated within 5 min of

76 feed entry and gets established within 1 h of rumen incu-

77 bation [11, 12]. In this regard, the physicochemical prop-

78 erties of feeds proved to be the major factors determining

79 microbial colonization and digestion in the rumen [12–15].

80 Particularly, rumen microbial communities are distinct

81 between forage-fed and concentrate-fed animals [16, 17].

82 Despite these advances, we still have little information

83 about the diversity of rumen microbiota colonizing forages

84 of different lignocellulosic compositions as well as their

85 dynamic changes along the digestion process in the rumen.

The rumen microbial communities contribute to varia- 86

tions in feed efficiency in the host [18, 19]. Several 16S 87

rRNA gene-based metabarcoding studies report that rumen 88

microbes show significant variations with respect to their 89

ability to attach to feed particles [12,14,15,20]. However, 90

little is known about the genomic constituents of particle- 91

attached microbiota and the variations in their CAZyme 92

repertoires which determine their lignocellulose degrading 93

potential in the rumen. Most particle adhering microbes 94

(>70%) are unable to grow in routine culture conditions 95

[21]. Culture-independent metagenomics has enabled us to 96

gain insight into the functional significance of these 97

microorganisms and their contribution to lignocellulose 98

degradation in the rumen. This, together with recent 99

advances in bioinformatics analyses, has now made it 100

possible to recover near complete and draft genome 101

sequences of uncultured rumen bacteria and archaea from 102

metagenome-assembled contigs [22–26]. A recent study, for 103

example, recovered 4941 metagenome-assembled genomes 104

with >80% completeness from the cattle rumen, many of 105

them (>70%) had <95% nucleotide identity with the exist- 106

ing genomes, suggesting their novel entities [22]. Meta- 107

genome analyses have also revealed that rumen microbiota 108

could be a potential source of novel enzymes for bio- 109

technological applications including biofuel, paper, textile, 110

and food industries [23,27]. Knowledge about the diversity 111

and functional capabilities of rumen microbiota may pro- 112

vide opportunities to improve feed digestion efficiency of 113

domestic ruminants through microbiome engineering for an 114

enhanced lignocellulose degrading capacity. 115

Here, we analyzed 333 base pairs (Gbp) of metagenome 116

sequencing data from rumen microbes attached to six dif- 117

ferent lignocellulosic biomasses following their in situ 118

incubation in the cattle rumen. Our aim was to determine 119

the key taxa and associated gene functions involved in the 120

digestion of different lignocellulosic substrates and their 121

potential shifts during the digestion process. In our previous 122

16S rRNA gene metabarcoding [15], we showed that rumen 123

microbiota had differential preferences for attachment to 124

forages of varying lignocellulose compositions. Here we 125

demonstrate how this differential colonization is reflected in 126

their genomic potential for the degradation of different plant 127

cell-wall polymers. We also hypothesized that changes in 128

community of forage-attached microbiota along the length 129

of rumen incubation reflect their lignocellulose degrading 130

potential. To evaluate this hypothesis, we sequenced and de 131

novo assembled the whole metagenome of fiber-attached 132

rumen microbiota from three rumen-fistulated cattle and 133

reconstructed the genome sequences of rumen microbiota 134

involved in the break down of different lignocellulosic 135

biomasses with varying physicochemical properties. 136

(4)

UNCORRECTED PROOF

137

Material and methods

138 Sample collection

139 The details of three Taleshi bulls, rumen cannulation tech-

140 nique, the physiochemical properties, preparation, and

141 sampling of the six common lignocellulosic forages

142 including camelthorn (AP), common red (CR), date palm

143 (DP), rice straw (RS), Kochia (KS), and Salicornia (SC)

144 have been presented in our previous study [15] and briefly

145 summarized in Supplementary Note and Fig. S1. The

146 experiment was approved by the Ethics Committee for

147 Animal Experiments of the Animal Science Research

148 Institute of Iran.

149 DNA extraction and metagenome sequencing

150 Metagenome DNA extraction from microbial cells tightly

151 attached to the feed particulates was performed as described

152 previously [15, 28]. Metagenome library preparation was

153 performed according to TruSeq DNA Library Preparation

154 Kit (Illumina, San Diego, CA, USA). The quantity of each

155 library was evaluated using a Qubitfluorimeter (Invitrogen,

156 Carlsbad, CA, USA). Metagenome sequencing was per-

157 formed at the Novogene Inc. (Beijing, China) using Illu-

158 mina HiSeq 2500 system with 150 bp paired-end

159 sequencing of ~340 bp fragments.

160 De novo metagenome assembly and binning

161 All metagenome sequences were co-assembled based on

162 all 24 samples across the six forages (co-assembly) using

163 MEGAHIT v1.2.9 [29] with the following options: --k-min

164 31, --k-max 141, --k-step 10, --min-count 2, and --min-

165 contig-len 200. We also assembled reads generated for

166 each forage separately (forage-specific assemblies) using

167 SPAdes v3.13.1 [30] with k-mers 31, 51, 71, 91, 111, and

168 127. To obtain the coverage profiles of assembled contigs,

169 reads were mapped back to the filtered contigs (e.g.,

170 contigs longer than 2000 bp) using Bowtie2 v2.3.5 [31].

171 Samtools v1.9 [32] was used to convert and sort BAM

172 files. Genome bins were reconstructed using MetaBat2

173 v2.14 [33] and MaxBin2 v2.2.6 [34] on the co-assembly

174 and forage-specific assemblies. MetaBat2 was run on the

175 co-assembly and forage-specific assemblies using the

176 option --minContig 2000

Q6 and considering the coverage

177 profiles across either all 24 samples (for the co-assembly)

178 or the four sampling points (for each forage-specific

179 assembly). MaxBin2 was also run on the co-assembly and

180 forage-specific assemblies using the following parameters:

181 --min_contig_length 2000 and --max_iteration 50.

Q7 All

182 genome bins obtained from the co-assembly and forage-

183 specific assemblies were aggregated and dereplicated

separately using DAS_Tool v1.1.1 [35] in order to ensure 184

that the dereplicated genomes did not have any shared 185

contigs. DAS_Tool was run under default setting using 186

diamond as a search engine. Genome bins remained after 187

the dereplication using DAS_Tool from the co-assembly 188

(646 bins) and forage-specific assemblies (1020 bins) were 189

finally aggregated and dereplicated using dRep v2.3.2 190

[36]. In this dereplication step, checkm v1.0.18 [37] was 191

used to evaluate for genome completeness and con- 192

tamination, and only bins with ≥75% completeness and 193

≤10% contamination were retained. The final cluster 194

representatives n=538 (hereafter called Rumen- 195

Uncultured Genomes, RUGs) were further analyzed in 196

this study. 197

Taxonomy and functional analysis of RUGs 198

Taxonomic classification of RUGs was performed using 199

GTDB-Tk v1.3.0 [38]. PhyloPhlAn v0.97 [39] was used to 200

place RUGs into a phylogenetic tree and to further infer 201

their taxonomic label using >400 universally conserved 202

proteins encoded by 3171 genomes. The closest cultured 203

relatives from the Hungate1000 collection [3] were also 204

included. The tree was visualized using iTol v5 [40], and 205

phylum-level taxonomies imputed by GTDB-Tk were 206

assigned to leaves or tips. To identify novel and known 207

genomes, the RUGs were compared to the Hungate1000 208

collection (n=408) and the rumen metagenome-assembled 209

genomes (n=4941) from Stewart et al. [22] using fastANI 210

v1.1 with –fragLen 2000. Coding sequences (CDS) and Q8211

clustered regularly interspaced short palindromic repeats 212

were predicted during Prokka v1.14.0 annotation [41]. 213

tRNAs were predicted using tRNAscan-SE v2.0.6 [42]. 214

Ribosomal RNAs were predicted using barrnap v0.9 215

(https://github.com/tseemann/barrnap). 216

Quantification and differential abundance of RUGs 217

The abundance of RUGs across samples were estimated by 218

mapping reads to binned contigs using quant_bins module 219

from metaWRAP [43]. RUGs were clustered based on their 220

abundance profiles into heatmap using average linkage 221

method and spearman-rank correlation distance matrix in 222

heatmapper [44]. The differentially abundant RUGs 223

between forages and across sampling intervals were iden- 224

tified using DESeq2 [45] with an FDR-corrected p value 225

cutoff < 0.01. 226

Microbial community diversity based on RUGs 227

The microbial community structure of the samples across 228

forages and sampling intervals was explored using Weigh- 229

ted Unifrac distance matrix. Permutational multivariate 230

(5)

UNCORRECTED PROOF

231 analysis of variance (PERMANOVA) was used to test for

232 significant differences among the forages and across the

233 sampling intervals using the Adonis function of vegan

234 v2.5–5 [46]. Group-wise differences were tested using

235 pairwise Adonis. PERMDISP was

Q9 also used to test for the

236 homogeneity of dispersions using the beta-disper function

237 of vegan.

238 Identification and annotation of CAZymes in RUGs

239 Prodigal v2.6.3 [47] was used in metagenome mode to

240 predict protein coding genes. The predicted proteins were

241 scanned for candidate CAZymes using hmmscan module

242 from HMMER v3.2.1 and dbCAN database v6 as input

243 [48]. The hmmscan output was parsed by applying a

244 minimum query coverage 35% and an e-value cutoff 1e−5.

245 PULs were identified using PULpy [49]. The predicted

246 CAZymes were BLASTp [50] searched against the NCBI nr

247 database (November 2019) using an e-value cutoff of 1e

248 −30 and a maximum target sequence of 30. Signal peptides

249 were predicted using SignalP v.4.1 [51] based on both

250 gram-positive and gram-negative models.

Metabolic reconstitution of RUGs 251

To infer the metabolic potential of RUGs, the predicted 252

proteome of each RUGs was scanned for enzymes involved 253

in specific metabolic pathways including glycolysis, pen- 254

tose phosphate (PPP), and Embden–Meyerhof–Parnas 255

(EMP). The search was performed using the hmmscan 256

module of HMMER software against Pfam domains 257

representing key marker genes of each metabolic pathway. 258

Pfam domains were retrieved from Pfam v32 and TIGR- 259

FAMs v15 databases (see Supplementary Data 1 for the 260

details of pfam domains). The ability to metabolize mono- 261

meric carbohydrates including mannose, rhamnose, xylose, 262

galactose, arabinose, fucose, and glucose requires the pre- 263

sence of key kinase, isomerase, and epimerase enzymes 264

catalyzing critical steps in their metabolism as carbon 265

source. 266

The ability of RUGs to ferment carbohydrate monomers 267

into SCFAs was also determined. Butyrate production was 268

approved by the presence of genes for phosphate butyryl- 269

transferase (ptb), butyrate kinase (buk), and butyryl-CoA: 270

acetate CoA-transferase (but). Lactate dehydrogenase (ldh) 271 Fig. 1 The majority of the reconstituted rumen-uncultured gen-

omes (RUGs) are novel and do not have any representatives in the publicly available databases. A The average nucleotide identity (ANI) and aligned fraction (AF) of the 538 RUGs compared with rumen cultured genomes from the Hungate1000 genome project [3], the cattle rumen metagenome-assembled genomes [22], and the GTDB-Tk representative genomes (release 95) [38]. The horizontal dashed line (purple) shows accepted AF threshold (65%), vertical

dashed lines (green and blue) show 95% and 99% ANI thresholds, respectively.BThe phylogenetic tree of the 538 RUGs along with 62 closest relatives from the Hungate1000 collection. The tree was con- structed based on the concatenated multiple sequence alignments of 400 universally conserved proteins using PhyloPhlAn [39] and visualized using iTOL [40]. Clades are colored based on phylum-level taxonomies inferred by GTDB-Tk [38]. The Hugnate1000 repre- sentative genomes are labeled by their taxonomic names.

(6)

UNCORRECTED PROOF

272 was used as the marker of lactate production. Acetate pro-

273 duction was checked by the presence of genes encoding

274 phosphate acetyltransferase (pta) and acetate kinase (ackA).

275 Malic enzyme (men), fumarate hydratase (fum), and fuma-

276 rate reductase (frd) were used as markers of succinate

277 production. Propionate production was also approved by the

278 presence of three key enzymes methylmalonyl-CoA dec-

279 arboxylase (mmdA), methylmalonyl-CoA mutase (mutA),

280 and methylmalonyl-CoA epimerase (mce). The potential of

281 degradation of carbohydrate polymers was further evaluated

282 by enumerating key enzymes for the degradation of cellu-

283 lose, xylan, xyloglucan, chitin, pectin, and starch.

284

Results and discussion

285 Metagenome assembly and genome bin selection

286 In this study, we sequenced the whole metagenome of 24

287 rumen samples, resulted in a total of 333 Gbp of sequencing

data. A combined coverage and tetra nucleotide frequency- 288

based binning resulted in 538 RUGs having >75% com- 289

pleteness and <10% contamination (Supplementary Data 2 290

and Fig. S2A, B). Among the RUGs, 205 had >90% 291

completeness and <5% contamination and 21 could be 292

considered as high-quality draft MAGs based on the criteria 293

defined by Bowers et al. [52] (completeness > 90%, con- 294

tamination < 5%, and the presence of the 23S, 16S, and 5S 295

rRNA genes and at least 18 tRNAs). In addition, 468 (87%) 296

of RUGs also had a quality score (completeness−5 × 297

contamination) > 50%, indicating most of the RUGs to be 298

high quality. Among the 538 RUGs, 351 had at least 18 out 299

of the 20 standard tRNAs (Fig. S2C) and 53 had at least a 300

full-length and 117 a partial 16S rRNA gene. The sizes of 301

RUGs ranged from 606 kilobases (kb) to 5.95 megabases 302

(Mb), with N50 values from 4 to 387 kb. The smallest RUG 303

was a mycoplasma (RUG208), followed by four Sacchar- 304

imonadaceae bacteria with genome sizes <0.8 Mb and an 305

average 78% genome completeness. The number of contigs 306 Fig. 2 The community

composition ofber-attached microbiota changes with incubation time.Heatmap clustering shows a clear separation of samples collected at 24 h from those at later stages of the rumen incubation. The RUGs were hierarchically clustered based on their relative abundances estimated by recruiting reads to contigs in the binned genomes using an average linkage method and spearman-rank correlation distance measurement using Heatmapper [44]. AP camelthorn, CR common reed, DP date palm, KS Kochia, RS rice straw, and SC Salicornia.

(7)

UNCORRECTED PROOF

307 in RUGs ranged from 13 to 703 with an average of 189

308 contigs.

309 By comparing the RUGs with Hungate1000 collection

310 (n=408) [3], the set of RUGs (n=4941) reported by

311 Stewart et al. [22] and the GTDB-Tk representative gen-

312 omes (n=31,910) [38] we found that most of our RUGs

313 were novel and lacked any representative in available

314 databases (Fig.1A). To test whether these low ANI simi-

315 larities are likely resulted from genome incompleteness, we

316 correlated our RUGs completeness and their ANI with

317 Stewart MAGs and found no clear relationship (Fig. S3).

318 Only 92 RUGs (~17%) were reliably assigned to known

319 species with an ANI >95% over ≥65% of alignment frac-

320 tion, congruent with their placement in GTDB-Tk reference

321 tree. Seventy-eight RUGs were remained unannotated at the

322 genus level, six at the family level, and one at the order and

323 the class levels (Supplementary Data 2). A total of 243

324 RUGs (45%) were taxonomically assigned to Firmicutes

325 and 190 (35%) to Bacteroidota (Figs. 1B, S2D). Spir-

326 ochaetota were the third most abundant group of bacteria in

327 26 RUGs (4.8%), followed by Proteobacteria (15), Verru-

328 comicrobiota (12), Synergistota (7), Patescibacteria (6),

329 Elusimicrobiota (5), Cyanobacteria (5), Desulfobacterota

330 (4), and Fibrobacterota (4) (Fig. S2D). The high number of

331 Spirochaetota genomes recovered in this study agreed with

332 an early study on moose rumen, in which Spirochaetota

333 were found to play a major role in pectin degradation [25].

334 All Spirochaetota genomes belonged to the genus Trepo-

335 nema, known as the major hemicellulose degrader in the gut

336 microbiota [53].

337 Overall, 15 RUGs were taxonomically assigned to the

338 archaeal families Methanomethylophilaceae (n=11),

339 Methanobacteriaceae (3), and Methanomicrobiaceae (1).

Among Methanomethylophilaceae, eight RUGs were 340

affiliated to ISO4-G1, a methylotrophic methanogen first 341

isolated from the sheep rumen [54]. Methanobacteriaceae 342

genomes were assigned to the genus Methanosphaera 343

(RUG228) and Methanobrevibacter (RUG389 and 344

RUG508). The single genome recovered from Methano- 345

microbiaceae was assigned to Methanomicrobium mobile, 346

which is one of the major metanogenic archaeon detected in 347

the rumen of sheep (accounting for 54% of the total 348

methanogens) [55] and cattle (>106cells/mL in rumen 349

digesta) [56]. The details of taxonomies inferred by GTDB- 350

Tk are presented in Supplementary Data 2. 351

The differential colonization of forages by RUGs 352

The six lignocellulosic biomasses (AP, CR, DP, KS, RS, 353

and SC) were incubated in three rumen-fistulated bulls for a 354

period of 96 h with 24 h sampling intervals. The samples 355

taken from the three bulls at each sampling interval (e.g., 356

24, 48, 72, and 96 h) were pooled and processed for whole 357

metagenome sequencing of fiber-attached microbiota. 358

Clustering samples at the four sampling intervals and across 359

the six forages based on the relative abundances of the 360

recovered RUGs revealed a clear separation of the samples 361

at 24 h of rumen incubation (Fig. 2). The distribution of 362

RUGs across forages varied as the incubation time was 363

extended. At 48 h, only AP, CR, DP, and SC clustered 364

together. There was no apparent clustering of forages at 72 365

and 96 h, indicating that the digestion during early incu- 366

bation stages converted feed particulates into unfavored 367

substrates for most of the RUGs, a pattern in agreement 368

with our 16S rRNA metabarcoding data [15]. 369 Fig. 3 Phylogenetic diversity ofber-attached RUGs depends on

both feed type and sampling times. APCoA plot shows a signicant separation of RUGs based on the forages and following rumen incu- bation lengths. Forages are indicated by colors and sampling intervals with shapes. Statistically signicant differences in rumen microbial community attached to the forages and between the incubation lengths

were tested by permutation multivariate analysis of variance (PER- ANOVA). The homogeneity of group dispersions between forages (B) and sampling intervals (C) were tested using the beta-disper function in R package vegan. AP camelthorn, CR common reed, DP date palm, KS Kochia, RS rice straw, and SC Salicornia.

(8)

UNCORRECTED PROOF

370 Microbial community diversity and composition ana-

371 lyses also showed significant differences between the

372 forages and sampling intervals (PERMANOVAP< 0.001;

373 Fig. 3), which could be mainly ascribed to community

374 changes in AP compared to CR, RS, and SC (P< 0.05). The

375 distribution of RUGs across sampling intervals also showed

376 a clear separation of 24 h samples (pairwise PERMANOVA

377 P< 0.003) from all other stages which exhibited overlapped

378 clustering (Fig. 3). There were higher relative abundances

379 of Firmicutes (26 RUGs), Fibrobacterota (2 RUGs), Bac-

380 teroidota (2 RUGs), and Spirochaetota (2 RUGs; FDR-

381 corrected P< 0.01) attached to SC (the forage with the

382 highest hemicellulose and the lowest ADF concentrations)

383 compared with other forages. Ruminococcaceae (3 RUGs)

384 were significantly enriched in the microbial community

385 attached to RS (the forage with the lowest levels of ADF

386 and ADL) compared with other forages. Verrucomicrobiota

387 (2 RUGs) were represented by higher abundance in CR (the

388 forage with the highest cellulose content) compared to AP

389 and SC (two forages with the lowest cellulose concentra-

390 tion). Thisfinding indicated the key role of Verrucomicro-

391 biota in cellulose degradation, and was in agreement with a

392 recent analysis of mice microbiota with Bacteroidetes to be

accumulated in the presence of bamboo shoot while Ver- 393

rucomicrobiota to be significantly increased with crystalline 394

cellulose supplementation [57]. 395

Temporal analysis of microbial communities colonizing 396

the forages also revealed the dominances of Bacteroidetes, 397

Firmicutes, and Proteobacteria during initial stage of rumen 398

incubation when maximal DM degradation took place (e.g., 399

24 h), while Verrucomicrobia accumulation started later 400

when most DM got disappeared (Supplementary Data 3). 401

Bacteroidetes, Firmicutes, and Proteobacteria are known as 402

copiotrophic taxa with potentially rapid growth rates and 403

their adaptability to the condition of high carbon availability 404

[58, 59]. On the other hand, oligotrophic bacteria such as 405

Verrucomicrobia are able to grow on low quality and 406

recalcitrant substrates with slow growth rates but a higher 407

substrate utilization efficiency [59]. This is functionally 408

relevant as upon feed entry into the rumen surface acces- 409

sible carbohydrates in feed particulates attract the copio- 410

trophic taxa first, but the digestion during early hours of 411

rumen incubation turns forage residues into favored sub- 412

strates for oligotrophic taxa. The shift from copiotrophic to 413

oligotrophic taxa is likely critical for the digestion of 414

recalcitrant substrates in the rumen. 415 Fig. 4 The average percentage identity of CAZymes (n=62,309)

predicted in the RUGs with proteins deposited in the NCBI non- redundant protein database (A) and CAZyme database (B). AAs axillary activity enzymes (n=1305), CBMs carbohydrate-binding modules (n=6249), CEs carbohydrate esterases (n=9965), cohesins (n=168), dockerins (n=387), GHs glycoside hydrolases (n=27,328), GTs glycoside transferases (n=14,731), PLs

polysaccharide lyases (n=1444), and SLHs surface layer homology domains (n=732). Center lines indicate the median value; red trian- gles show mean; boxes show the interquartile range; and whiskers extend to the most extreme data point. The distribution (C) and density (D) of the predicted CAZyme families across 528 RUGs tax- onomically classied at the class level. Q10

(9)

UNCORRECTED PROOF

416 Carbohydrate degrading potential of RUGs

417 In total, 1,185,394 protein coding genes were predicted in

418 the 538 RUGs, of which 5.1% were annotated as CAZymes,

419 including 27,328 (43.8%) identified as GHs, 1444 as PLs

420 (2.3%), 9965 as CEs (16%), 6249 as CBMs (10%), 387 as

421 dockerin (0.6%), 168 as cohesin (0.26%), 1305 as AAs

422 (2.1%), 732 as SLHs (1.17%), and 14,731 as GTs (23.6%)

423 (Supplementary Data 4). A comparison to the number of

424 sequences annotated as CAZyme in the GTDB representa-

425 tive genomes (n=31,910) revealed a significantly higher

426 enrichment of CAZymes (5.1% vs. 3.7%) particularly those

427 encoding GHs (43.8% vs. 34.4%), CBMs (10% vs. 7.6%),

428 cohesin (0.26% vs. 0.14%), dockerin (0.6% vs. 0.23%), and

429 SLHs (1.17% vs. 0.92%), and lower enrichment of CEs

430 (16% vs. 18.2%), AAs (2.1% vs. 5.6%), and GTs (23.6%

431 vs. 30.8%) in our RUGs. Thisfinding suggests that rumen

432 microbes have been naturally selected and evolutionary

433 adapted with a high genomic potential for efficient lig-

434 nocellulose degradation in the rumen.

435 The top 10 most abundant GH families were GH3,

436 GH31, and GH97 glucosidases, GH2 galactosidases,

437 GH109 α-N-acetylgalactosaminidases, GH28 poly-

438 galacturonases, GH23 and GH25 lysozymes, GH51 endo-

439 glucanases, and GH127 arabinofuranosidases. Pectin

440 degrading enzymes including PL11 rhamnogalacturonan

441 endolyases, PL1_2 pectate lyases, CE1 acetyl xylan

442 esterases, and CE10 arylesterases were among the most

443 abundant PL and CE families. Only 418 CAZymes showed

444 100% identity while 3089 (4.96%) shared ≥95% identity

445 with proteins in the NCBI-NR database (Fig.4A). The AA

446 domain-containing proteins showed the highest (∼74%) but

447 dockerin domain-containing proteins had the lowest

448 (∼45%) average amino acid identity with proteins in the

449 database. The average sequence identities were even lower

450 when compared to CAZyme database (Fig.4B), suggesting

451 that the majority of the predicted CAZymes are novel, with

452 no well-characterized homolog in the current databases.

453 On average, >94% of CAZymes were encoded by seven

454 bacterial classes including Bacteroidia (48.9%), Clostridia

455 (33.6%), Spirochaetia (3.9%), Kiritimatiellae (3.1%),

456 Bacilli (2.4%), Lentisphaeria (1.7%), and Fibrobacteria

457 (1.2%; Fig. 4C). However, normalizing the number of

458 predicted CAZymes to the number of RUGs identified in

459 each taxon (class level) revealed that Kiritimatiellae (phy-

460 lum Verrucomicrobia), followed by Lentisphaeria, Fibro-

461 bacteria, and Bacteroidia, were mainly enriched for GHs,

462 CEs, PLs, and CBMs (Fig.4D), suggesting Kiritimatiellae,

463 Lentisphaeria, and Fibrobacteria to be the likely key lig-

464 nocellulose degraders in the rumen regardless of their lower

465 abundance compared with Bacteroidia and Clostridia. The

466 capacity to degrade hemicellulose is widely encoded across

467 the members of Verrucomicrobia [60]. Previous single-cell

genome sequencing of certain phylotypes of Verrucomi- 468

crobia also revealed the enrichment of genes encoding for 469

CAZymes, sulfatases, and peptidases in their genomes, 470

potentially allowing these bacteria to degrade diverse car- 471

bohydrate polymers [61]. 472

The production of SLH domain-containing proteins was 473

mainly restricted to Clostridia, Vampirovibrionia, Negati- 474

vicutes, and Synergistia. These proteins anchor cellulosome 475

complexes to the cell surface and thus facilitate access to the 476

degraded products from the complexes by the bacterium. 477

Among CAZymes, cohesin and dockerin domain encoding 478

genes were mainly encoded by Bacteroidia and Clostridia. 479

However, genes encoding for these two proteins mainly co- 480

occurred in members of Oscillospirales and certain Chris- 481

tensenellales known as main cellulosome producers in the 482

rumen [62]. The key feature of cellulosome complexes is 483

the multimodular scaffoldin proteins containing multiple 484

tandemly repeated cohesin domains, serving as docking site 485

for the attachment of dockerin-bearing CAZymes [62–64]. 486

In addition, they bear cellulose-binding or carbohydrate- 487

binding modules to attach lignocellulosic substrates. Further 488

analysis of proteins with such configuration revealed 36 489

proteins in 29 RUGs with 2 (n=20) or more (n=16) 490

cohesin domains (Supplementary Data 5). Proteins with 491

greater than four cohesin domains were only detected in the 492

families Ruminococcaceae (7 RUGs) and unknown Chris- 493

tensenellales (2 RUGs). In particular, two Ruminococcus 494

genomes of RUG452 and RUG21 each encoded one 495

ortholog protein with 10 and 5 tandemly repeated cohesin 496

domains, respectively. Based on a BLAST search, these two 497

proteins matched to the scaffoldin B from Ruminococcus 498

flavefaciens with 51.4 and 52.3% identity, respectively, 499

suggesting them to be novel candidate scaffoldins (Fig. S4). 500

In cellulosome complexes, lignocellulose degrading 501

enzymes bind via their dockerin domains to cohesin 502

domains in the scaffoldin proteins, these cohesin–dockerin 503

interactions are fundamental to cellulosome assembly 504

[64, 65]. We identified 126 proteins in 31 RUGs which 505

contained at least one dockerin and one catalytic CAZyme 506

module to be the putative components of cellulosome 507

complexes as they potentially encoded acetyl xylan estera- 508

ses (families CE1, CE2, CE3, CE4, and CE6),α-amylases 509

(GH13, GH13_15, GH13_28, GH13_6, GH13_7, and 510

GH97), xylanases (GH10, GH11, GH16, GH30_8, 511

GH5_21, GH5_35, and GH98), cellulases (GH124, GH128, 512

GH5_2, GH5_4, GH5, and GH9), β-galactanases (GH2, 513

GH53, GH43_24, and GH30_5), and pectate lyases (PL1, 514

PL1_2, PL1_8, PL11, PL9_4, and PL9). The presence of 515

dockerin-bearing α-amylases implies the existence of 516

cohesin-based amylosomes, which are known to be formed 517

in certain members ofRuminococcusincludingR. bromiiin 518

human gut in response to resistant starches [66, 67]. In 519

addition, arabinanases (GH43_37 and GH43_4), L- 520

(10)

UNCORRECTED PROOF

521 arabinofuranosidases (GH43_10, GH43_29, and GH127)

522 and α-L-fucosidases (GH95 and GH141) were also repre-

523 sented among the dockerin-bearing enzymes. Many of these

524 proteins also contained CBMs capable of binding to either

525 cellulose, xylan, starch or galactans, highlighting the key

526 role of cellulosomes in lignocellulosic polymer degradation

527 in the rumen [25].

528 The process of plant polysaccharide degradation by

529 rumen microbiota requires lignocellulose degrading

530 enzymes being secreted into the external milieu. Inspection

531 of CAZyme sequences for the presence of secreted signal

532 revealed that a significant portion of predicted CAZymes in

533 RUGs affiliated to Fibrobacteria (∼53%), Bacteroidia

534 (∼50%), and Kiritimatiellae (∼49%) were secreted and/or

535 periplasmic proteins. While for non-CAZyme sequences

536 only 9.8%, 11.9%, and 14.4% of sequences in Fibro-

537 bacteria, Bacteroidia, and Kiritimatiellae contained signal

538 peptide, respectively, indicating a significant enrichment of

539 signal peptide in CAZymes and thus the critical role of

540 these taxa in lignocellulose degradation in the rumen

541 (see Supplementary Note and Supplementary Data 6 for

542 greater details). Even though that Clostridia were among

543 abundant members of the rumen community, only∼20% of

544 their CAZymes were predicted to be secreted, suggesting

545 that they have been largely tailored to break down cello-

546 biose products of polysaccharide degradation inside of

547 the cell.

548 Classifying RUGs according to their polysaccharide

549 degrading capacities showed that Bacteroidota and Firmi-

550 cutes can potentially degrade a wide range of carbohydrate

551 polymers (Supplementary Data 1), which were also found in

552 different ruminants including camel [26], moose [25, 68]

553 and cattle [22,24]. Fibrobacterota were particularly adapted

554 to degrade glucan, cellulose, mannan, arabinan, xylan,

555 xyloglucan, chitin, starch, and pectin but lacked genes for

556 laccase and those for fucose and rhamnose cleavage. The

557 ability to deconstruct pectin seemed to be widely present

558 among the members of Ruminococcus, Fibrobacteres, and

559 Bacteroidetes, consistent with the finding in moose [25].

560 Both Verrucomicrobiota and Spirochaetota also showed

561 high lignocellulose degrading capacity but the former

562 lacked genes for laccase which are required for lignin

563 metabolism while the latter didn’t demonstrate their

564 potential to break down focuse, rhamnose, and mannan. In

565 termite hindgut microbiota, Spirochaetota were found to

566 play a key role in hemicellulose degradation as they carry

567 genes for targeting cellulose-xyloglucan polysaccharide

568 complexes [53, 69]. In the rumen microbiome, however,

569 Spirochaetota were shown to be associated with pectin

570 degradation, particularly under a pectin rich diet [25,70].

571 Treponema (Spirochaetota), Fibrobacter, Clostridia, and

572 Bacteroidia widely encoded enzymes involved in chitin

573 metabolism (domain DUF4849 and GH18; Supplementary

Data 1). Since chitin is not a component of plant cell wall, 574

the presence of chitin degrading enzymes is likely required 575

for the degradation of host-derived glycoproteins such as 576

mucin [71, 72]. Only single Verrucomicrobium bin 577

(RUG64) in this study had genes for chitin degradation 578

while in human gut, Verrucomicrobia including Akker- 579

mansia muciniphila are known as mucin-degraders [73]. 580

The remaining RUGs including those belonging to Syner- 581

gistota, Proteobacteria, Elusimicrobiota, Desulfobacterota, 582

Chloroflexota, and archaea had limited polysaccharide 583

degrading capacities, yet they may play an accessory or 584

complementary role in carbohydrate degradation and 585

metabolism in the rumen. The variability in lignocellulose 586

degrading capacity across different lineages may enable 587

bacteria to toleratefluctuations in physicochemical proper- 588

ties of the forages and to ensure nutrient supply from dif- 589

ferent lignocellulosic substrates in the rumen. 590

The organization of CAZymes in PULs in 591

Bacteroidota 592

In Bacteroidota, all genes required for perception, cleavage, 593

and utilization of a specific polysaccharide glycan are 594

localized on the same portion of their genomes character- 595

ized by the presence of asusC/Dgene pair which encode for 596

sugar transporters [7]. This special organization of 597

CAZymes is known as PULs. The PUL profiles can be used 598

to predict the ecological niche and substrate specificity of 599

Bacteroidota as the most abundant rumen microbes. To 600

identify potential PULs, RUGs were screened for tandem 601

susC-susD gene pair followed by identification of the 602

occurrence of genes encoding CAZymes around thesusC/D 603

pair [6]. A total of 1669 potential PULs were predicted in 604

154 RUGs (out of the 190 Bacteroidota RUGs), of which 605

708 lacked any known CAZyme gene, consistent with 606

previousfindings (Supplementary Data 7) [8,68]. The lack 607

of CAZymes in a large set of PULs suggests that PULs may 608

also contribute to molecular processes other than poly- 609

saccharide degradation [8]. The occurrence of CAZymes in 610

predicted PULs are detailed in Supplementary Note. 611

Several abundant and functionally important lig- 612

nocellulose degrading enzymes including families GH11 613

endo-β-1,4-xylanases, GH43_33 α-L-arabinofuranosidases, 614

GH43_8 β-D-galactofuranosidases, GH30_5 endo-β-1,6- 615

galactanases, GH4 and GH5_26 cellulases, and 616

GH59 β-galactosidases were not found in the PULs. In 617

addition, GH9, GH5_2, GH8 cellulases, GH13_38 α-glu- 618

cosidases, GH74 endoglucanases, GH30_2 β-xylosidases, 619

GH35 β-galactosidases, GH29α-L-fucosidases, GH106α- 620

L-rhamnosidases, GH43_29 α-1,2-L-arabinofuranosidases, 621

GH43_35, and GH43_10 β-xylosidases were also rarely 622

present in the PULs. The phenomenon that not all lig- 623

nocellulose degrading enzymes are organized into PULs 624

(11)

UNCORRECTED PROOF

625 suggests that they may have been developed for the

626 decomposition of complex polysaccharides with hetero-

627 geneous structures, which require a combination of

628 enzymes for their degradation. Similar to our finding, no

629 PUL for the digestion of crystalline cellulose, which is

630 largely composed of β-1-4 linked D-glucopyranose units,

631 has been reported [8]. This is also the case for starch and

632 glycogen made of glucose subunits linked by eitherα-1-4 or

633 α-1-6 glycosidic bonds, for which the degrading enzymes

634 were rarely encountered in the PULs. The predicted PULs

635 appeared to target complex polysaccharide polymers

636 including xyloglucans, glucuronylxylans, and pectins,

637 which are the main components of plant cell wall, but rarely

638 accommodated enzymes dedicated to break down glucan

639 homopolymers including crystalline cellulose, starch, and/

640 or glycogen. Overall, these findings suggest a prominent

641 diversity of PULs predicted in Bacteroidota genomes,

642 indicating their functional specialization toward specific

643 lignocellulosic substrates.

644 Metabolic classification of RUGs

645 We classified all RUGs according to their capacity for

646 producing main SCFAs and utilizing monomeric carbohy-

647 drates (Supplementary Data 1). The production of butyrate,

648 lactate, acetate, succinate, and propionate was widespread

649 among Bacteroidota, Elusimicrobiota, and Fibrobacterota.

650 Within Firmicutes, however, there was a significant varia-

651 tion in the potential to ferment SCFAs. While lactate pro-

652 duction was widely encoded by this phylum, however, only

653 orders of Lachnospirales, Erysipelotrichales, Peptos-

654 treptococcales, and Saccharofermentanales can potentially

655 ferment butyrate (Supplementary Data 1). In particular,

656 Lachnospirales are known as key rumen butyrate producers,

657 which redirect hydrogen from reducing CO2 for methane

658 formation to butyrate production and thus may contribute to

659 an enhanced feed efficiency in Holstein Friesian dairy cows

660 [19]. Acetate production was common across rumen

661 microbiota except Thermoplasmatota (Archaea), Cyano-

662 bacteria, and Negativicutes (Firmicutes_C), which lacked

663 genes to encode pta and acka. Acetate production is also

664 widespread among rumen microbiota of moose [68] and

665 dromedary camel [26]. Succinate and propionate production

666 were widely encoded by Firmicutes, except some members

667 of Bacilli including orders of RF39 and RFN20 lacking the

668 required enzymes. Particularly, RF39 and RFN20 are the

669 two major gut lineages characterized by their reduced

670 genome size, mixotrophic lifestyle and inability to synthe-

671 size peptidoglycans as an adaptive strategy in the gut

672 environment [74]. In ruminant animals, a greater potential

673 for propionate, butyrate, and propionate-to-acetate fermen-

674 tation are known to associate with a higher feed efficiency

675 but also with a lower methane emission [19,75].

In addition, the PPP, EMP, and pyruvate metabolism 676

pathways were widely distributed across the RUGs. Certain 677

members of the rumen microbiota including some strains of 678

Lachnospira, Prevotella, Butyrivibrio, Bacteroidaceae, and 679

Clostridia together with all Kiritimatiellae and Gastranaer- 680

ophilaceae (Cyanobacteria) lack enolase, the enzyme that 681

catalyze the penultimate step of the glycolysis pathway [3]. 682

The evolutionary advantage of enolase loss is currently 683

unknown. Energy metabolism through electron transport 684

complexrnfABCDEG was widespread across Bacteroidota 685

and Firmicutes (Supplementary Data 1), corroborating the 686

finding in the moose rumen microbiota [68]. However, 687

echABCDEF complex was the primary route for energy 688

metabolism in Fibrobacterota, Desulfobacterota, Negativi- 689

cutes (Firmicutes_C), and Thermoplasmatota, in which 690

evidence for rnfABCDEG complex was scarce. The co- 691

occurrence of the two complexes was rare in some Firmu- 692

cutes genomes. Both Rnf and Ech are membrane-bond 693

enzyme complexes which couple electron transfer with an 694

H+/Na+gradient to recycle NADH and H2[76]. These data 695

suggest that forage-attached microbiota show a high 696

redundancy with respect to metabolic capabilities and their 697

functions in the rumen. This functional redundancy is likely 698

beneficial to the host animal as it enables the host to extract 699

energy from diverse lignocellulosic substrates. 700

Conclusion

701

Our metagenomics data indicated that the rumen environ- 702

ment harbors a wealth of microbial diversity yet to be dis- 703

covered, with several rare bacterial taxa playing key roles in 704

fiber digestion. We showed that while generalist microbes 705

and their genes are prevalent in the rumen, certain groups 706

are specialized toward particular lignocellulosic substrates. 707

In addition, we provided evidence for a strong temporal 708

shift in the rumen microbial function from copiotrophy to 709

oligotrophy along the length of rumen incubation, which 710

may facilitate the degradation of recalcitrant lignocellulosic 711

substrates in the rumen. The physiochemical properties of 712

forages and the retention time of feed particulates in the 713

rumen were the major factors determining the extent of 714

copiotrophic–oligotrophic shifts. Taken together, the dif- 715

ferential colonization of forages with different lig- 716

nocellulosic properties demonstrates that the diversity of 717

rumen microbiota is likely critical to the digestion of dif- 718

ferent lignocellulosic substrates, reflecting a strong niche 719

complementarity between members of the rumen microbial 720

community. The high taxonomic diversity, functional 721

redundancy and metabolic partitioning are the primary 722

characteristics of the rumen microbiome that facilitate the 723

evolutionary adaptation to diverse lignocellulosic forages. 724

Referensi

Dokumen terkait