Metagenomic analysis reveals a dynamic microbiome with diversified
adaptive functions to utilize high lignocellulosic forages in the cattle rumen
Article in The ISME Journal · March 2021
DOI: 10.1038/s41396-020-00837-2
UNCORRECTED PROOF
ARTICLE
Metagenomic analysis reveals a dynamic microbiome with diversified adaptive functions to utilize high lignocellulosic forages in the cattle rumen
Javad Gharechahi 1 ● Mohammad Farhad Vahidi 2 ● Mohammad Bahram 3,4 ● Jian-Lin Han 5,6 ● Xue-Zhi Ding 7 ● Ghasem Hosseini Salekdeh 2,8
Received: 23 May 2020 / Revised: 3 November 2020 / Accepted: 11 November 2020
© The Author(s), under exclusive licence to International Society for Microbial Ecology 2020
Abstract
Rumen microbiota play a key role in the digestion and utilization of plant materials by ruminant species, which has important implications for greenhouse gas emission. Yet, little is known about the key taxa and potential gene functions involved in the digestion process. Here, we performed a genome-centric analysis of rumen microbiota attached to six different lignocellulosic biomasses in rumen-fistulated cattle. Our metagenome sequencing provided novel genomic insights into the functional potential of 523 uncultured bacteria and 15 mostly uncultured archaea in the rumen. The assembled genomes belonged mainly to Bacteroidota, Firmicutes, Verrucomicrobiota, and Fibrobacterota and were enriched for genes related to the degradation of lignocellulosic polymers and the fermentation of degraded products into short chain volatile fatty acids. We also found a shift from copiotrophic to oligotrophic taxa during the course of rumen fermentation, potentially important for the digestion of recalcitrant lignocellulosic substrates in the physicochemically complex and varying environment of the rumen. Differential colonization of forages (the incubated lignocellulosic materials) by rumen microbiota suggests that taxonomic and metabolic diversification is an evolutionary adaptation to diverse lignocellulosic substrates constituting a major component of the cattle’s diet. Our data also provide novel insights into the key role of unique microbial diversity and associated gene functions in the degradation of recalcitrant lignocellulosic materials in the rumen.
* Xue-Zhi Ding [email protected]
* Ghasem Hosseini Salekdeh [email protected]
1 Human Genetics Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
2 Department of Systems Biology, Agricultural Biotechnology Research Institute of Iran, Agricultural Research, Education, and Extension Organization, Karaj, Iran
3 Department of Ecology, Swedish University of Agricultural Sciences, Ulls väg 16, 756 51 Uppsala, Sweden
4 Institute of Ecology and Earth Sciences, University of Tartu, 40 Lai St, 51005 Tartu, Estonia
5 Livestock Genetics Program, International Livestock Research Institute (ILRI), 00100 Nairobi, Kenya
6 CAAS-ILRI Joint Laboratory on Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), 100193 Beijing, China
7 Key Laboratory of Yak Breeding Engineering, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences (CAAS), 730050 Lanzhou, China
8 Department of Molecular Sciences, Macquarie University, North Ryde, NSW, Australia
Supplementary information The online version of this article (https://doi.org/10.1038/s41396-020-00837-2) contains supplementary material, which is available to authorized users.
Introduction
As their main source of energy for growth and lactation, ruminants depend largely on the end products of forage digestion, i.e., energy-accessible short chain fatty acids (SCFAs) [1]. These animals lack the enzymes required to break down the complex lignocellulosic carbohydrates that form the major component of the forage [2, 3]. For this, they rely on microbial communities inhabiting their gastrointestinal tract, particularly the rumen. Rumen microbes are thus equipped with diverse carbohydrate active enzymes (CAZymes) for breaking down the complex structure of plant cell-wall materials. They produce a complex cocktail of enzymes for the efficient degradation of plant polysaccharides, including diverse glycoside hydrolases (GHs), carbohydrate esterases (CEs), and polysaccharide lyases (PLs) together with their accessory non-catalytic carbohydrate-binding modules (CBMs), enabling their targeted attachment to lignocellulosic substrates. However, little is known about specific adaptations of microbes to the complex and dynamic rumen environment.
Rumen microbes utilize different mechanisms to invade complex plant polysaccharide substrates, and thus there may be some specialization in the substrate types utilized and in functional capabilities across different members of the rumen microbial community. Some rumen bacteria, including species of Ruminococcus, have evolved multienzyme assemblies, the cellulosomes, in which multiple CAZymes bind via dockerin domains to cell surface localized scaffolding proteins containing repeated cohesin domains [4]. These cohesin–dockerin interactions enable a concerted action of multiple CAZymes with varying substrate specificities on lignocellulosic substrates and thus enhance the degradation of recalcitrant polysaccharides, e.g., cellulose and hemicellulose, by orders of magnitude [5]. However, Bacteroidetes, as abundant members of the rumen microbiome, lack the cellulosome. In these bacteria, all genes required for utilizing a specific glycan substrate are physically localized on the same portion of their genomes, marked by the presence of a susC/D gene pair. These special arrangements of CAZymes are known as polysaccharide utilization loci (PULs) [6, 7]. Since each PUL is dedicated to the breakdown of a specific complex glycan substrate, the combination of enzymes present in PULs determines the carbohydrate degrading capacity of the bacteria [8].
In addition, to degrade plant materials efficiently, rumen microbes must be able to colonize and attach to feed particulates. Particle-associated microbes that are embedded in a biofilm matrix play a central role in rumen digestion. The attachment of rumen microbes to feed particles promotes their prolonged residence in the rumen and enables their enzymes to be in contact with the substrates [9–11]. Bacterial attachment to feed particles is initiated within 5 min of feed entry and becomes established within 1 h of rumen incubation [11, 12]. In this regard, the physicochemical properties of feeds have proved to be the major factors determining microbial colonization and digestion in the rumen [12–15]. In particular, rumen microbial communities are distinct between forage-fed and concentrate-fed animals [16, 17]. Despite these advances, we still have little information about the diversity of rumen microbiota colonizing forages of different lignocellulosic compositions as well as their dynamic changes along the digestion process in the rumen.
The rumen microbial communities contribute to variations in feed efficiency in the host [18, 19]. Several 16S rRNA gene-based metabarcoding studies report that rumen microbes show significant variations with respect to their ability to attach to feed particles [12, 14, 15, 20]. However, little is known about the genomic constituents of particle-attached microbiota and the variations in their CAZyme repertoires, which determine their lignocellulose degrading potential in the rumen. Most particle adhering microbes (>70%) are unable to grow in routine culture conditions [21]. Culture-independent metagenomics has enabled us to gain insight into the functional significance of these microorganisms and their contribution to lignocellulose degradation in the rumen. This, together with recent advances in bioinformatics analyses, has now made it possible to recover near complete and draft genome sequences of uncultured rumen bacteria and archaea from metagenome-assembled contigs [22–26]. A recent study, for example, recovered 4941 metagenome-assembled genomes with >80% completeness from the cattle rumen; many of them (>70%) had <95% nucleotide identity with existing genomes, suggesting that they represent novel entities [22]. Metagenome analyses have also revealed that rumen microbiota could be a potential source of novel enzymes for biotechnological applications, including the biofuel, paper, textile, and food industries [23, 27]. Knowledge about the diversity and functional capabilities of rumen microbiota may provide opportunities to improve the feed digestion efficiency of domestic ruminants through microbiome engineering for an enhanced lignocellulose degrading capacity.
Here, we analyzed 333 giga base pairs (Gbp) of metagenome sequencing data from rumen microbes attached to six different lignocellulosic biomasses following their in situ incubation in the cattle rumen. Our aim was to determine the key taxa and associated gene functions involved in the digestion of different lignocellulosic substrates and their potential shifts during the digestion process. In our previous 16S rRNA gene metabarcoding study [15], we showed that rumen microbiota had differential preferences for attachment to forages of varying lignocellulose compositions. Here we demonstrate how this differential colonization is reflected in their genomic potential for the degradation of different plant cell-wall polymers. We also hypothesized that changes in the community of forage-attached microbiota along the length of rumen incubation reflect their lignocellulose degrading potential. To evaluate this hypothesis, we sequenced and de novo assembled the whole metagenome of fiber-attached rumen microbiota from three rumen-fistulated cattle and reconstructed the genome sequences of rumen microbiota involved in the breakdown of different lignocellulosic biomasses with varying physicochemical properties.
Material and methods
Sample collection
The details of the three Taleshi bulls, the rumen cannulation technique, and the physicochemical properties, preparation, and sampling of the six common lignocellulosic forages, namely camelthorn (AP), common reed (CR), date palm (DP), rice straw (RS), Kochia (KS), and Salicornia (SC), have been presented in our previous study [15] and are briefly summarized in Supplementary Note and Fig. S1. The experiment was approved by the Ethics Committee for Animal Experiments of the Animal Science Research Institute of Iran.
DNA extraction and metagenome sequencing
Metagenome DNA extraction from microbial cells tightly attached to the feed particulates was performed as described previously [15, 28]. Metagenome library preparation was performed according to the TruSeq DNA Library Preparation Kit (Illumina, San Diego, CA, USA). The quantity of each library was evaluated using a Qubit fluorimeter (Invitrogen, Carlsbad, CA, USA). Metagenome sequencing was performed at Novogene Inc. (Beijing, China) using an Illumina HiSeq 2500 system with 150 bp paired-end sequencing of ~340 bp fragments.
De novo metagenome assembly and binning
All metagenome sequences from the 24 samples across the six forages were co-assembled (co-assembly) using MEGAHIT v1.2.9 [29] with the following options: --k-min 31, --k-max 141, --k-step 10, --min-count 2, and --min-contig-len 200. We also assembled the reads generated for each forage separately (forage-specific assemblies) using SPAdes v3.13.1 [30] with k-mers 31, 51, 71, 91, 111, and 127. To obtain the coverage profiles of assembled contigs, reads were mapped back to the filtered contigs (i.e., contigs longer than 2000 bp) using Bowtie2 v2.3.5 [31]. Samtools v1.9 [32] was used to convert and sort the BAM files. Genome bins were reconstructed using MetaBat2 v2.14 [33] and MaxBin2 v2.2.6 [34] on the co-assembly and forage-specific assemblies. MetaBat2 was run using the option --minContig 2000, considering the coverage profiles across either all 24 samples (for the co-assembly) or the four sampling points (for each forage-specific assembly). MaxBin2 was run using the following parameters: --min_contig_length 2000 and --max_iteration 50. All genome bins obtained from the co-assembly and the forage-specific assemblies were aggregated and dereplicated separately using DAS_Tool v1.1.1 [35] in order to ensure that the dereplicated genomes did not have any shared contigs. DAS_Tool was run under default settings using DIAMOND as the search engine. The genome bins remaining after dereplication with DAS_Tool from the co-assembly (646 bins) and the forage-specific assemblies (1020 bins) were finally aggregated and dereplicated using dRep v2.3.2 [36]. In this dereplication step, CheckM v1.0.18 [37] was used to evaluate genome completeness and contamination, and only bins with ≥75% completeness and ≤10% contamination were retained. The final cluster representatives (n = 538; hereafter called Rumen-Uncultured Genomes, RUGs) were further analyzed in this study.
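The retention rule applied in the final dereplication step (≥75% completeness and ≤10% contamination) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Bin` record, the bin names, and the example values are hypothetical, and the filtering in the study was carried out within the dRep and CheckM workflow rather than with this code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bin:
    """Hypothetical record holding quality estimates for one genome bin."""
    name: str
    completeness: float   # percent
    contamination: float  # percent


def passes_filter(b, min_completeness=75.0, max_contamination=10.0):
    # Thresholds taken from the text: >=75% completeness, <=10% contamination.
    return b.completeness >= min_completeness and b.contamination <= max_contamination


def retain_bins(bins):
    return [b for b in bins if passes_filter(b)]


# Made-up example values, for illustration only.
example = [
    Bin("bin_a", 92.3, 1.8),
    Bin("bin_b", 74.9, 0.5),   # fails on completeness
    Bin("bin_c", 88.0, 10.0),  # exactly at the contamination limit, kept
    Bin("bin_d", 99.0, 12.4),  # fails on contamination
]
kept = [b.name for b in retain_bins(example)]
print(kept)
```

Both thresholds are treated as inclusive here, following the "≥" and "≤" signs in the text.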
Taxonomy and functional analysis of RUGs
Taxonomic classification of RUGs was performed using GTDB-Tk v1.3.0 [38]. PhyloPhlAn v0.97 [39] was used to place RUGs into a phylogenetic tree and to further infer their taxonomic labels using >400 universally conserved proteins encoded by 3171 genomes. The closest cultured relatives from the Hungate1000 collection [3] were also included. The tree was visualized using iTOL v5 [40], and phylum-level taxonomies imputed by GTDB-Tk were assigned to the leaves (tips). To identify novel and known genomes, the RUGs were compared to the Hungate1000 collection (n = 408) and the rumen metagenome-assembled genomes (n = 4941) from Stewart et al. [22] using fastANI v1.1 with --fragLen 2000. Coding sequences (CDS) and clustered regularly interspaced short palindromic repeats were predicted during Prokka v1.14.0 annotation [41]. tRNAs were predicted using tRNAscan-SE v2.0.6 [42]. Ribosomal RNAs were predicted using barrnap v0.9 (https://github.com/tseemann/barrnap).
Quantification and differential abundance of RUGs
The abundance of RUGs across samples was estimated by mapping reads to binned contigs using the quant_bins module of metaWRAP [43]. RUGs were clustered into a heatmap based on their abundance profiles, using the average linkage method and a Spearman rank correlation distance matrix in Heatmapper [44]. The differentially abundant RUGs between forages and across sampling intervals were identified using DESeq2 [45] with an FDR-corrected p value cutoff of <0.01.
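To make the FDR-corrected cutoff concrete, the following Python sketch implements the Benjamini-Hochberg adjustment, which is the default multiple-testing correction in DESeq2. It is a stand-alone illustration, not the DESeq2 code, and the p values in the example are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for four genomes.
raw = [0.0001, 0.004, 0.03, 0.2]
adj = benjamini_hochberg(raw)
significant = [i for i, p in enumerate(adj) if p < 0.01]
print(adj, significant)
```

With these example values, only the first two entries stay below the 0.01 cutoff after adjustment.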
Microbial community diversity based on RUGs
The microbial community structure of the samples across forages and sampling intervals was explored using a weighted UniFrac distance matrix. Permutational multivariate analysis of variance (PERMANOVA) was used to test for significant differences among the forages and across the sampling intervals using the adonis function of vegan v2.5-5 [46]. Group-wise differences were tested using pairwise adonis. PERMDISP was also used to test for the homogeneity of dispersions using the betadisper function of vegan.
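The idea behind PERMANOVA can be illustrated with a minimal pure-Python sketch of the pseudo-F statistic and its permutation p value, computed from a distance matrix. This is not the vegan implementation; the function names, the toy distance matrix, and the number of permutations are assumptions chosen for illustration.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """Pseudo-F from a square distance matrix and one group label per sample."""
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares, one term per group.
    ss_within = 0.0
    for g in groups:
        members = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    a = len(groups)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, permutations=999, seed=0):
    """Observed pseudo-F and a permutation p value (labels shuffled at random)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy data: six samples on a line, two well-separated groups (made-up values).
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
labels = ["A", "A", "A", "B", "B", "B"]
dist = [[abs(x - y) for y in points] for x in points]
f_stat, p_value = permanova(dist, labels)
print(f_stat, p_value)
```

With only three samples per group, few distinct label arrangements exist, so the p value in this toy case cannot become very small even though the groups are clearly separated.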
Identification and annotation of CAZymes in RUGs
Prodigal v2.6.3 [47] was used in metagenome mode to predict protein coding genes. The predicted proteins were scanned for candidate CAZymes using the hmmscan module of HMMER v3.2.1 with the dbCAN database v6 [48]. The hmmscan output was parsed by applying a minimum query coverage of 35% and an e-value cutoff of 1e−5. PULs were identified using PULpy [49]. The predicted CAZymes were searched with BLASTp [50] against the NCBI nr database (November 2019) using an e-value cutoff of 1e−30 and a maximum of 30 target sequences. Signal peptides were predicted using SignalP v4.1 [51] based on both the gram-positive and gram-negative models.
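The coverage and e-value filter applied to the hmmscan output can be sketched as follows. The `DomainHit` record and its field names are hypothetical and do not correspond to a specific HMMER output parser. The text does not spell out how coverage was computed, so the sketch assumes it is the aligned span divided by a reference length supplied by the caller, and it treats both cutoffs as inclusive.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    """Hypothetical parsed domain hit; field names are illustrative only."""
    protein_id: str
    family: str
    evalue: float
    ali_start: int  # 1-based, inclusive
    ali_end: int    # 1-based, inclusive
    length: int     # reference length used as the coverage denominator


def coverage(hit):
    # Fraction of the reference length spanned by the alignment.
    return (hit.ali_end - hit.ali_start + 1) / hit.length


def keep_hit(hit, min_coverage=0.35, max_evalue=1e-5):
    # Cutoffs from the text: coverage of at least 35% and e-value of at most 1e-5.
    return coverage(hit) >= min_coverage and hit.evalue <= max_evalue


# Made-up hits for illustration.
hits = [
    DomainHit("p1", "GH3", 1e-40, 10, 210, 300),   # passes both cutoffs
    DomainHit("p2", "CBM6", 1e-3, 1, 120, 130),    # e-value too high
    DomainHit("p3", "GH51", 1e-12, 5, 60, 400),    # coverage too low
]
kept_ids = [h.protein_id for h in hits if keep_hit(h)]
print(kept_ids)
```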
Metabolic reconstitution of RUGs
To infer the metabolic potential of RUGs, the predicted proteome of each RUG was scanned for enzymes involved in specific metabolic pathways including glycolysis, pentose phosphate (PPP), and Embden–Meyerhof–Parnas (EMP). The search was performed using the hmmscan module of HMMER against protein domains representing key marker genes of each metabolic pathway. The domain models were retrieved from the Pfam v32 and TIGRFAMs v15 databases (see Supplementary Data 1 for details of the Pfam domains). The ability to metabolize monomeric carbohydrates including mannose, rhamnose, xylose, galactose, arabinose, fucose, and glucose requires the presence of the key kinase, isomerase, and epimerase enzymes catalyzing critical steps in their metabolism as a carbon source.
Fig. 1 The majority of the reconstituted rumen-uncultured genomes (RUGs) are novel and do not have any representatives in the publicly available databases. A The average nucleotide identity (ANI) and aligned fraction (AF) of the 538 RUGs compared with rumen cultured genomes from the Hungate1000 genome project [3], the cattle rumen metagenome-assembled genomes [22], and the GTDB-Tk representative genomes (release 95) [38]. The horizontal dashed line (purple) shows the accepted AF threshold (65%); the vertical dashed lines (green and blue) show the 95% and 99% ANI thresholds, respectively. B The phylogenetic tree of the 538 RUGs along with the 62 closest relatives from the Hungate1000 collection. The tree was constructed based on the concatenated multiple sequence alignments of 400 universally conserved proteins using PhyloPhlAn [39] and visualized using iTOL [40]. Clades are colored based on phylum-level taxonomies inferred by GTDB-Tk [38]. The Hungate1000 representative genomes are labeled by their taxonomic names.
The ability of RUGs to ferment carbohydrate monomers into SCFAs was also determined. Butyrate production was confirmed by the presence of genes for phosphate butyryltransferase (ptb), butyrate kinase (buk), and butyryl-CoA:acetate CoA-transferase (but). Lactate dehydrogenase (ldh) was used as the marker of lactate production. Acetate production was checked by the presence of genes encoding phosphate acetyltransferase (pta) and acetate kinase (ackA). Malic enzyme (men), fumarate hydratase (fum), and fumarate reductase (frd) were used as markers of succinate production. Propionate production was also confirmed by the presence of three key enzymes: methylmalonyl-CoA decarboxylase (mmdA), methylmalonyl-CoA mutase (mutA), and methylmalonyl-CoA epimerase (mce). The potential for degradation of carbohydrate polymers was further evaluated by enumerating key enzymes for the degradation of cellulose, xylan, xyloglucan, chitin, pectin, and starch.
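The marker-gene rules for SCFA production described above can be written as a small lookup table. The sketch below is a hedged illustration: the gene symbols are those named in the text, but the decision rule (a product is called only when every listed marker is present) is an assumption, since the text does not state whether a partial marker set was accepted. The example gene set is made up.

```python
# Marker genes per fermentation product, as named in the text.
SCFA_MARKERS = {
    "butyrate": {"ptb", "buk", "but"},
    "lactate": {"ldh"},
    "acetate": {"pta", "ackA"},
    "succinate": {"men", "fum", "frd"},
    "propionate": {"mmdA", "mutA", "mce"},
}


def predicted_products(genes):
    """Products whose complete marker set is present (assumed all-markers rule)."""
    present = set(genes)
    return sorted(
        product
        for product, markers in SCFA_MARKERS.items()
        if markers <= present  # subset test: every marker must be found
    )


# Hypothetical annotation of one genome.
genome_genes = {"pta", "ackA", "ldh", "fum"}
products = predicted_products(genome_genes)
print(products)
```

Relaxing the rule, for example to accept alternative routes to the same product, would only require replacing the subset test with a different predicate.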
Results and discussion
Metagenome assembly and genome bin selection
Fig. 2 The community composition of fiber-attached microbiota changes with incubation time. Heatmap clustering shows a clear separation of samples collected at 24 h from those at later stages of the rumen incubation. The RUGs were hierarchically clustered based on their relative abundances, estimated by recruiting reads to contigs in the binned genomes, using an average linkage method and Spearman rank correlation distance measurement in Heatmapper [44]. AP camelthorn, CR common reed, DP date palm, KS Kochia, RS rice straw, and SC Salicornia.
In this study, we sequenced the whole metagenome of 24 rumen samples, resulting in a total of 333 Gbp of sequencing data. A combined coverage and tetranucleotide frequency-based binning resulted in 538 RUGs having >75% completeness and <10% contamination (Supplementary Data 2 and Fig. S2A, B). Among the RUGs, 205 had >90% completeness and <5% contamination and 21 could be considered as high-quality draft MAGs based on the criteria defined by Bowers et al. [52] (completeness > 90%, contamination < 5%, and the presence of the 23S, 16S, and 5S rRNA genes and at least 18 tRNAs). In addition, 468 (87%) of the RUGs had a quality score (completeness − 5 × contamination) > 50%, indicating that most of the RUGs are of high quality. Among the 538 RUGs, 351 had at least 18 of the 20 standard tRNAs (Fig. S2C), 53 had at least one full-length 16S rRNA gene, and 117 had a partial one. The sizes of the RUGs ranged from 606 kilobases (kb) to 5.95 megabases (Mb), with N50 values from 4 to 387 kb. The smallest RUG was a mycoplasma (RUG208), followed by four Saccharimonadaceae bacteria with genome sizes <0.8 Mb and an average genome completeness of 78%. The number of contigs in the RUGs ranged from 13 to 703, with an average of 189 contigs.
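The quality criteria quoted in this section (the quality score and the high-quality draft definition) translate directly into code. The following Python sketch is illustrative only: the function names, the tier labels, and the example values are assumptions, while the thresholds are the ones stated in the text.

```python
def quality_score(completeness, contamination):
    # Quality score as defined in the text: completeness - 5 x contamination.
    return completeness - 5.0 * contamination


def classify(completeness, contamination, has_rrna_set, n_trnas):
    """Assign an illustrative quality tier to one genome bin.

    has_rrna_set: True if 23S, 16S, and 5S rRNA genes are all present.
    n_trnas: number of distinct tRNAs detected.
    """
    if completeness > 90 and contamination < 5 and has_rrna_set and n_trnas >= 18:
        return "high-quality draft"
    if quality_score(completeness, contamination) > 50:
        return "quality score > 50"
    return "other"


# Made-up example genomes.
print(quality_score(78.0, 4.0))
print(classify(95.0, 1.0, True, 19))
print(classify(95.0, 1.0, False, 19))
print(classify(76.0, 9.0, False, 10))
```

The second example shows why only a small subset of near-complete genomes reaches the high-quality draft tier: a missing rRNA gene is enough to exclude a bin that otherwise has excellent completeness and contamination estimates.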
By comparing the RUGs with the Hungate1000 collection (n = 408) [3], the set of RUGs (n = 4941) reported by Stewart et al. [22], and the GTDB-Tk representative genomes (n = 31,910) [38], we found that most of our RUGs were novel and lacked any representative in available databases (Fig. 1A). To test whether these low ANI similarities were likely to result from genome incompleteness, we correlated the completeness of our RUGs with their ANI to the Stewart MAGs and found no clear relationship (Fig. S3). Only 92 RUGs (~17%) were reliably assigned to known species with an ANI >95% over an alignment fraction of ≥65%, congruent with their placement in the GTDB-Tk reference tree. Seventy-eight RUGs remained unannotated at the genus level, six at the family level, and one at the order and the class levels (Supplementary Data 2). A total of 243 RUGs (45%) were taxonomically assigned to Firmicutes and 190 (35%) to Bacteroidota (Figs. 1B, S2D). Spirochaetota were the third most abundant group of bacteria with 26 RUGs (4.8%), followed by Proteobacteria (15), Verrucomicrobiota (12), Synergistota (7), Patescibacteria (6), Elusimicrobiota (5), Cyanobacteria (5), Desulfobacterota (4), and Fibrobacterota (4) (Fig. S2D). The high number of Spirochaetota genomes recovered in this study agrees with an earlier study on the moose rumen, in which Spirochaetota were found to play a major role in pectin degradation [25]. All Spirochaetota genomes belonged to the genus Treponema, known as a major hemicellulose degrader in the gut microbiota [53].
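The species-level assignment rule used above (ANI >95% over an aligned fraction of ≥65%) can be expressed as a short Python check. The genome names and hit values below are hypothetical, and the dictionary layout is an assumption for illustration rather than the fastANI output format.

```python
def is_known_species(ani, aligned_fraction, min_ani=95.0, min_af=65.0):
    # Rule from the text: ANI strictly above 95% over an aligned fraction of at least 65%.
    return ani > min_ani and aligned_fraction >= min_af


# Hypothetical best reference hit per genome: (ANI %, aligned fraction %),
# or None when no reference genome produced a hit.
best_hits = {
    "RUG_x1": (98.7, 82.0),
    "RUG_x2": (96.1, 40.5),  # high ANI but too little of the genome aligned
    "RUG_x3": (88.4, 71.0),  # well aligned but below the species threshold
    "RUG_x4": None,
}

known = sorted(
    g for g, hit in best_hits.items() if hit is not None and is_known_species(*hit)
)
novel = sorted(set(best_hits) - set(known))
print(known, novel)

# Share of genomes assigned to known species, using the counts reported in the text.
reported_share = round(100 * 92 / 538, 1)
print(reported_share)
```

Requiring both conditions guards against short, highly similar regions (for example conserved genes) being mistaken for whole-genome similarity.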
Overall, 15 RUGs were taxonomically assigned to the archaeal families Methanomethylophilaceae (n = 11), Methanobacteriaceae (3), and Methanomicrobiaceae (1). Among Methanomethylophilaceae, eight RUGs were affiliated with ISO4-G1, a methylotrophic methanogen first isolated from the sheep rumen [54]. Methanobacteriaceae genomes were assigned to the genera Methanosphaera (RUG228) and Methanobrevibacter (RUG389 and RUG508). The single genome recovered from Methanomicrobiaceae was assigned to Methanomicrobium mobile, which is one of the major methanogenic archaea detected in the rumen of sheep (accounting for 54% of the total methanogens) [55] and cattle (>10^6 cells/mL in rumen digesta) [56]. The details of the taxonomies inferred by GTDB-Tk are presented in Supplementary Data 2.
The differential colonization of forages by RUGs
The six lignocellulosic biomasses (AP, CR, DP, KS, RS, and SC) were incubated in three rumen-fistulated bulls for a period of 96 h with 24 h sampling intervals. The samples taken from the three bulls at each sampling interval (i.e., 24, 48, 72, and 96 h) were pooled and processed for whole metagenome sequencing of fiber-attached microbiota. Clustering the samples at the four sampling intervals and across the six forages based on the relative abundances of the recovered RUGs revealed a clear separation of the samples at 24 h of rumen incubation (Fig. 2). The distribution of RUGs across forages varied as the incubation time was extended. At 48 h, only AP, CR, DP, and SC clustered together. There was no apparent clustering of forages at 72 and 96 h, indicating that the digestion during early incubation stages converted feed particulates into unfavored substrates for most of the RUGs, a pattern in agreement with our 16S rRNA metabarcoding data [15].
Fig. 3 Phylogenetic diversity of fiber-attached RUGs depends on both feed type and sampling times. A PCoA plot shows a significant separation of RUGs based on the forages and following rumen incubation lengths. Forages are indicated by colors and sampling intervals by shapes. Statistically significant differences in the rumen microbial community attached to the forages and between the incubation lengths were tested by permutational multivariate analysis of variance (PERMANOVA). The homogeneity of group dispersions between forages (B) and sampling intervals (C) was tested using the betadisper function in the R package vegan. AP camelthorn, CR common reed, DP date palm, KS Kochia, RS rice straw, and SC Salicornia.
Microbial community diversity and composition analyses also showed significant differences between the forages and sampling intervals (PERMANOVA P < 0.001; Fig. 3), which could be mainly ascribed to community changes in AP compared to CR, RS, and SC (P < 0.05). The distribution of RUGs across sampling intervals also showed a clear separation of the 24 h samples (pairwise PERMANOVA P < 0.003) from all other stages, which exhibited overlapping clustering (Fig. 3). There were higher relative abundances of Firmicutes (26 RUGs), Fibrobacterota (2 RUGs), Bacteroidota (2 RUGs), and Spirochaetota (2 RUGs; FDR-corrected P < 0.01) attached to SC (the forage with the highest hemicellulose and the lowest acid detergent fiber (ADF) concentrations) compared with other forages. Ruminococcaceae (3 RUGs) were significantly enriched in the microbial community attached to RS (the forage with the lowest levels of ADF and acid detergent lignin (ADL)) compared with other forages. Verrucomicrobiota (2 RUGs) were represented at higher abundance in CR (the forage with the highest cellulose content) compared to AP and SC (the two forages with the lowest cellulose concentration). This finding suggests a key role of Verrucomicrobiota in cellulose degradation and is in agreement with a recent analysis of mouse microbiota, in which Bacteroidetes accumulated in the presence of bamboo shoot while Verrucomicrobiota increased significantly with crystalline cellulose supplementation [57].
Temporal analysis of the microbial communities colonizing the forages also revealed the dominance of Bacteroidetes, Firmicutes, and Proteobacteria during the initial stage of rumen incubation, when maximal dry matter (DM) degradation took place (i.e., 24 h), while Verrucomicrobia accumulation started later, when most DM had disappeared (Supplementary Data 3). Bacteroidetes, Firmicutes, and Proteobacteria are known as copiotrophic taxa with potentially rapid growth rates and adaptability to conditions of high carbon availability [58, 59]. On the other hand, oligotrophic bacteria such as Verrucomicrobia are able to grow on low quality and recalcitrant substrates, with slow growth rates but a higher substrate utilization efficiency [59]. This is functionally relevant: upon feed entry into the rumen, surface-accessible carbohydrates in feed particulates attract the copiotrophic taxa first, but digestion during the early hours of rumen incubation turns forage residues into favored substrates for oligotrophic taxa. The shift from copiotrophic to oligotrophic taxa is likely critical for the digestion of recalcitrant substrates in the rumen.
Fig. 4 The average percentage identity of CAZymes (n = 62,309) predicted in the RUGs with proteins deposited in the NCBI non-redundant protein database (A) and CAZyme database (B). AAs auxiliary activity enzymes (n = 1305), CBMs carbohydrate-binding modules (n = 6249), CEs carbohydrate esterases (n = 9965), cohesins (n = 168), dockerins (n = 387), GHs glycoside hydrolases (n = 27,328), GTs glycosyltransferases (n = 14,731), PLs polysaccharide lyases (n = 1444), and SLHs surface layer homology domains (n = 732). Center lines indicate the median value; red triangles show the mean; boxes show the interquartile range; and whiskers extend to the most extreme data point. The distribution (C) and density (D) of the predicted CAZyme families across 528 RUGs taxonomically classified at the class level.
Carbohydrate degrading potential of RUGs
In total, 1,185,394 protein coding genes were predicted in the 538 RUGs, of which 5.1% were annotated as CAZymes, including 27,328 (43.8%) identified as GHs, 1444 as PLs (2.3%), 9965 as CEs (16%), 6249 as CBMs (10%), 387 as dockerins (0.6%), 168 as cohesins (0.26%), 1305 as AAs (2.1%), 732 as SLHs (1.17%), and 14,731 as GTs (23.6%) (Supplementary Data 4). A comparison with the proportion of sequences annotated as CAZymes in the GTDB representative genomes (n = 31,910) revealed a significantly higher proportion of CAZymes (5.1% vs. 3.7%) in our RUGs, particularly of those encoding GHs (43.8% vs. 34.4%), CBMs (10% vs. 7.6%), cohesins (0.26% vs. 0.14%), dockerins (0.6% vs. 0.23%), and SLHs (1.17% vs. 0.92%), and a lower proportion of CEs (16% vs. 18.2%), AAs (2.1% vs. 5.6%), and GTs (23.6% vs. 30.8%). This finding suggests that rumen microbes have been naturally selected and evolutionarily adapted toward a high genomic potential for efficient lignocellulose degradation in the rumen.
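As a quick arithmetic check, the class-level shares quoted above can be recomputed from the reported counts. The short Python sketch below uses only the counts given in the text; the dictionary keys are shorthand labels chosen for illustration.

```python
# CAZyme counts per class, as reported in the text.
counts = {
    "GH": 27328,
    "GT": 14731,
    "CE": 9965,
    "CBM": 6249,
    "PL": 1444,
    "AA": 1305,
    "SLH": 732,
    "dockerin": 387,
    "cohesin": 168,
}

total = sum(counts.values())
# Share of each class among all predicted CAZymes, in percent.
shares = {name: 100 * n / total for name, n in counts.items()}

print(total)
for name, pct in shares.items():
    print(f"{name:9s}{counts[name]:7d}{pct:8.2f}%")
```

The counts sum to the 62,309 CAZymes given in the legend of Fig. 4, and the recomputed shares agree with the reported percentages to within rounding.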
The top 10 most abundant GH families were GH3, GH31, and GH97 glucosidases, GH2 galactosidases, GH109 α-N-acetylgalactosaminidases, GH28 polygalacturonases, GH23 and GH25 lysozymes, GH51 endoglucanases, and GH127 arabinofuranosidases. Pectin degrading enzymes including PL11 rhamnogalacturonan endolyases, PL1_2 pectate lyases, CE1 acetyl xylan esterases, and CE10 arylesterases were among the most abundant PL and CE families. Only 418 CAZymes showed 100% identity, while 3089 (4.96%) shared ≥95% identity with proteins in the NCBI-NR database (Fig. 4A). The AA domain-containing proteins showed the highest (∼74%) and the dockerin domain-containing proteins the lowest (∼45%) average amino acid identity with proteins in the database. The average sequence identities were even lower when compared to the CAZyme database (Fig. 4B), suggesting that the majority of the predicted CAZymes are novel, with no well-characterized homolog in the current databases.
On average, >94% of CAZymes were encoded by seven bacterial classes
including Bacteroidia (48.9%), Clostridia (33.6%), Spirochaetia (3.9%),
Kiritimatiellae (3.1%), Bacilli (2.4%), Lentisphaeria (1.7%), and
Fibrobacteria (1.2%; Fig. 4C). However, normalizing the number of
predicted CAZymes to the number of RUGs identified in each taxon (class
level) revealed that Kiritimatiellae (phylum Verrucomicrobia), followed by
Lentisphaeria, Fibrobacteria, and Bacteroidia, were mainly enriched for
GHs, CEs, PLs, and CBMs (Fig. 4D). This suggests that Kiritimatiellae,
Lentisphaeria, and Fibrobacteria are likely key lignocellulose degraders
in the rumen despite their lower abundance compared with Bacteroidia and
Clostridia. The capacity to degrade hemicellulose is widely encoded across
the members of Verrucomicrobia [60]. Previous single-cell genome
sequencing of certain phylotypes of Verrucomicrobia also revealed the
enrichment of genes encoding CAZymes, sulfatases, and peptidases in their
genomes, potentially allowing these bacteria to degrade diverse
carbohydrate polymers [61].
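The normalization described above (CAZymes per genome within each class) can change the ranking of taxa. The sketch below illustrates the idea in Python; the class labels and counts are hypothetical toy values chosen for illustration only and are not taken from the study.

```python
def cazymes_per_genome(cazyme_counts, genome_counts):
    """Divide the CAZyme count of each taxon by its number of genomes.

    Taxa without any genome are skipped to avoid division by zero.
    """
    return {
        taxon: cazyme_counts[taxon] / genome_counts[taxon]
        for taxon in cazyme_counts
        if genome_counts.get(taxon, 0) > 0
    }

# Hypothetical toy values (NOT from the paper): an abundant class with many
# genomes and a rare class with few, CAZyme-rich genomes.
toy_cazymes = {"ClassA": 3000, "ClassB": 200}
toy_genomes = {"ClassA": 150, "ClassB": 4}

density = cazymes_per_genome(toy_cazymes, toy_genomes)
richest = max(density, key=density.get)
print(density, richest)
```

In this toy case ClassA contributes most CAZymes overall, yet ClassB carries more CAZymes per genome, which mirrors the contrast drawn between total and normalized counts.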
Genes for SLH domain-containing proteins were mainly restricted to
Clostridia, Vampirovibrionia, Negativicutes, and Synergistia. These
proteins anchor cellulosome complexes to the cell surface and thus
facilitate access of the bacterium to the products released by the
complexes. Among CAZymes, cohesin and dockerin domains were mainly encoded
by Bacteroidia and Clostridia. However, genes encoding these two domains
mainly co-occurred in members of Oscillospirales and certain
Christensenellales, known as the main cellulosome producers in the rumen
[62]. The key feature of cellulosome complexes is the multimodular
scaffoldin proteins containing multiple tandemly repeated cohesin domains,
which serve as docking sites for the attachment of dockerin-bearing
CAZymes [62–64]. In addition, they bear cellulose-binding or
carbohydrate-binding modules to attach to lignocellulosic substrates.
Further analysis of proteins with such a configuration revealed 36
proteins in 29 RUGs with 2 (n=20) or more (n=16) cohesin domains
(Supplementary Data 5). Proteins with more than four cohesin domains were
only detected in the families Ruminococcaceae (7 RUGs) and unknown
Christensenellales (2 RUGs). In particular, two Ruminococcus genomes,
RUG452 and RUG21, each encoded one orthologous protein with 10 and 5
tandemly repeated cohesin domains, respectively. Based on a BLAST search,
these two proteins matched scaffoldin B from Ruminococcus flavefaciens
with 51.4 and 52.3% identity, respectively, suggesting them to be novel
candidate scaffoldins (Fig. S4).
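Screening for candidate scaffoldins by cohesin copy number can be sketched as a simple filter over per-protein domain annotations. The Python example below is illustrative only: the input layout (protein identifier mapped to an ordered list of domain names), the protein names, and the function name are assumptions, not the pipeline used in the study.

```python
from collections import Counter

def candidate_scaffoldins(domain_hits, min_cohesins=2):
    """Return {protein: cohesin count} for proteins with enough cohesins."""
    result = {}
    for protein, domains in domain_hits.items():
        n_cohesin = Counter(domains)["cohesin"]
        if n_cohesin >= min_cohesins:
            result[protein] = n_cohesin
    return result

# Hypothetical annotations for three proteins (toy data).
toy_hits = {
    "protA": ["cohesin"] * 5 + ["CBM"],
    "protB": ["GH5", "dockerin"],
    "protC": ["cohesin", "cohesin"],
}

candidates = candidate_scaffoldins(toy_hits)
many_cohesins = candidate_scaffoldins(toy_hits, min_cohesins=5)
print(candidates, many_cohesins)
```

Raising `min_cohesins` narrows the list to proteins with long cohesin arrays, analogous to the subset with more than four cohesin domains discussed above.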
In cellulosome complexes, lignocellulose degrading enzymes bind via their
dockerin domains to cohesin domains in the scaffoldin proteins; these
cohesin–dockerin interactions are fundamental to cellulosome assembly
[64, 65]. We identified 126 proteins in 31 RUGs that contained at least
one dockerin and one catalytic CAZyme module and are thus putative
components of cellulosome complexes. These proteins potentially encoded
acetyl xylan esterases (families CE1, CE2, CE3, CE4, and CE6), α-amylases
(GH13, GH13_15, GH13_28, GH13_6, GH13_7, and GH97), xylanases (GH10, GH11,
GH16, GH30_8, GH5_21, GH5_35, and GH98), cellulases (GH124, GH128, GH5_2,
GH5_4, GH5, and GH9), β-galactanases (GH2, GH53, GH43_24, and GH30_5), and
pectate lyases (PL1, PL1_2, PL1_8, PL11, PL9_4, and PL9). The presence of
dockerin-bearing α-amylases implies the existence of cohesin-based
amylosomes, which are known to be formed in certain members of
Ruminococcus, including R. bromii in the human gut, in response to
resistant starches [66, 67]. In addition, arabinanases (GH43_37 and
GH43_4), L-arabinofuranosidases (GH43_10, GH43_29, and GH127), and
α-L-fucosidases (GH95 and GH141) were also represented among the
dockerin-bearing enzymes. Many of these proteins also contained CBMs
capable of binding to cellulose, xylan, starch, or galactans, highlighting
the key role of cellulosomes in lignocellulosic polymer degradation in the
rumen [25].
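The criterion used above (at least one dockerin plus at least one catalytic CAZyme module) can be written as a small predicate. This Python sketch is a hedged illustration: treating the GH, PL, and CE prefixes as the catalytic classes is an assumption made here for the example, and the protein names and domain lists are hypothetical.

```python
# Assumption for this sketch: family names starting with these prefixes
# are treated as catalytic modules.
CATALYTIC_PREFIXES = ("GH", "PL", "CE")

def is_cellulosomal_candidate(domains):
    """True if a protein has a dockerin and at least one catalytic module."""
    has_dockerin = "dockerin" in domains
    has_catalytic = any(d.startswith(CATALYTIC_PREFIXES) for d in domains)
    return has_dockerin and has_catalytic

# Hypothetical domain architectures (toy data).
toy_proteins = {
    "p1": ["GH9", "CBM3", "dockerin"],   # catalytic + dockerin
    "p2": ["cohesin", "cohesin", "CBM3"],  # scaffoldin-like, no dockerin
    "p3": ["dockerin"],                  # dockerin only
    "p4": ["CE1", "GH10"],               # catalytic only
}

hits = sorted(p for p, d in toy_proteins.items() if is_cellulosomal_candidate(d))
print(hits)
```

Only the first toy protein satisfies both conditions; proteins with a dockerin alone or catalytic modules alone are excluded.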
The process of plant polysaccharide degradation by rumen microbiota
requires lignocellulose degrading enzymes to be secreted into the external
milieu. Inspection of CAZyme sequences for the presence of secretion
signals revealed that a significant portion of predicted CAZymes in RUGs
affiliated to Fibrobacteria (∼53%), Bacteroidia (∼50%), and
Kiritimatiellae (∼49%) were secreted and/or periplasmic proteins. In
contrast, only 9.8%, 11.9%, and 14.4% of non-CAZyme sequences in
Fibrobacteria, Bacteroidia, and Kiritimatiellae, respectively, contained a
signal peptide, indicating a significant enrichment of signal peptides in
CAZymes and thus the critical role of these taxa in lignocellulose
degradation in the rumen (see Supplementary Note and Supplementary Data 6
for greater details). Even though Clostridia were among the abundant
members of the rumen community, only ∼20% of their CAZymes were predicted
to be secreted, suggesting that they have been largely tailored to break
down cellobiose products of polysaccharide degradation inside the cell.
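The size of the enrichment can be expressed as the ratio of the signal peptide frequency in CAZymes to that in non-CAZyme sequences. The Python sketch below uses the percentages quoted above; note that the CAZyme values are approximate in the text, and that this ratio is a descriptive summary introduced here, not the statistical test used by the authors.

```python
# (percent of CAZymes with a signal peptide, percent of non-CAZymes),
# taken from the text; the CAZyme values are approximate (~).
SIGNAL_PEPTIDE_PCT = {
    "Fibrobacteria": (53.0, 9.8),
    "Bacteroidia": (50.0, 11.9),
    "Kiritimatiellae": (49.0, 14.4),
}

def fold_enrichment(table):
    """Ratio of CAZyme to non-CAZyme signal peptide frequency per taxon."""
    return {taxon: caz / non for taxon, (caz, non) in table.items()}

fold = fold_enrichment(SIGNAL_PEPTIDE_PCT)
for taxon, ratio in fold.items():
    print(f"{taxon}: {ratio:.1f}-fold")
```

With these inputs the ratios are roughly 5.4, 4.2, and 3.4 for Fibrobacteria, Bacteroidia, and Kiritimatiellae, respectively.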
Classifying RUGs according to their polysaccharide degrading capacities
showed that Bacteroidota and Firmicutes can potentially degrade a wide
range of carbohydrate polymers (Supplementary Data 1), as was also found
in different ruminants including camel [26], moose [25, 68], and cattle
[22, 24]. Fibrobacterota were particularly adapted to degrade glucan,
cellulose, mannan, arabinan, xylan, xyloglucan, chitin, starch, and pectin
but lacked genes for laccase and those for fucose and rhamnose cleavage.
The ability to deconstruct pectin seemed to be widely present among the
members of Ruminococcus, Fibrobacteres, and Bacteroidetes, consistent with
the finding in moose [25]. Both Verrucomicrobiota and Spirochaetota also
showed high lignocellulose degrading capacity, but the former lacked genes
for laccase, which are required for lignin metabolism, while the latter
did not show the potential to break down fucose, rhamnose, and mannan. In
termite hindgut microbiota, Spirochaetota were found to play a key role in
hemicellulose degradation as they carry genes for targeting
cellulose-xyloglucan polysaccharide complexes [53, 69]. In the rumen
microbiome, however, Spirochaetota were shown to be associated with pectin
degradation, particularly under a pectin rich diet [25, 70]. Treponema
(Spirochaetota), Fibrobacter, Clostridia, and Bacteroidia widely encoded
enzymes involved in chitin metabolism (domain DUF4849 and GH18;
Supplementary Data 1). Since chitin is not a component of the plant cell
wall, the presence of chitin degrading enzymes is likely required for the
degradation of host-derived glycoproteins such as mucin [71, 72]. Only a
single Verrucomicrobium bin (RUG64) in this study had genes for chitin
degradation, while in the human gut, Verrucomicrobia including Akkermansia
muciniphila are known as mucin-degraders [73]. The remaining RUGs,
including those belonging to Synergistota, Proteobacteria,
Elusimicrobiota, Desulfobacterota, Chloroflexota, and archaea, had limited
polysaccharide degrading capacities, yet they may play an accessory or
complementary role in carbohydrate degradation and metabolism in the
rumen. The variability in lignocellulose degrading capacity across
different lineages may enable bacteria to tolerate fluctuations in
physicochemical properties of the forages and to ensure nutrient supply
from different lignocellulosic substrates in the rumen.
The organization of CAZymes in PULs in Bacteroidota
In Bacteroidota, all genes required for perception, cleavage, and
utilization of a specific polysaccharide glycan are localized on the same
portion of their genomes, characterized by the presence of a susC/D gene
pair, which encodes sugar transporters [7]. This special organization of
CAZymes is known as PULs. The PUL profiles can be used to predict the
ecological niche and substrate specificity of Bacteroidota as the most
abundant rumen microbes. To identify potential PULs, RUGs were screened
for tandem susC-susD gene pairs, followed by identification of genes
encoding CAZymes around the susC/D pair [6]. A total of 1669 potential
PULs were predicted in 154 RUGs (out of the 190 Bacteroidota RUGs), of
which 708 lacked any known CAZyme gene, consistent with previous findings
(Supplementary Data 7) [8, 68]. The lack of CAZymes in a large set of PULs
suggests that PULs may also contribute to molecular processes other than
polysaccharide degradation [8]. The occurrence of CAZymes in predicted
PULs is detailed in the Supplementary Note.
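The screening strategy described above (find a tandem susC-susD pair, then look for CAZyme genes nearby) can be illustrated with a short Python sketch. Everything concrete here is an assumption made for the example: the representation of a contig as an ordered list of gene annotations, the neighbourhood size `window`, and the prefix-based `looks_like_cazyme` helper are hypothetical and do not reproduce the criteria of the cited method.

```python
def looks_like_cazyme(annotation):
    """Hypothetical helper: flag CAZyme-like family names by prefix."""
    return annotation.startswith(("GH", "PL", "CE", "CBM"))

def find_puls(genes, window=5):
    """Scan an ordered gene list for adjacent susC/susD pairs.

    For each pair, collect CAZyme-like genes within `window` genes on
    either side (the window size is an assumption of this sketch).
    """
    puls = []
    for i in range(len(genes) - 1):
        if {genes[i], genes[i + 1]} == {"susC", "susD"}:
            lo = max(0, i - window)
            hi = min(len(genes), i + 2 + window)
            neighbours = genes[lo:i] + genes[i + 2:hi]
            cazymes = [g for g in neighbours if looks_like_cazyme(g)]
            puls.append({"index": i, "cazymes": cazymes})
    return puls

# Toy contig: one PUL with CAZymes and one susC/D pair without any.
toy_contig = ["hypo", "GH43", "susC", "susD", "GH10", "CE1",
              "hypo", "hypo", "susD", "susC", "hypo"]
puls = find_puls(toy_contig, window=2)
without_cazymes = [p for p in puls if not p["cazymes"]]
print(puls, len(without_cazymes))
```

The second toy locus shows how a susC/D pair can be reported as a potential PUL even when no known CAZyme gene lies nearby, as was the case for a large share of the predicted PULs.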
Several abundant and functionally important lignocellulose degrading
enzymes, including families GH11 endo-β-1,4-xylanases, GH43_33
α-L-arabinofuranosidases, GH43_8 β-D-galactofuranosidases, GH30_5
endo-β-1,6-galactanases, GH4 and GH5_26 cellulases, and GH59
β-galactosidases, were not found in the PULs. In addition, GH9, GH5_2, and
GH8 cellulases, GH13_38 α-glucosidases, GH74 endoglucanases, GH30_2
β-xylosidases, GH35 β-galactosidases, GH29 α-L-fucosidases, GH106
α-L-rhamnosidases, GH43_29 α-1,2-L-arabinofuranosidases, and GH43_35 and
GH43_10 β-xylosidases were also rarely present in the PULs. The fact that
not all lignocellulose degrading enzymes are organized into PULs suggests
that PULs may have been developed for the decomposition of complex
polysaccharides with heterogeneous structures, which require a combination
of enzymes for their degradation. Consistent with our finding, no PUL for
the digestion of crystalline cellulose, which is largely composed of
β-1,4-linked D-glucopyranose units, has been reported [8]. This is also
the case for starch and glycogen, made of glucose subunits linked by
either α-1,4 or α-1,6 glycosidic bonds, for which the degrading enzymes
were rarely encountered in the PULs. The predicted PULs appeared to target
complex polysaccharide polymers including xyloglucans, glucuronylxylans,
and pectins, which are the main components of the plant cell wall, but
rarely accommodated enzymes dedicated to breaking down glucan homopolymers
including crystalline cellulose, starch, and/or glycogen. Overall, these
findings suggest a prominent diversity of PULs predicted in Bacteroidota
genomes, indicating their functional specialization toward specific
lignocellulosic substrates.
Metabolic classification of RUGs
We classified all RUGs according to their capacity for producing main
SCFAs and utilizing monomeric carbohydrates (Supplementary Data 1). The
production of butyrate, lactate, acetate, succinate, and propionate was
widespread among Bacteroidota, Elusimicrobiota, and Fibrobacterota. Within
Firmicutes, however, there was a significant variation in the potential to
produce SCFAs. While lactate production was widely encoded by this phylum,
only the orders Lachnospirales, Erysipelotrichales, Peptostreptococcales,
and Saccharofermentanales can potentially produce butyrate (Supplementary
Data 1). In particular, Lachnospirales are known as key rumen butyrate
producers, which redirect hydrogen from reducing CO2 for methane formation
to butyrate production and thus may contribute to an enhanced feed
efficiency in Holstein Friesian dairy cows [19]. Acetate production was
common across rumen microbiota except Thermoplasmatota (Archaea),
Cyanobacteria, and Negativicutes (Firmicutes_C), which lacked the pta and
ackA genes. Acetate production is also widespread among rumen microbiota
of moose [68] and dromedary camel [26]. Succinate and propionate
production were widely encoded by Firmicutes, except for some members of
Bacilli, including the orders RF39 and RFN20, which lack the required
enzymes. Notably, RF39 and RFN20 are two major gut lineages characterized
by their reduced genome size, mixotrophic lifestyle, and inability to
synthesize peptidoglycans as an adaptive strategy in the gut environment
[74]. In ruminant animals, a greater potential for propionate, butyrate,
and propionate-to-acetate fermentation is known to be associated with a
higher feed efficiency and also with a lower methane emission [19, 75].
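A gene-presence rule of the kind implied above (acetate production requires pta and ackA) can be sketched as a set comparison. In the Python example below, only the acetate marker pair comes from the text; requiring all markers to be present, the data layout, and the genome names are assumptions made for illustration, and marker sets for the other SCFAs are deliberately left out because they are not given here.

```python
# Marker genes per product. Only the acetate pair is taken from the text.
PATHWAY_MARKERS = {
    "acetate": {"pta", "ackA"},
}

def can_produce(product, genome_genes):
    """True if every marker gene for the product is present in the genome.

    Assumption: a strict all-markers rule is used in this sketch.
    """
    return PATHWAY_MARKERS[product] <= set(genome_genes)

# Hypothetical gene sets for two genomes (toy data).
toy_genomes = {
    "genome_1": {"pta", "ackA", "eno"},
    "genome_2": {"pta"},
}

acetate_producers = [
    name for name, genes in toy_genomes.items() if can_produce("acetate", genes)
]
print(acetate_producers)
```

A strict rule like this is sensitive to genome incompleteness, so a missing marker in a draft genome does not by itself prove that the pathway is absent.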
In addition, the PPP, EMP, and pyruvate metabolism pathways were widely
distributed across the RUGs. Certain members of the rumen microbiota,
including some strains of Lachnospira, Prevotella, Butyrivibrio,
Bacteroidaceae, and Clostridia together with all Kiritimatiellae and
Gastranaerophilaceae (Cyanobacteria), lack enolase, the enzyme that
catalyzes the penultimate step of the glycolysis pathway [3]. The
evolutionary advantage of enolase loss is currently unknown. Energy
metabolism through the electron transport complex rnfABCDEG was widespread
across Bacteroidota and Firmicutes (Supplementary Data 1), corroborating
the finding in the moose rumen microbiota [68]. However, the echABCDEF
complex was the primary route for energy metabolism in Fibrobacterota,
Desulfobacterota, Negativicutes (Firmicutes_C), and Thermoplasmatota, in
which evidence for the rnfABCDEG complex was scarce. The co-occurrence of
the two complexes was rare and was observed in some Firmicutes genomes.
Both Rnf and Ech are membrane-bound enzyme complexes which couple electron
transfer with an H+/Na+ gradient to recycle NADH and H2 [76]. These data
suggest that forage-attached microbiota show a high redundancy with
respect to metabolic capabilities and their functions in the rumen. This
functional redundancy is likely beneficial to the host animal as it
enables the host to extract energy from diverse lignocellulosic
substrates.
Conclusion
Our metagenomics data indicated that the rumen environment harbors a
wealth of microbial diversity yet to be discovered, with several rare
bacterial taxa playing key roles in fiber digestion. We showed that while
generalist microbes and their genes are prevalent in the rumen, certain
groups are specialized toward particular lignocellulosic substrates. In
addition, we provided evidence for a strong temporal shift in the rumen
microbial function from copiotrophy to oligotrophy along the length of
rumen incubation, which may facilitate the degradation of recalcitrant
lignocellulosic substrates in the rumen. The physicochemical properties of
forages and the retention time of feed particulates in the rumen were the
major factors determining the extent of copiotrophic–oligotrophic shifts.
Taken together, the differential colonization of forages with different
lignocellulosic properties demonstrates that the diversity of rumen
microbiota is likely critical to the digestion of different
lignocellulosic substrates, reflecting a strong niche complementarity
between members of the rumen microbial community. The high taxonomic
diversity, functional redundancy, and metabolic partitioning are the
primary characteristics of the rumen microbiome that facilitate the
evolutionary adaptation to diverse lignocellulosic forages.