Chapter IV: Flux exponent control in metabolism: biological regulation as control of flux exponents
4.1 Introduction
Microbial communities, from the gut microbiome to the soil rhizosphere, play a critical role in human health and well-being [67,74,82,100]. In particular, we now understand that many pathologies are associated with undesired variations in the composition of the microbiome community [18,101,120]. It is therefore important that we understand the principles governing the dynamics of microbial community structure.
Despite the wealth of information cataloguing the composition of these communities, we are still far from understanding the governing principles of community dynamics. This is because the microbiomes relevant to human health and agriculture are extremely complex, consisting of many interactions across hierarchical spatiotemporal scales [68,110]. The combinatorial space of possible interactions is so large that a purely phenomenological approach cannot succeed without a rigorous theoretical framework in which to interpret the data. It is therefore unsurprising that many microbiome scientists feel that they are
‘drowning in data’ [95, 114], as there is no unified conceptual framework in which to contextualize any given observation.
A theoretical framework that aims to provide such a conceptual basis, as well as to make predictions that can drive further understanding, must span interactions across hierarchical scales, from the molecular to the cellular to the ecological. In microbial communities,
Figure 4.1: Diagram showing knowledge of biological systems split into mechanisms and phenotypes, and how they are mapped to each other. Mechanisms are system properties that do not vary on the timescale of concern, while phenotypes are system properties that do vary. From our knowledge about mechanisms, scientific rules can be summarized, often in the language of mathematics, to capture the core mechanistic structures.
Such core structures can be used to systematically create models from knowledge about mechanisms. By analysis or simulation of these models, we can demonstrate that a given mechanism is sufficient for the phenotypes it exhibits. To map phenotypes back to mechanisms, mathematical abstractions for the class of systems are needed, since phenotypes are behaviors at the system level. A systems theory captures the core structures at the system level and derives hard limits, or laws, for given phenotypes. Such laws can then be used to capture necessary conditions on mechanisms for given phenotypes, providing a map via necessity in the reverse direction, from phenotypes to mechanisms.
metabolism is the core process that bridges these scales. As an example, Terence Hwa and coworkers have used simple metabolic models to quantitatively explain dynamical phenomena in the growth of single-strain populations encountering various types of nutrient stress [42]. Recently, [66] showed that metabolic interactions can explain a bistable phenotype in a microbial community. Therefore, to work toward principles of microbial communities, we need a theoretical framework that captures the rules of metabolic regulation in a systematic fashion.
To understand the rules of metabolic regulation, we could begin by modeling its dynamics.
Metabolic dynamics is fundamentally hard to describe with traditional mechanistic modeling approaches such as Michaelis-Menten kinetics, in which explicit equations are written down with many parameters to be identified through experiments. The complication is that while we can experimentally measure bulk metabolic fluxes at scale and characterize metabolite stoichiometry robustly, we lack systematic ways to observe the dynamic fluxes of intermediates in cells, which depend on the concentrations of regulatory proteins [7].
To elaborate on the generality of this difficulty, we can consider our knowledge about biological systems as consisting of two types: mechanisms and phenotypes (see Figure
4.1), split according to the timescale of the problem of concern. Mechanisms are physical and biochemical knowledge about a system that does not vary at the timescale of concern.
Typical examples are the size of a cell, the elasticity of a tissue, the atomic composition and structure of a protein molecule, and the binding free energy of a receptor-ligand pair. Phenotypes are behaviors of a particular system in a specific scenario, which vary at the timescale of concern. Most experimental observations of an overall system are of this type, such as metabolic fluxes, metabolite concentrations, gene expression profiles, and enzyme numbers in a cell. I emphasize that the split between mechanism and phenotype depends on the timescale of concern, with varying properties considered phenotypes and non-varying ones mechanisms. For example, cell size is mechanistic information for metabolism on the minutes timescale, but it is a phenotype for cell growth or differentiation on the hours-to-days timescale.
We usually determine that something about a specific system is understood when a phenotype can be clearly mapped to some mechanisms, and the mechanisms can clearly explain the phenotype. In other words, a necessary and sufficient (or if and only if) correspondence between phenotypes and mechanisms is established. Experimental investigations that connect mechanisms and phenotypes, such as mechanistic perturbations like gene knockouts, can provide a point-to-point correspondence between mechanisms and phenotypes. But this is all for a particular system, a particular mechanism, and a particular phenotype. When the mechanism involved has a more sophisticated architecture or the phenotype has a large number of dimensions, this point-to-point mapping is insufficient.
Also, we often want to understand rules governing a class of phenotypes that exist in many systems. This requires establishing a set-to-set correspondence: a set of mechanisms that is necessary and sufficient for a class of phenotypes. Typically, sufficiency is easier, since it can be obtained from the accumulation of point-to-point investigations, while necessity is much harder, since it constitutes statements that for a class of phenotypes, only certain mechanistic features matter, while all other details can be ignored.
Classically, set-to-set maps start with sufficient conditions for phenotypes, obtained by building models based on mechanisms. Mechanistic knowledge is generalizable in the sense that it applies when a component is used in arbitrary contexts at the timescale of concern, even ones not observed before. Therefore, mechanisms can be used to build models that sufficiently demonstrate certain phenotypes, connecting mechanisms to phenotypes in the forward direction. To do so, scientific rules about these mechanisms can be summarized by formalizing the core structures involved, often in a mathematical language. For example, biological organisms are often made of interacting cells, force and mass are core structures of mechanical systems, and stoichiometry and rate laws are core structures of chemical reactions. These scientific rules can then be viewed as the waist of an hourglass that systematically converts knowledge about mechanisms on the top into models on the bottom. Then, by simulation or analysis, these models take a mechanism and demonstrate sufficient conditions for the phenotypes this mechanism exhibits.
This forward approach, providing sufficient maps from mechanisms to phenotypes via mechanistic models, was often considered the major success of classical scientific research.
This is because mechanistic knowledge is generalizable while phenotypic knowledge is less so. If the two are accumulated at approximately the same rate, then mechanistic knowledge yields more predictions and applies to more scenarios than relying on phenotypes alone. Nowadays, however, the general situation of experimental investigation is the opposite: increasingly large data sets can be obtained for phenotypes by massively parallel methods, while knowledge about mechanisms accumulates slowly, forming a bottleneck.
This is especially prevalent in biology, where massive screening, various kinds of omics (transcriptomics, proteomics, and metabolomics), and other sequencing- and droplet-based methods have become the powerhouse of scientific progress. This makes phenotype data accumulate several orders of magnitude faster than mechanism data, which is based on physical and biochemical approaches that do not yet scale. As a result, there is a shift from placing higher weight on mapping mechanism data to phenotypes, in classical times, to placing higher weight on mapping phenotype data back to mechanisms, in recent decades. Another major driving force behind this change is engineering. The foundation of engineering is a set of alternatives, or possible designs, that achieve a given system behavior, or phenotype. The desire for a large design space therefore places significant priority on the necessary conditions that mechanisms need to satisfy to achieve a phenotype, which can be used to bound the design space. However, the reverse direction, mapping phenotypes back to mechanisms through necessary conditions, requires a system-level understanding that is distinct from the core structures or scientific rules. In particular, since phenotypes are behaviors at the system level, this requires a systems theory that captures the core system-level structures and formulates hard limits, or laws, on system performance independent of component-level details.
A systems theory can be considered as the waist of an hourglass that connects systems' phenotypes below to the hard limits and laws on top. One example is mechanical movement formulated as phenotypes of Hamiltonian or Lagrangian dynamical systems, from which conservation of energy, mass, and momentum are derived as hard limits or laws governing these systems. Other examples are message transmission viewed as information channels with hard limits in terms of channel capacity; computation viewed as Turing machines with hard limits in terms of complexity and decidability; signal processing viewed as linear input-output systems with hard reconstruction limits from the Nyquist theorem; feedback control viewed as linear control systems with hard limits such as Bode's conservation of robustness; and exchange between work and heat viewed as thermal engines with hard limits such as Carnot's theorem and entropy maximization. These hard limits, or laws, obtained by formulating a systems theory, connect phenotypes or system behaviors with mechanisms in the reverse direction. For a given phenotype, the laws explain which properties of mechanisms are important and which details do not matter. A phenotype can also be mapped back to mechanisms by using laws to eliminate implausible mechanisms.
In light of the explosion of phenotype data and the demands of engineering, we need more hard limits and laws to perform the reverse mapping of constraining mechanisms based on phenotypes. This work proposes binding and catalysis as the scientific rules, or core structures, of biomolecular systems, with catalysis determining the direction of change and binding regulating catalysis rates. In this chapter we further formulate a systems theory for biomolecular systems. Since binding's regulation of catalysis has reaction orders constrained in polyhedral sets, a biomolecular system can be considered a class of control systems, in which the dynamics has fixed catalysis stoichiometry and controllers adjust the exponents (or reaction orders) of the catalysis fluxes. This formulates a systems theory for biomolecular systems as flux exponent control (FEC). By formulating biomolecular dynamics as a control system, FEC makes engineering hard limits and laws from control theory applicable to biomolecular dynamics, including metabolism, and motivates the discovery of further laws.
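Schematically, and in notation introduced here only as a sketch of the formulation rather than its full development, an FEC system can be written as
\[
  \frac{dx}{dt} \;=\; N\, v(t),
  \qquad
  v_j(t) \;=\; k_j \prod_i x_i(t)^{\,a_{ij}(t)},
  \qquad
  a_j(t) \in \mathcal{A}_j,
\]
where $x$ is the vector of metabolite concentrations, $N$ is the fixed catalysis stoichiometry matrix, and the controller adjusts the reaction orders $a_{ij}(t)$ of each flux $v_j$ within a polyhedral set $\mathcal{A}_j$ determined by the underlying binding network.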
Let us now discuss this difficulty of knowing detailed mechanisms in the specific context of metabolic regulation. To illustrate concretely, let us take a simple enzymatic catalysis as an example:
\[
  E + S \rightleftharpoons C \xrightarrow{\;k\;} E + P.
\]
Here the enzyme $E$ catalyzes the conversion of substrate $S$ into product $P$. This catalysis happens via an intermediate complex $C$ formed by the enzyme binding with the substrate, and this binding reaction happens fast, reaching binding equilibrium. Written as a metabolic reaction, this has the following form:
\[
  (S_{\mathrm{tot}},\, P_{\mathrm{tot}}) \xrightarrow{\;v = kC\;} (S_{\mathrm{tot}} - 1,\, P_{\mathrm{tot}} + 1),
\]
so the change caused by this catalysis reaction is one less substrate molecule and one more product molecule, while the rate, or flux, $v$ of this reaction is $v = kC$: the catalysis rate constant $k$ multiplying the concentration of the intermediate complex $C$.
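As a sanity check on this picture, the rapid-equilibrium dynamics can be simulated directly. The sketch below uses illustrative parameter values (the choices of `E_tot`, `K_d`, `k`, and the time grid are ours, not from the text) and a simple Euler integration:

```python
# Minimal sketch of E + S <=> C -> E + P under the fast-binding assumption:
# the complex tracks C = E_tot * S / (K_d + S), and the flux is v = k * C.
# All parameter values are illustrative.

def simulate(S0=10.0, P0=0.0, E_tot=1.0, K_d=2.0, k=1.0, dt=1e-3, T=20.0):
    """Euler-integrate dS/dt = -v, dP/dt = +v with v = k * C."""
    S, P, t = S0, P0, 0.0
    while t < T:
        C = E_tot * S / (K_d + S)  # rapid binding equilibrium
        v = k * C                  # catalysis flux v = k C
        S -= v * dt                # stoichiometry (-1, +1)
        P += v * dt
        t += dt
    return S, P

S_end, P_end = simulate()
# The (-1, +1) stoichiometry conserves S + P exactly at every step.
```

Note that eliminating the fast binding step yields the familiar Michaelis-Menten form $v = k E_{\mathrm{tot}} S / (K_d + S)$, which is what the line computing `C` encodes.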
Given this metabolic network in cells, experimentally we can observe the concentrations of substrate and product molecules at the end of some duration and see that the decrease in substrate equals the increase in product, therefore deducing the stoichiometry of $(-1, +1)$ for substrate and product molecules. However, the rate of this catalysis reaction over time, or how the rate depends on substrate and product concentrations, is still intractable to observe at scale in general. This is because observing this rate requires observing a time trajectory of substrate and product concentrations. Time traces can be obtained for a selected few metabolites via isotope tracing or spectrometry, but time traces of many metabolites require chromatography and mass spectrometry at every time step, which becomes prohibitively expensive at scale. Furthermore, to model the regulation of reaction flux, i.e. how the flux $v$ depends on substrate and metabolite concentrations, we would require many time traces covering a wide range of metabolite concentrations, a range that grows exponentially with the number of metabolites involved. Making the situation worse, the flux $v$ also depends on the enzyme concentration $E_{\mathrm{tot}}$, and resolving enzyme concentrations over time requires separate experimental methods that are still hard to perform jointly with metabolite time traces at scale.
Together, these difficulties result in the sparsity of data for metabolic fluxes.
This sparsity then creates difficulty for the typical modeling approach, in which the mechanisms for catalysis are known or hypothesized and equations such as Michaelis-Menten are derived for the rates. For mechanistic models to describe an experimentally observed system, we need to fit to data the many mechanistic parameters that arise in the modeling process. But for a nontrivial metabolic network, the number of such parameters is too large to fit to the sparse data sets that are feasible to obtain experimentally.
This under-determination also makes it hard for mechanistic models to generalize to situations different from those in the fitted data.
To resolve the under-determination problem caused by sparse data, a constraint-based approach has been developed to model metabolism. The constraint-based approach has dominated recent progress on computational models of large-scale metabolic networks [85].
The constraint-based approach is a mechanistic modeling approach providing sufficient maps from mechanisms to phenotypes. But instead of relying on knowing all the mechanisms in a system to demonstrate a given phenotype, i.e. building a model from a point in mechanism space to a point in phenotype space, it takes known mechanisms as constraints, leaves unknown mechanisms free to vary, and looks at the set of all feasible phenotypes.
This is especially appropriate for metabolism, where data on mechanisms is almost always sparse. For metabolism, a natural split is between stoichiometry and flux regulation, since the former is relatively easy to know while the mechanisms of the latter are much harder.
Therefore, constraint-based methods take the sparse mechanistic data, such as stoichiometry, and supply it as constraints on reaction fluxes. Then either the set of all feasible fluxes can be analyzed, or optimization of certain objective functions, such as growth maximization or ATP regeneration, can be used to find specific points of interest in flux space (Figure 4.2).
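To illustrate this optimization view, the sketch below encodes a toy one-metabolite network (uptake, growth, and waste reactions, all hypothetical) as a linear program; this is the shape of a flux balance analysis problem, not any published model:

```python
# Toy flux balance analysis (FBA) sketch: maximize a "growth" flux subject
# to steady-state stoichiometric constraints S v = 0 and flux bounds.
from scipy.optimize import linprog

# One internal metabolite A; columns are v_uptake, v_growth, v_waste.
S = [[1.0, -1.0, -1.0]]
bounds = [(0, 10), (0, None), (0, None)]  # uptake capped at 10 units

# linprog minimizes, so negate the growth flux to maximize it.
res = linprog(c=[0.0, -1.0, 0.0], A_eq=S, b_eq=[0.0], bounds=bounds)
v_uptake, v_growth, v_waste = res.x
# At the optimum, all available uptake is routed to growth.
```

Real constraint-based tools solve the same kind of program with thousands of reactions; the unknown regulation appears only through the bounds and the choice of objective.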
One such approach, dynamic flux balance analysis (dFBA) [60], has been very successful in modeling large-scale metabolic networks. The recent work [66] illustrates that dFBA can capture complex behaviors of a microbial consortium, such as hysteresis in response to environmental nutrient shifts, which is hypothesized to underlie the switching between beneficial and detrimental gut microbiome compositions.
Although it holds promise as a general model for metabolism, dFBA has severe limitations when applied to the metabolic dynamics of interacting cells and populations. dFBA cannot model dynamics intrinsic to metabolic regulation, because it assumes that intracellular metabolic fluxes are faster than external changes such as growth and nutrient shifts.
This keeps the in- and out-fluxes of metabolites balanced and at steady state, which makes the constraint-based problem computationally solvable. But it also makes dFBA incapable of capturing potentially important transient dynamics intrinsic to metabolism, such as overshoots, undershoots, lags, and temporary arrests, which can be essential for cell survival. In short, dFBA considers metabolic changes as static and instantaneous responses to slow variations in external environments, often on the timescale of hours to days. In comparison, many significant metabolic dynamics in cell physiology and the gut microbiome, for example, happen within minutes to hours [3,33]. In addition, since the strength of a constraint-based method comes from the set of constraints it can use, dFBA, which only incorporates the stoichiometry of metabolic reactions as a constraint, can be considered overly unconstrained: it admits un-biological actions such as instantaneous changes of metabolic flux (see Figure 4.2). This causes dFBA predictions to be erroneous without time-consuming hand-tuning and curation by experts with extensive experimental data on the microbe modeled.
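To make the quasi-steady-state assumption concrete, here is a minimal dFBA-style loop on a hypothetical one-metabolite network (all parameters and the saturating uptake bound are illustrative): each time step solves a static flux balance problem whose uptake bound depends on the current external substrate, so intracellular fluxes respond instantaneously, with no transients of their own.

```python
# Minimal dFBA-style sketch: only the external variables (substrate,
# biomass) carry dynamics; the intracellular flux vector is recomputed
# from scratch at each step as the solution of a static linear program.
from scipy.optimize import linprog

def dfba(substrate=10.0, biomass=0.1, dt=0.01, T=10.0):
    stoich = [[1.0, -1.0]]  # uptake produces A; growth consumes A
    t = 0.0
    while t < T and substrate > 1e-9:
        # Uptake capacity saturates with external substrate (illustrative).
        v_max = 10.0 * substrate / (1.0 + substrate)
        res = linprog(c=[0.0, -1.0], A_eq=stoich, b_eq=[0.0],
                      bounds=[(0.0, v_max), (0.0, None)])
        v_uptake, v_growth = res.x
        # External state updates, scaled by biomass (Euler step).
        substrate = max(substrate - v_uptake * biomass * dt, 0.0)
        biomass += v_growth * biomass * dt
        t += dt
    return substrate, biomass

s_end, b_end = dfba()
# The substrate is depleted and converted into biomass growth.
```

Note that any transient internal to the cell, an overshoot or a lag, is invisible here by construction: the flux vector has no state of its own.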
Another approach that tackles metabolic regulation was introduced in [25], which uses glycolytic oscillation as an example to formulate engineering hard limits, or laws, that map phenotypes back to mechanisms in the reverse, or necessary, direction. This approach builds on well-understood models of the glycolysis metabolic network from extensive experimental data, and asks what is unavoidable if the metabolic regulation is done by arbitrary controllers instead of the specific biological mechanisms. This could be done for glycolytic oscillation because it is at the opposite extreme of typical problems in metabolism:
a small network with extensive experimental data and well-established mechanistic models.
Indeed, glycolytic oscillation, in which metabolite concentrations (e.g. ATP and NADH) oscillate in the glycolysis pathway, has been widely observed and studied, from yeast to human muscle, since the 1960s, both theoretically and experimentally (e.g. [16, 49, 51]).
The feedbacks of autocatalysis and allosteric control of ATP on the PFK enzyme were thought necessary and sufficient for glycolytic oscillation, as confirmed by mechanistic models, extensive simulations, and exhaustive experiments. So what was left to understand were the deeper "why" questions and the full generality of this oscillatory behavior. By showing that oscillatory behavior is unavoidable even if the metabolic regulation is performed by arbitrary controllers that maintain a steady flux, [25] showed that oscillations are necessary side effects of robustness and efficiency tradeoffs. Specifically, it showed that by combining a law on conservation of robustness in control theory, Bode's integral formula, with the autocatalytic stoichiometry of glycolysis, a universal rule of metabolic regulation can be obtained that includes glycolytic oscillation as a special case:
Any regulatory circuit that must robustly maintain metabolite concentrations despite fluctuations in supply and demand will inevitably have significant oscillations in some conditions, and autocatalysis as well as efficiency aggravate this.
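For reference, one standard textbook statement of Bode's integral formula (quoted here for a stable open loop with relative degree at least two; the precise form used in [25] may differ) is
\[
  \int_0^{\infty} \ln \lvert S(i\omega) \rvert \, d\omega \;=\; 0,
\]
where $S$ is the closed-loop sensitivity function. Attenuating disturbances at some frequencies ($\ln\lvert S\rvert < 0$) therefore forces amplification at others ($\ln\lvert S\rvert > 0$), the so-called waterbed effect; unstable open-loop poles, such as those created by autocatalysis, make the right-hand side positive and the tradeoff strictly worse.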
The approach in [25], based on laws from control theory, is a significant success in deriving rules of metabolic regulation and mapping phenotypes back to mechanisms in the necessary direction. However, the particular problem formulation in [25] began with a mechanistic model, which required that the metabolic network be small and extensively studied by experiments and mechanistic modeling. This is rare for the metabolic interactions of interest in microbial communities such as human microbiomes or soil rhizospheres. In other words, the general systems theory that formulates metabolic regulation as a control system amenable to control-theoretic analysis is not yet understood. In [25], the placement of the arbitrary controller at the allosteric coefficient is a careful choice motivated by domain knowledge about the glycolysis pathway and experimentally validated feedback mechanisms. The severe robustness-efficiency tradeoff would be ameliorated if the controller were placed at some other parameter, such as the reaction rate constants. Therefore, we would like to formulate a systems theory for metabolic networks and study rules of regulation for this system by converting it into a control system. This requires a fundamental understanding of which class of control systems describes the regulation of metabolic networks.