Chapter II: Polyhedral constraints enable holistic analysis of bioregulation
Chapter 3
Polyhedral Representation of Binding Network Steady States
3.1 Introduction
In earlier chapters, we saw that binding reaction networks regulate catalysis reactions. In this chapter, we show that the regulatory profiles of a binding network can be characterized as constrained to polyhedral sets in terms of reaction orders (log derivatives). We investigate the mathematical properties of log derivatives of binding networks. In particular, we make the following contributions: (1) we define the class of binding networks that makes biological sense; (2) we characterize the manifold of all possible detailed balanced steady states of a binding network; (3) we derive a formula for log derivatives that can be used for computational sampling; and (4) we show that the polyhedral shape of log derivatives fundamentally comes from decomposition rules of log derivative operators.
This further yields a calculus for obtaining log derivative polyhedra analytically, either top-down via a dominance-decomposition tree (DDT) or bottom-up via summation of matrix representations.
$C_{ES} \to C_{EP}$, where one form of molecules is transformed into another form. Here two binding reactions and one catalysis reaction describe this enzymatic reaction with product re-binding. Catalysis governs the direction of net change of the system, namely
$$(t_S, t_P) \xrightarrow{k_{cat} C_{ES}} (t_S - 1, t_P + 1), \tag{3.2}$$
that is, the total amount of substrate, $t_S = S + C_{ES}$, decreases by one, while the total amount of product, $t_P = P + C_{EP}$, increases by one. Since the speed of product formation (the catalysis flux) is governed by the concentration of $C_{ES}$, understanding the dynamics of biomolecular systems comes down to characterizing how the concentration of the active complex $C_{ES}$ is regulated by the total concentrations of enzyme $t_E$, substrate $t_S$, and product $t_P$.
This problem amounts to solving a system of polynomial equations whose degree, in general, exceeds the number of binding reactions. For example, at steady state the binding reactions in Eq (3.1) yield the following system of equations:
$$C_{ES} K_{ES} = E S, \quad C_{EP} K_{EP} = E P, \quad t_E = E + C_{ES} + C_{EP}, \quad t_S = S + C_{ES}, \quad t_P = P + C_{EP}, \tag{3.3}$$
where $K_{ES}$ is the dissociation constant for the binding of $E$ and $S$, and $K_{EP}$ is that for $E$ and $P$. Solving for $C_{ES}$ in terms of $t_E, t_S, t_P, K_{ES}, K_{EP}$, for example, requires solving the following polynomial equation of degree 3:
$$C^3(K - K') + C^2\big(K(K' + t_P - t_S) + t_E(K' - K) + 2K' t_S - K^2\big) - C\, t_S\big(K(K' + t_P) - t_E(K - 2K') + K' t_S\big) + t_E K' t_S^2 = 0.$$
Here, to keep the equation manageable, we use the shorthand $C$ and $K$ for $C_{ES}$ and $K_{ES}$, and $C'$ and $K'$ for $C_{EP}$ and $K_{EP}$. Since the Abel–Ruffini theorem states that general polynomial equations of degree five or higher have no solution in radicals, active complex concentrations are in general not analytically solvable once there are four or more binding reactions. In fact, even for two binding reactions, the analytical formula is complicated enough that insights are hard to obtain from it. More importantly, although systems of polynomial equations can be solved numerically to an extent, this is computationally intractable for large systems in general (it is well known to be NP-hard), and relaxations such as sum-of-squares [86] or signomials [78] are needed even for moderate-size problems.
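The cubic above can be sanity-checked numerically. The sketch below is a minimal illustration assuming NumPy; the helper names `totals_from_free` and `solve_C_ES` are ours, not part of the text. It picks free concentrations, computes the totals through Eq (3.3), and then recovers $C_{ES}$ as the root of the cubic lying in the physical range $0 < C < \min(t_E, t_S)$.

```python
import numpy as np


def totals_from_free(E, S, P, K_ES, K_EP):
    """Forward map of Eq (3.3): free (E, S, P) -> totals (t_E, t_S, t_P) and C_ES."""
    C_ES = E * S / K_ES  # C_ES * K_ES = E * S
    C_EP = E * P / K_EP  # C_EP * K_EP = E * P
    return E + C_ES + C_EP, S + C_ES, P + C_EP, C_ES


def solve_C_ES(t_E, t_S, t_P, K, Kp):
    """Inverse map: root of the cubic in C = C_ES (K = K_ES, Kp = K_EP) in the physical range."""
    coeffs = [
        K - Kp,
        K * (Kp + t_P - t_S) + t_E * (Kp - K) + 2 * Kp * t_S - K**2,
        -t_S * (K * (Kp + t_P) - t_E * (K - 2 * Kp) + Kp * t_S),
        t_E * Kp * t_S**2,
    ]
    roots = np.roots(coeffs)  # np.roots drops an exactly zero leading coefficient (K == Kp)
    physical = [r.real for r in roots
                if abs(r.imag) < 1e-9 and 0.0 < r.real < min(t_E, t_S)]
    if len(physical) != 1:
        raise ValueError(f"expected one physical root, got {physical}")
    return physical[0]


# Round trip: free concentrations -> totals -> C_ES recovered from the cubic.
t_E, t_S, t_P, C_true = totals_from_free(E=1.0, S=2.0, P=3.0, K_ES=0.5, K_EP=2.0)
print(C_true, solve_C_ES(t_E, t_S, t_P, K=0.5, Kp=2.0))
```

Going from free to total concentrations is explicit, while the reverse direction needs a polynomial root; this asymmetry is what makes total concentrations hard to use as control variables.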
Existing approximations are limited in the scenarios where they apply. Traditionally, approximations are chosen according to the application scenario of interest, trading exactness for tractability.
One such example is the Michaelis–Menten (MM) formula, which was developed by Michaelis and Menten [63], made rigorous by Briggs and Haldane [23], and has served as the foundation of dynamic modeling of biochemical reactions for the past 100 years [31, 57, 65]. Focusing on enzymatic catalysis where substrates are small molecules, the MM formula assumes that the substrate concentration is kept much higher than that of the enzyme. Assumptions like this allow simple analytical solutions to enzymatic catalysis problems. Owing to its powerful simplicity, the MM formula has been fruitfully applied to many biomolecular scenarios, such as bulk enzymatic catalysis, single-molecule catalysis, transcription-translation, and chemotaxis phosphorylation [116]. However, as the scientific and engineering study of biomolecular systems ventures into more complex and dynamic systems, such as developmental biology and post-translational regulation, assumptions of the MM type no longer hold. Instead, the full regulatory profile, obtained without such assumptions, is needed to understand system behavior. For example, when the concentrations of chemicals change significantly over time, as in metabolic shifts and gene regulation, especially in vivo, the MM assumption breaks down [2, 28, 108]. Other recent examples are combinatorial regulations in gene circuits that result in promiscuous sensing and multistable cell fate regulation [9, 46, 122].
While MM and related approximations come from the bulk assumption that one species' concentration is much higher than another's, another wide class of approximations in biophysics comes from microscopic assumptions where the system of study is just one molecule in a bath of other molecules, e.g., one receptor in a bath of ligands [35, 88].
Formulas taking the form of rational functions can be obtained analytically for such cases from arguments of thermodynamics [35], statistical mechanics [88, 89], or Markov chain theory [53, 76]. However, when results from this analysis are applied to systems with more than one molecule, an implicit mean-field assumption is made: the molecules are taken to behave independently and identically, so that the system is approximated as many copies of the same one-molecule system. This is known to cause crucial deviations from experiments in synthetic and systems biology. One term used in bioengineering to describe this phenomenon is retroactivity [32]: transcription factors bound to promoters of genes on plasmids reduce the free transcription factors in solution, so that although the regulated gene is downstream, it "retroactively" acts on its upstream transcription factor.
This is an example where the activity of these genes cannot be treated as independent and identical copies of single plasmids, since whether a given plasmid has transcription factors bound depends on whether other plasmids have significantly "absorbed" the transcription factors in solution.
Yet another simplifying approach is to assume that the scenario of interest resembles an experimental setting where we can control the free (non-total) concentrations directly. This often holds for "induction curve" experiments, in vitro or in vivo, where concentrations of small molecules are controlled by a chemical bath, and equilibrium is effectively reached if the small molecules can freely exchange between the bath and the system of interest. Hence, for the purpose of quantitatively modeling the induction curve obtained, the control variable is the free molecule concentration instead of the total concentration. Each substitution of a free concentration for a total concentration as a control variable lowers the degree of the polynomial to be solved by one. So if each binding reaction has one species controlled in this way, the problem reduces to explicit solutions taking rational-function forms, just as in the single-molecule case. For the system in Eq (3.1), if both the free substrate concentration $S$ and the free product concentration $P$ are controlled via external baths, then the binding reactions are effectively state transitions, which are amenable to a single-molecule or Markov chain interpretation, as is often done in biophysics:
$$E \underset{k^{-}_{ES}}{\overset{k^{+}_{ES} S}{\rightleftharpoons}} C_{ES} \xrightarrow{k_{cat}} C_{EP} \underset{k^{+}_{EP} P}{\overset{k^{-}_{EP}}{\rightleftharpoons}} E.$$
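With $S$ and $P$ held fixed, the diagram above is a three-state continuous-time Markov chain for a single enzyme molecule, and its stationary distribution comes from a linear solve, so it is rational in $S$ and $P$. A minimal sketch assuming NumPy; the function name and the arguments `kp_ES`, `km_ES`, `kp_EP`, `km_EP` (binding and unbinding rate constants) are our own illustrative choices. As a check, setting $k_{cat} = 0$ restores detailed balance, and the fraction of enzyme in $C_{ES}$ should then equal $(S/K_{ES})/(1 + S/K_{ES} + P/K_{EP})$ with each $K$ the unbinding rate constant divided by the binding rate constant.

```python
import numpy as np


def stationary_fractions(S, P, kp_ES, km_ES, kp_EP, km_EP, k_cat):
    """Stationary probabilities of one enzyme molecule in states (E, C_ES, C_EP).

    Free S and P are treated as constants set by external baths.
    """
    E, CES, CEP = 0, 1, 2
    Q = np.zeros((3, 3))          # Q[i, j] = transition rate from state i to state j
    Q[E, CES] = kp_ES * S         # substrate binding
    Q[CES, E] = km_ES             # substrate unbinding
    Q[CES, CEP] = k_cat           # catalysis
    Q[CEP, E] = km_EP             # product unbinding
    Q[E, CEP] = kp_EP * P         # product re-binding
    np.fill_diagonal(Q, -Q.sum(axis=1))  # rows of a generator sum to zero
    # Solve pi @ Q = 0 together with the normalization sum(pi) = 1.
    A = np.vstack([Q.T, np.ones(3)])
    b = np.array([0.0, 0.0, 0.0, 1.0])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


pi = stationary_fractions(S=2.0, P=3.0, kp_ES=1.0, km_ES=0.5,
                          kp_EP=1.0, km_EP=2.0, k_cat=0.0)
print(pi[1], (2.0 / 0.5) / (1 + 2.0 / 0.5 + 3.0 / 2.0))
```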
Note that the MM assumption that substrate and product concentrations are much higher than the enzyme concentration, $t_S, t_P \gg t_E$, produces the same approximation, since free and total concentrations are then approximately equal: $t_S \approx S$ and $t_P \approx P$. We again see the limitations of this simplifying approach. On one hand, it applies only to experimental scenarios where the system's internal concentrations are in equilibrium with an external bath. On the other hand, the simplification has limited effect for complex systems with significant internal dynamics not accessible to external control. For the binding system in Eq (3.1), if only the free substrate $S$ is externally controlled while the product $P$ is not, then to explain the induction curve from experiments we want the active complex $C_{ES}$ in terms of $(S, t_E, t_P, K_{ES}, K_{EP})$, which yields the following polynomial:
$$C_{ES}^2 K_{ES}(K_{ES} + S) + C_{ES} S\big((K_{EP} + t_P) K_{ES} - t_E K_{ES} + K_{EP} S\big) - t_E K_{EP} S^2 = 0.$$
This is one degree lower than the cubic obtained with $t_S$ as the control variable, but it is still not of degree one in $C_{ES}$ and is therefore not amenable to a rational-function solution. If there are more internal binding reactions that cannot be externally controlled, the degree again increases and the problem becomes intractable, analytically or computationally.
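The quadratic above still has a closed-form root, just not a rational one. Its leading coefficient $K_{ES}(K_{ES} + S)$ is positive and its constant term $-t_E K_{EP} S^2$ is negative, so the two roots have opposite signs and the physical $C_{ES}$ is the positive one. A minimal Python sketch; the function name is hypothetical.

```python
import math


def solve_C_ES_fixed_S(S, t_E, t_P, K_ES, K_EP):
    """Positive root of the quadratic in C_ES when free S is held by an external bath."""
    a = K_ES * (K_ES + S)
    b = S * ((K_EP + t_P) * K_ES - t_E * K_ES + K_EP * S)
    c = -t_E * K_EP * S**2
    # a > 0 and c < 0 imply roots of opposite sign; take the positive one.
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


# Free E = 1, S = 2, P = 3 with K_ES = 0.5, K_EP = 2 give t_E = 6.5, t_P = 4.5
# through Eq (3.3), and C_ES = E * S / K_ES = 4.
print(solve_C_ES_fixed_S(S=2.0, t_E=6.5, t_P=4.5, K_ES=0.5, K_EP=2.0))  # prints 4.0
```

The square root is exactly the non-rational ingredient the text refers to: a rational-function solution would require degree one in $C_{ES}$.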
To get a sense of the magnitude of the error made, we consider just one binding reaction $G + R \underset{k_-}{\overset{k_+}{\rightleftharpoons}} GR$, where $G$ is the concentration or copy number of the gene of interest, $R$ is that of a regulator such as a transcription factor, and $GR$ is the complex formed when the gene and the regulator are bound. For this simple system, the same solution
$$GR \approx \frac{t_G\, t_R}{K + t_R}, \qquad K = \frac{k_-}{k_+},$$
is obtained from the MM approximation that the total regulator is much higher than the gene, $t_R \gg t_G$, from the single-molecule states approximation for gene molecules $G$, and from the external-bath approximation for a bath of free regulator at concentration $R$. The explicit solution of the quadratic equation from the binding reaction is
$$2\,GR = t_G + t_R + K - \sqrt{(t_G + t_R + K)^2 - 4 t_G t_R}.$$
We can hold $t_G$ constant and vary $t_R$ to see how large an error the approximation makes compared to the exact solution. See Figure 3.1.
Whenever the total regulator concentration $t_R$ gets close to the total gene concentration $t_G$, the exact bound fraction of the gene is much lower than that predicted by the approximate solutions.
Figure 3.1 Comparison of the approximate solution (Approx) to the exact solution for a simple binding reaction $R + G \rightleftharpoons GR$, for several fixed values of the total gene concentration $t_G$. Concentrations are in units of $K$.
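The comparison in Figure 3.1 can be reproduced in spirit with a few lines of plain Python. This is a minimal sketch: the grid of $t_G$ and $t_R$ values is arbitrary and not necessarily the one used for the figure, and the helper names are ours. The exact root is computed in the algebraically equivalent form $GR = 2 t_G t_R / \big(s + \sqrt{s^2 - 4 t_G t_R}\big)$ with $s = t_G + t_R + K$, which avoids cancellation when $t_G t_R \ll s^2$.

```python
import math


def bound_exact(t_G, t_R, K):
    """Exact GR: smaller root of GR^2 - (t_G + t_R + K) GR + t_G t_R = 0."""
    s = t_G + t_R + K
    # Same root as (s - sqrt(s^2 - 4 t_G t_R)) / 2, rearranged to avoid cancellation.
    return 2 * t_G * t_R / (s + math.sqrt(s * s - 4 * t_G * t_R))


def bound_approx(t_G, t_R, K):
    """Shared MM / single-molecule / external-bath approximation t_G t_R / (K + t_R)."""
    return t_G * t_R / (K + t_R)


K = 1.0  # concentrations measured in units of K
for t_G in (0.1, 1.0, 10.0):
    for t_R in (0.1, 1.0, 10.0, 100.0):
        exact = bound_exact(t_G, t_R, K) / t_G    # bound fraction of the gene
        approx = bound_approx(t_G, t_R, K) / t_G
        print(f"t_G={t_G:5.1f}  t_R={t_R:6.1f}  exact={exact:.3f}  approx={approx:.3f}")
```

Because the exact solution satisfies $GR = t_G R/(K + R)$ with free $R < t_R$, the approximation always overestimates the bound amount. When $t_G > K + t_R$ it even predicts more bound complex than there is regulator in total.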
To summarize, MM approximations, single-molecule states approximations, and external-bath approximations produce similar simplifications that yield rational-function solutions in ideal cases. There are many scenarios to which these simplifications apply, yielding fruitful biological insights. There are also scenarios that go beyond these approximations, such as combinatorial regulation and highly dynamic shifts. Therefore, we would like a method of analysis that can tackle the full regulatory behavior in general scenarios without approximations, while reducing to the simpler cases above when it is reasonable to do so.
As directly solving for the catalysis rate or the active complex concentration is intractable, we need other variables to capture the full regulatory profile. In this work, we focus on reaction orders, i.e., the order of the rates' dependence on total reactant concentrations.
We show that the full regulatory profile of catalysis rates can be characterized by polyhedral sets that bound log derivatives, the continuous analogues of reaction orders. Since knowing the exact log derivatives determines the catalysis rates up to a multiplicative constant, this captures the full regulatory behavior of rates while giving up only information about their exact magnitude, which can often be estimated or measured experimentally.
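Log derivatives can be probed numerically even where no closed form is at hand. As a minimal sketch (plain Python; helper names hypothetical), the reaction order of $GR$ with respect to $t_R$ for the single binding reaction $G + R \rightleftharpoons GR$ is estimated by a central finite difference in $\log t_R$. For this one reaction it stays between 0 and 1: near 1 when the regulator is scarce, so that $GR$ grows linearly in $t_R$, and near 0 when the gene is saturated.

```python
import math


def bound_exact(t_G, t_R, K):
    """Exact GR for G + R <=> GR with dissociation constant K."""
    s = t_G + t_R + K
    return 2 * t_G * t_R / (s + math.sqrt(s * s - 4 * t_G * t_R))


def reaction_order(t_G, t_R, K, h=1e-6):
    """Estimate d log GR / d log t_R by a central difference in log t_R."""
    up = bound_exact(t_G, t_R * math.exp(h), K)
    down = bound_exact(t_G, t_R * math.exp(-h), K)
    return (math.log(up) - math.log(down)) / (2 * h)


for t_R in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"t_R={t_R:7.2f}  order={reaction_order(t_G=1.0, t_R=t_R, K=1.0):.4f}")
```

Working in log space, rather than differentiating $GR$ directly, makes the result scale-free, which matches why log derivatives discard only the overall magnitude of the rate.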
In Section 3.3 we formally define binding reaction networks using chemical reaction network theory, and characterize a class of binding networks that are biologically plausible.
In Section 3.4 we characterize the manifold of equilibrium, or detailed balanced, steady states of binding networks, and introduce log derivatives as a transform between different parameterizations of the manifold. In Section 3.6, we focus on a binding network with just one binding reaction and fully analyze the reaction orders (one type of log derivative) and their biological implications. We observe a polyhedral set bounding the full range of values the reaction orders can take. In Section 3.7, given the central importance of the vertices of reaction order polyhedra, we characterize the vertices in terms of minimal support vectors of linear subspaces and develop a computational method to obtain them at scale. In Section 3.8, we show that polyhedra arise naturally from the decomposition of log derivative operators, and use this to develop an approach for obtaining reaction order polyhedra analytically.