
Chapter II: Polyhedral constraints enable holistic analysis of bioregulation

Chapter 3 Polyhedral Representation of Binding Network Steady States

3.1 Introduction

In earlier chapters, we saw that binding reaction networks regulate catalysis reactions. In this chapter, we show that the regulatory profiles of a binding network can be characterized by polyhedral sets that constrain reaction orders (log derivatives). We investigate the mathematical properties of log derivatives of binding networks. In particular, we make the following contributions: (1) we define the class of binding networks that makes biological sense; (2) we characterize the manifold of all possible detailed-balanced steady states of a binding network; (3) we derive a formula for log derivatives that can be used for computational sampling; and (4) we show that the polyhedral shape of log derivatives fundamentally comes from decomposition rules of log derivative operators.

This further yields a calculus for obtaining log derivative polyhedra analytically, either top-down via a dominance-decomposition tree (DDT) or bottom-up via summation of matrix representations.

Consider an enzyme 𝐸 that binds substrate 𝑆, converts it into product 𝑃, and can re-bind the product:

$$E + S \underset{k^-_{ES}}{\overset{k^+_{ES}}{\rightleftharpoons}} C_{ES} \xrightarrow{k_{\mathrm{cat}}} C_{EP} \underset{k^+_{EP}}{\overset{k^-_{EP}}{\rightleftharpoons}} E + P. \qquad (3.1)$$

Here two binding reactions and one catalysis reaction describe this enzymatic reaction with product re-binding. The catalysis reaction 𝐶_𝐸𝑆 → 𝐶_𝐸𝑃 transforms one form of the molecule into another, and it governs the direction of net change of the system, namely

$$(t_S,\ t_P) \xrightarrow{k_{\mathrm{cat}} C_{ES}} (t_S - 1,\ t_P + 1), \qquad (3.2)$$

that is, the total amount of substrate 𝑡_𝑆 = 𝑆 + 𝐶_𝐸𝑆 decreases by one, while the total amount of product 𝑡_𝑃 = 𝑃 + 𝐶_𝐸𝑃 increases by one. Since the speed of product formation (the catalysis flux) is governed by the concentration of 𝐶_𝐸𝑆, understanding the system’s dynamics comes down to characterizing how the concentration of the active complex 𝐶_𝐸𝑆 is regulated by the total concentrations of enzyme 𝑡_𝐸, substrate 𝑡_𝑆, and product 𝑡_𝑃.

This problem comes down to solving a system of polynomial equations whose degree, in general, exceeds the number of binding reactions. For example, the binding reactions in Eq (3.1) yield the following system of equations at steady state:

$$K_{ES} C_{ES} = E S, \quad K_{EP} C_{EP} = E P, \quad t_E = E + C_{ES} + C_{EP}, \quad t_S = S + C_{ES}, \quad t_P = P + C_{EP}, \qquad (3.3)$$

where 𝐾_𝐸𝑆 is the dissociation constant for the binding of 𝐸 and 𝑆, and 𝐾_𝐸𝑃 is that for 𝐸 and 𝑃. Solving for 𝐶_𝐸𝑆 in terms of 𝑡_𝐸, 𝑡_𝑆, 𝑡_𝑃, 𝐾_𝐸𝑆, and 𝐾_𝐸𝑃, for example, comes down to solving the following polynomial equation of degree 3:

$$C^3 (K - K') + C^2 \big( K(K' + t_P - t_S) + t_E (K' - K) + 2K' t_S - K^2 \big) - C\, t_S \big( K(K' + t_P) - t_E (K - 2K') + K' t_S \big) + t_E K' t_S^2 = 0.$$

Here, to keep the equation manageable, we use the shorthand 𝐶 and 𝐾 for 𝐶_𝐸𝑆 and 𝐾_𝐸𝑆, and 𝐶′ and 𝐾′ for 𝐶_𝐸𝑃 and 𝐾_𝐸𝑃. Since the Abel–Ruffini theorem states that general polynomial equations of degree five or higher have no solution in radicals, active complex concentrations are in general not analytically solvable with four or more binding reactions. In fact, even for two binding reactions, the analytical formula is complicated enough that insights are hard to extract from it. More importantly, although systems of polynomial equations can be solved numerically to an extent, this becomes computationally intractable for large systems in general (the problem is well known to be NP-hard), and relaxations such as sum-of-squares [86] or signomials [78] are needed for even moderate-size problems.
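As a minimal numerical sketch, assuming NumPy is available and using arbitrary illustrative parameter values, the degree-3 polynomial for 𝐶_𝐸𝑆 given above can be handed to a generic root finder (`numpy.roots`). We keep the real root with 0 < 𝐶_𝐸𝑆 < 𝑡_𝑆 (for these values exactly one root falls in that window), rebuild the remaining concentrations from Eq (3.3), and confirm that all five steady-state equations hold. The helper names `cubic_coefficients` and `steady_state` are hypothetical and not part of this work.

```python
import numpy as np

def cubic_coefficients(tE, tS, tP, K, Kp):
    """Coefficients (highest degree first) of the cubic in C = C_ES,
    transcribed from the text with K = K_ES and Kp = K' = K_EP."""
    return [
        K - Kp,
        K * (Kp + tP - tS) + tE * (Kp - K) + 2 * Kp * tS - K**2,
        -tS * (K * (Kp + tP) - tE * (K - 2 * Kp) + Kp * tS),
        tE * Kp * tS**2,
    ]

def steady_state(tE, tS, tP, K, Kp):
    """Pick the admissible root and rebuild all concentrations of Eq (3.3)."""
    roots = np.roots(cubic_coefficients(tE, tS, tP, K, Kp))
    scale = max(tE, tS, tP, K, Kp)
    # Keep real roots with 0 < C_ES < t_S, so that free S stays positive.
    admissible = [r.real for r in roots
                  if abs(r.imag) < 1e-9 * scale and 0 < r.real < tS]
    C_ES = admissible[0]
    S = tS - C_ES                 # free substrate
    E = K * C_ES / S              # from K_ES C_ES = E S
    C_EP = E * tP / (Kp + E)      # from K_EP C_EP = E P and t_P = P + C_EP
    P = tP - C_EP                 # free product
    return {"E": E, "S": S, "P": P, "C_ES": C_ES, "C_EP": C_EP}

tE, tS, tP, K, Kp = 1.0, 2.0, 0.5, 0.3, 0.7   # arbitrary illustrative values
x = steady_state(tE, tS, tP, K, Kp)
residuals = [                                  # the five equations of Eq (3.3)
    K * x["C_ES"] - x["E"] * x["S"],
    Kp * x["C_EP"] - x["E"] * x["P"],
    tE - (x["E"] + x["C_ES"] + x["C_EP"]),
    tS - (x["S"] + x["C_ES"]),
    tP - (x["P"] + x["C_EP"]),
]
print(x, max(abs(r) for r in residuals))
```

A numerical root like this is cheap for one parameter set, but it gives no analytical insight into how 𝐶_𝐸𝑆 depends on the totals, and the degree keeps growing with the number of binding reactions.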

Existing approximations are limited in the scenarios where they apply. Traditionally, approximations tailored to the application scenario of interest are made to trade exactness for tractability.

One such example is the Michaelis–Menten (MM) formula, which was developed by Michaelis and Menten [63], made rigorous by Briggs and Haldane [23], and has served as the foundation of dynamic modeling of biochemical reactions for the past 100 years [31, 57, 65]. Focusing on enzymatic catalysis where substrates are small molecules, the MM formula assumes that the substrate concentration is much higher than that of the enzyme. Assumptions like this allow simple analytical solutions to enzymatic catalysis problems. Thanks to this powerful simplicity, the MM formula has been fruitfully applied to many biomolecular scenarios, such as bulk enzymatic catalysis, single-molecule catalysis, transcription-translation, and chemotaxis phosphorylation [116]. However, as the scientific and engineering study of biomolecular systems ventures into more complex and dynamic systems, such as in developmental biology and post-translational regulation, assumptions of the MM type no longer hold. Instead, the full regulatory profile, obtained without such assumptions, is needed to understand system behavior. For example, when the concentrations of chemicals change significantly over time, such as in metabolic shifts and gene regulation, especially in vivo, the MM assumption breaks down [2, 28, 108]. Other recent examples are combinatorial regulations in gene circuits that result in promiscuous sensing and multistable cell fate regulation [9, 46, 122].

While MM and related approximations come from the bulk assumption that one species’ concentration is much higher than another’s, another wide class of approximations in biophysics comes from microscopic assumptions, where the system under study is a single molecule in a bath of other molecules, e.g. one receptor in a bath of ligands [35, 88].

Formulas taking the form of rational functions can be obtained analytically for such cases from arguments of thermodynamics [35], statistical mechanics [88, 89], or Markov chain theory [53, 76]. However, when results from this analysis are applied to systems with more than one molecule, an implicit mean-field assumption is made: the molecules are taken to behave independently and identically, so the system is approximated as many copies of the same one-molecule system. This is known to cause crucial deviations from experiments in synthetic and systems biology. One bioengineering term for this phenomenon is retroactivity [32]: transcription factors bound to promoters of genes on plasmids reduce the free transcription factors in solution, so that although the regulated gene is downstream, it “retroactively” acts on its upstream transcription factor.

Here the activity of these genes cannot be treated as independent and identical copies of a single plasmid, because whether a given plasmid has transcription factors bound depends on whether other plasmids have already “absorbed” a significant fraction of the free transcription factors in solution.

Yet another simplifying approach is to assume that the scenario of interest resembles an experimental setting in which we can control free (non-total) concentrations directly. This often holds for “induction curve” experiments, in vitro or in vivo, where concentrations of small molecules are controlled by a chemical bath, and equilibrium is effectively reached if the small molecules can freely exchange between the bath and the system of interest. Hence, for the purpose of quantitatively modeling the measured induction curve, the control variable is the free molecule concentration instead of the total concentration. Each substitution of a free concentration for a total concentration as control variable lowers the degree of the polynomial to be solved by one. So if each binding reaction has one species controlled this way, the problem reduces to explicit solutions in rational-function form, just as in the single-molecule case. For the system in Eq (3.1), if both the free substrate concentration 𝑆 and the free product concentration 𝑃 are controlled via external baths, then the binding reactions are effectively state transitions, which are amenable to a single-molecule or Markov chain interpretation, as is often done in biophysics:

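$$E \underset{k^-_{ES}}{\overset{k^+_{ES} S}{\rightleftharpoons}} C_{ES} \xrightarrow{k_{\mathrm{cat}}} C_{EP} \underset{k^+_{EP} P}{\overset{k^-_{EP}}{\rightleftharpoons}} E.$$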
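A minimal sketch of this fully clamped case, assuming (as in Eq (3.3)) that binding is equilibrated so catalysis does not perturb the binding steady state; the helper names are hypothetical. With 𝑆 and 𝑃 fixed, enzyme conservation alone yields a rational function. For comparison, `C_ES_exact` solves Eq (3.3) with totals as inputs; reducing it to a one-dimensional bisection on free enzyme 𝐸 is our own illustrative choice, not the method of this chapter. When 𝑡_𝑆, 𝑡_𝑃 ≫ 𝑡_𝐸, plugging totals in place of free concentrations should nearly reproduce the exact value.

```python
def C_ES_bath(tE, S, P, K_ES, K_EP):
    """C_ES when free S and P are clamped by external baths.

    E, C_ES = E*S/K_ES and C_EP = E*P/K_EP are the three states of one
    enzyme and must add up to t_E, giving a rational function of S and P.
    """
    xS, xP = S / K_ES, P / K_EP   # free ligand levels in units of K
    return tE * xS / (1.0 + xS + xP)


def C_ES_exact(tE, tS, tP, K_ES, K_EP, iters=100):
    """Exact C_ES from Eq (3.3), with totals as control variables.

    Eliminating S and P gives C_ES = E*tS/(K_ES+E) and C_EP = E*tP/(K_EP+E),
    so enzyme conservation is a single increasing function of free E,
    solved here by bisection on 0 < E < t_E.
    """
    def excess(E):
        return E + E * tS / (K_ES + E) + E * tP / (K_EP + E) - tE

    lo, hi = 0.0, tE              # excess(0) < 0 < excess(t_E)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    E = 0.5 * (lo + hi)
    return E * tS / (K_ES + E)


# Illustrative values: compare enzyme-scarce and enzyme-rich settings.
for tE in (0.01, 1.0):
    approx = C_ES_bath(tE, 2.0, 1.0, 1.0, 0.5)   # totals used as if free
    exact = C_ES_exact(tE, 2.0, 1.0, 1.0, 0.5)
    print(tE, approx, exact)
```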

Note that the MM assumption that substrate and product concentrations greatly exceed the enzyme concentration, 𝑡_𝑆, 𝑡_𝑃 ≫ 𝑡_𝐸, produces the same approximation, since free and total concentrations are then approximately equal, 𝑡_𝑆 ≈ 𝑆 and 𝑡_𝑃 ≈ 𝑃. We again see the limitations of this simplifying approach. On the one hand, it applies only to experimental scenarios where the system’s internal concentrations are in equilibrium with an external bath. On the other hand, the simplification has limited effect for complex systems with significant internal dynamics that are not accessible to external control. For the binding system in Eq (3.1), if only the free substrate 𝑆 is externally controlled while the product 𝑃 is not, then to explain experimental induction curves we need the active complex 𝐶_𝐸𝑆 in terms of (𝑆, 𝑡_𝐸, 𝑡_𝑃, 𝐾_𝐸𝑆, 𝐾_𝐸𝑃), which satisfies the following polynomial equation:

$$C_{ES}^2\, K_{ES}(K_{ES} + S) + C_{ES}\, S \big( (K_{EP} + t_P) K_{ES} - t_E K_{ES} + K_{EP} S \big) - t_E K_{EP} S^2 = 0.$$

This is one degree lower than the cubic obtained when 𝑡_𝑆 is the control variable, but it is still not linear in 𝐶_𝐸𝑆 and is therefore not amenable to a rational-function solution. If there are more internal binding reactions that cannot be externally controlled, the degree grows again and the problem becomes intractable, analytically and computationally.
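A short sketch of this partially clamped case, with hypothetical names and illustrative values. Because the leading coefficient 𝐾_𝐸𝑆(𝐾_𝐸𝑆 + 𝑆) is positive and the constant term −𝑡_𝐸𝐾_𝐸𝑃𝑆² is negative, the quadratic always has exactly one positive root. We compute it with a cancellation-free form of the quadratic formula and check it against enzyme conservation from Eq (3.3).

```python
import math

def C_ES_clamped_S(S, tE, tP, K_ES, K_EP):
    """Positive root of the quadratic for C_ES when only free S is clamped."""
    a = K_ES * (K_ES + S)
    b = S * ((K_EP + tP) * K_ES - tE * K_ES + K_EP * S)
    c = -tE * K_EP * S**2
    # a > 0 and c < 0, so the two roots have opposite signs.
    # Stable quadratic formula: the roots are q/a and c/q (q is never 0 here).
    q = -0.5 * (b + math.copysign(math.sqrt(b * b - 4 * a * c), b))
    return max(q / a, c / q)


S, tE, tP, K_ES, K_EP = 0.8, 1.0, 0.5, 0.3, 0.7   # illustrative values
C_ES = C_ES_clamped_S(S, tE, tP, K_ES, K_EP)
E = K_ES * C_ES / S                # from K_ES C_ES = E S
C_EP = E * tP / (K_EP + E)         # from K_EP C_EP = E P with t_P = P + C_EP
residual = tE - (E + C_ES + C_EP)  # enzyme conservation, should be ~0
print(C_ES, residual)
```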

To get a sense of the magnitude of the error made, we consider just one binding reaction

$$G + R \underset{k^-}{\overset{k^+}{\rightleftharpoons}} G_R,$$

where 𝐺 is the concentration or copy number of the gene of interest, 𝑅 is that of a regulator such as a transcription factor, and 𝐺_𝑅 is the complex formed when the gene and the regulator are bound. For this simple system, the same solution

$$G_R \approx \frac{t_G\, t_R}{K + t_R}, \qquad K = \frac{k^-}{k^+},$$

is obtained from the MM approximation that the total regulator is much higher than the gene, 𝑡_𝑅 ≫ 𝑡_𝐺, from the single-molecule states approximation for gene molecules 𝐺, and from the external-bath approximation for a bath of free regulator at concentration 𝑅. The explicit solution obtained by solving the quadratic equation from the binding reaction is

$$2 G_R = t_G + t_R + K - \sqrt{(t_G + t_R + K)^2 - 4 t_G t_R}.$$

We can hold 𝑡_𝐺 constant and vary 𝑡_𝑅 to see how large an error the approximation makes compared to the exact solution; see Figure 3.1.

Whenever the total regulator concentration 𝑡_𝑅 is close to the total gene concentration 𝑡_𝐺, the exact bound fraction of the gene is much lower than the approximate solutions predict.

Figure 3.1: Comparison of the approximate solution (Approx) with the exact solution for a simple binding reaction 𝑅 + 𝐺 ⇌ 𝐺_𝑅, for several fixed total gene concentrations 𝑡_𝐺. Concentrations are in units of 𝐾.
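The comparison behind Figure 3.1 can be reproduced in a few lines. This is a sketch with an illustrative grid of values, not the script used for the figure, and the function names are hypothetical. Both formulas are the ones given above for 𝐺 + 𝑅 ⇌ 𝐺_𝑅; the exact root is rewritten in an algebraically equivalent form that avoids subtracting two nearly equal numbers.

```python
import math

def bound_exact(tG, tR, K):
    """Exact G_R: the smaller root of G_R^2 - (tG + tR + K) G_R + tG tR = 0."""
    s = tG + tR + K
    # Same value as (s - sqrt(s^2 - 4 tG tR)) / 2, rewritten to avoid
    # cancellation between two nearly equal numbers.
    return 2.0 * tG * tR / (s + math.sqrt(s * s - 4.0 * tG * tR))

def bound_approx(tG, tR, K):
    """Shared MM / single-molecule / external-bath approximation."""
    return tG * tR / (K + tR)

K = 1.0  # concentrations measured in units of K, as in Figure 3.1
for tG in (0.1, 1.0, 10.0):            # illustrative fixed total gene levels
    for tR in (0.1, 1.0, 10.0, 100.0):
        exact, approx = bound_exact(tG, tR, K), bound_approx(tG, tR, K)
        print(f"tG={tG:<5} tR={tR:<6} bound fraction: "
              f"exact={exact / tG:.3f} approx={approx / tG:.3f}")
```

Since the exact 𝐺_𝑅 satisfies 𝐾𝐺_𝑅 = (𝑡_𝐺 − 𝐺_𝑅)(𝑡_𝑅 − 𝐺_𝑅) ≤ (𝑡_𝐺 − 𝐺_𝑅)𝑡_𝑅, it never exceeds the approximation; the gap is pronounced when 𝑡_𝑅 is comparable to 𝑡_𝐺 and both exceed 𝐾.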

To summarize, MM approximations, single-molecule states approximations, and external-bath approximations produce similar simplifications that yield rational-function solutions in ideal cases. There are many scenarios in which these simplifications apply and yield fruitful biological insights. There are also scenarios that go beyond these approximations, such as combinatorial regulation and highly dynamic shifts. Therefore, we would like a method of analysis that can tackle the full regulatory behavior in general scenarios without approximations, and at the same time reduces to the simpler cases above when it is reasonable to do so.

As directly solving for the catalysis rate or the active complex concentration is not tractable, we need to find other variables to capture the full regulatory profile. In this work, we focus on the reaction orders, i.e. the order of rates’ dependence on total reactant concentrations.

We show that the full regulatory profile of catalysis rates can be characterized in terms of polyhedral sets that bound log derivatives, the continuous analogues of reaction orders. Since knowing the exact log derivatives determines the catalysis rates up to a multiplicative constant, this gives a way to capture the full regulatory behavior of rates while giving up only the information about their exact magnitude, which can often be estimated or measured experimentally.
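To make log derivatives concrete, here is a hedged sketch for the single binding reaction 𝐺 + 𝑅 ⇌ 𝐺_𝑅 considered earlier (the case analyzed in full in Section 3.6); the helper names and sample points are hypothetical. It estimates the reaction orders ∂log 𝐺_𝑅/∂log 𝑡_𝐺 and ∂log 𝐺_𝑅/∂log 𝑡_𝑅 by central finite differences in log space, and compares them with a closed form that follows from implicitly differentiating (𝑡_𝐺 − 𝐺_𝑅)(𝑡_𝑅 − 𝐺_𝑅) = 𝐾𝐺_𝑅, namely (𝐾 + 𝑅)/(𝐾 + 𝑅 + 𝐺) and (𝐾 + 𝐺)/(𝐾 + 𝑅 + 𝐺), where 𝐺 and 𝑅 denote free concentrations.

```python
import math

def bound_exact(tG, tR, K):
    """Exact complex G_R for G + R <=> G_R (cancellation-free root formula)."""
    s = tG + tR + K
    return 2.0 * tG * tR / (s + math.sqrt(s * s - 4.0 * tG * tR))

def reaction_orders_fd(tG, tR, K, h=1e-6):
    """d log G_R / d log t_G and d log G_R / d log t_R,
    estimated by central finite differences in log space."""
    def log_c(log_tG, log_tR):
        return math.log(bound_exact(math.exp(log_tG), math.exp(log_tR), K))
    lg, lr = math.log(tG), math.log(tR)
    d_G = (log_c(lg + h, lr) - log_c(lg - h, lr)) / (2 * h)
    d_R = (log_c(lg, lr + h) - log_c(lg, lr - h)) / (2 * h)
    return d_G, d_R

def reaction_orders_closed(tG, tR, K):
    """Same orders from implicit differentiation of (tG - C)(tR - C) = K C,
    written with the free concentrations G = tG - C and R = tR - C."""
    C = bound_exact(tG, tR, K)
    G, R = tG - C, tR - C
    return (K + R) / (K + R + G), (K + G) / (K + R + G)

# Illustrative sample points (t_G, t_R, K).
for tG, tR, K in [(1.0, 1.0, 1.0), (0.01, 100.0, 1.0), (100.0, 100.0, 0.01)]:
    print((tG, tR, K), reaction_orders_fd(tG, tR, K),
          reaction_orders_closed(tG, tR, K))
```

Both orders lie between 0 and 1, and their sum is 1 + 𝐾/(𝐾 + 𝑅 + 𝐺), which lies between 1 and 2. Every attainable pair of orders therefore falls in the triangle with vertices (1, 0), (0, 1), and (1, 1), a small instance of a polyhedral set bounding log derivatives; the vertices correspond to the regimes 𝐺_𝑅 ≈ 𝑡_𝐺, 𝐺_𝑅 ≈ 𝑡_𝑅, and 𝐺_𝑅 ≈ 𝑡_𝐺𝑡_𝑅/𝐾.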

In Section 3.3 we formally define binding reaction networks using chemical reaction network theory, and characterize a class of binding networks that are biologically plausible.

In Section 3.4, we characterize the manifold of equilibrium, or detailed-balanced, steady states of binding networks, and introduce log derivatives as a transform between different parameterizations of the manifold. In Section 3.6, we focus on a binding network with just one binding reaction and fully analyze its reaction orders (one type of log derivative) and their biological implications. We observe a polyhedral set bounding the full range of values the reaction orders can take. In Section 3.7, given the central importance of the vertices of reaction order polyhedra, we characterize these vertices in terms of minimal support vectors of linear subspaces and develop a computational method to obtain them at scale. In Section 3.8, we show that polyhedra arise naturally from the decomposition of log derivative operators. Using this, we develop an approach to obtain reaction order polyhedra analytically.

From the document Biocontrol of biomolecular systems (pages 90-95)