Chapter II: Polyhedral constraints enable holistic analysis of bioregulation
2.2 Reaction order polyhedra can be derived and computed at scale
We proposed to describe bioregulation through reaction orders because the full bioregu- lation profile in terms of catalysis rates or fluxesβ dependence on concentrations require solving high degree polynomial equations, which is intractable analytically and computa- tionally. While previously we illustrated how reaction orders capture bindingβs regulation of catalysis through a simple binding network where rates can be explicitly solved, in this section we demonstrate that in contrast to rates and fluxes, reaction orders can be computed and derived at scale. We show this by first demonstrating a computational sampling algorithm to obtain reaction order polyhedra through matrix algebra. This is based on a reaction order formula for arbitrary binding networks, generalizing the procedure we used to calculate reaction orders in the simple binding network using implicit function theorem.
Then we show that the reaction order polyhedra themselves can also be derived directly through a method called dominance decomposition tree (DDT), based on rules of calculus for positive variables.
Computational sampling of reaction order polyhedra. The computational procedure to sample points of reaction order polyhedra is shown in Figure 2.3. Here we choose the binding network of an induced activator as an illustration. Here πΊis a gene to be expressed,π is an activating transcriptional regulator, andπis an inducer of the regulator.
The two binding reactions are the inducerπ binds with the activatorπ to form a complex πΆπ π, and this induced activator binding with the geneπΊto form a complexπΆπΊπ π. πΆπΊπ π
is then the activated gene complex capable of gene expression. So a natural objective for analysis is to understand how the gene expression is regulated by this binding network. We characterize the full bioregulation profile of this binding network by obtaining the reaction order polyhedra, without making any assumption about gene copy number, activator amount, or inducer concentration, or their binding strengths.
To obtain the reaction order polyhedra, one way is to computationally sample points from it.
This will visually show the polyhedron because the vertices and edges are robust, therefore naturally have higher density of points. The computational sampling is based on a formula for reaction orders that hold for arbitrary biological binding networks, shown in step 3 of Figure2.3. This formula is derived using implicit function theorem, through a similar process as in the reaction order calculation of the simple binding network (see Chapter3 for details).
Figure 2.3The procedure to computationally sample reaction order polyhedra of a binding network, illustrated with the binding network of an induced activator. See Chapter3for detailed derivations. Step 1, specify a binding network. Step 2, write down the stoichiometry matrix for the binding reactions. For each binding reaction, use the stoichiometry in the binding direction. For the order of the species, put the free form of the species first. These species are called atomic species, with the special property that conserved quantities represent totals of these species. Step 3, compute the conservation law matrixπΏfrom the stoichiometry matrixπ. Step 4, compute reaction orders using the formula. HereΞπ‘denote a diagonal matrix with the vector of totalsπ‘on the diagonal. Same forΞπ₯. Step 5, for a target species whose reaction order is to be studied, we can sample points of its reaction order polyhedron by taking random values ofπ₯, calculateπ‘and pass through the formula to compute the reaction order, and then plot the points. The figure listed here shows the reaction order polytope of the induced activator, rotated in four different angles to show the 3D shape. The arrow around a line denotes in which direction the 3D shape is rotated from the upper left figure.
Sampling of theπΆπΊπ πreaction order polytops is done by log-uniformly sampling the values of each variable in(πΊ, π , π, πΆπ π, πΆπΊπ π)between10β6and106with 100000 points in total.
To use the formula, we see it requires the input of the stoichiometry matrixπ of dimension πΓπand the conservation law matrixπΏof dimensionπΓπ, whereπis the number of species involved,πis the number of binding reactions with linearly independent stoichiometry, andπis the number of conserved quantities or the number of totals. The stoichiometry matrix can be directly obtained from the binding network once it is specified, as shown in Step 2 of Figure2.3. It might occur that there are binding reactions with linearly dependent stoichiometry vector. In that case, compute the rankπof the stoichiometry matrix, and selectπof the linearly independent reactions to form the stoichiometry matrix π. The result is independent of whichπlinearly independent reactions are selected. Since binding
reactions are reversible, as a convention, we choose the stoichiometry vector of the binding direction to form matrix π. When possible, it is helpful if a subset ofπ of the species, considered atomic species, can be distinguished and arranged first in the ordering of the species. Atomic species often correspond to the free form of the molecules, so that their total amount is conserved through the binding reactions.
The conservation law matrix πΏ, which defines the totals, can be computed from the stoichiometry matrix π (see Step 3 of Figure 2.3 for illustration, and Chapter 3 for derivation details). If the atomic species are ordered first, then this computation is simple.
We can split πΏ into two submatrices: πΏ = [οΈIπ πΏ2]οΈ, with the first submatrix a πΓ π identity matrix. Similarly splitπ =[οΈπ1 π2]οΈ, then the submatrixπΏ2can be computed as πΏ2 = (βπ2β1π1)βΊ. OnceπΏis obtained, the totalπ‘of dimensionπcan be defined asπ‘ =πΏπ₯, whereπ₯, a vector of dimensionπ, is the concentrations of all the species involved.
With both the stoichiometry matrixπ and the conservation law matrixπΏwritten down, we can apply the reaction order formula in Step 4 of Figure2.3. This formula can calculate the reaction orders for any given concentration vector of the speciesπ₯. This allows us to sample points in the reaction order polyhedra for any species of interest. In Step 5 of Figure2.3, the sampling result for reaction order polyhedron ofπΆπΊπ π to the total geneπ‘πΊ, total regulator π‘π and total substrateπ‘πis shown. We see a polytope with vertices(1,0,0),(0,1,0),(0,0,1), (1,1,0),(1,0,1), and(1,1,1), where the order is πΆπΊπ πβs reaction order in (π‘πΊ, π‘π , π‘π). In particular, we see the triangle from the simple binding network is contained as a facet in this polytope. Indeed if inducer concentration is overabundant such that theπ is always induced, then the induced activator binding network reduces toπΊ+π * βπΆπΊπ *, where π *denote the always induced regulator. This is exactly the same as the simple binding network. Similarly, if the regulatorπ is overabundant so that the inducer is always bound to regulators, or if the gene is overabundant so that any induced activator is always bound to genes, then we also have reduction to simple binding. This explains the three triangular facets of the polytope. This is another illustration of the geometric hierarchy of reaction order polyhedra, and its correspondence with asymptotic conditions on concentrations.
Lastly, we note that the computation of reaction orders only involves matrix algebra. The most costly step is inverting the matrix. This is much less costly compared to solving polynomial equations. Specifically, inverting a matrix of dimensionπhas computational complexity π(π2.3) to π(π3), so the cost of computationally sample the polyhedron is π(π π2.3)whereπ is the number of points in the polyhedra to be sampled. In contrast, a rough complexity for solving systems of polynomial equations is degree to the number of equations [62]. Assuming the number of (linearly independent) binding reactions
π is proportional to the number of species π, we have computational cost π(π2π) for numerically scanning the full regulatory profile by solving the polynomial equations of binding network steady states. This is indeed much higher cost than sampling reaction order polyhedra. Furthermore, the matrices involved are very sparse for large binding networks, which can further speed up computations.
Dominance decomposition tree (DDT) can derive reaction order polyhedra directly.
While the computational sampling algorithm is powerful in obtaining numerical values of reaction orders for large binding networks, to obtain the geometric shape of the reaction order polyhedra requires further analysis on top of the sampled result, such as visual inspection. Here, we describe an analytical procedure that directly obtains the reaction order polyhedra called dominance decomposition tree (DDT). This procedure also provides deeper insight into why the polyhedral shape arise in the first place. See Chapter3for details on the relevant concepts and derivations.
In order to obtain the reaction order polyhedra directly, we need to look into why reaction orders form polyhedral sets in the first place. In Chapter3, we show that convex combinations are part of the fundamental rules of calculus for positive variables. To get a sense of this, we make the following observation, that log derivatives of sums of functions give rise to convex combinations. Considerπ(π₯) =π1(π₯) +π2(π₯), all are positive functions andπ₯is a positive variable. Then
πlogπ(π₯)
πlogπ₯ = π1
π1+π2
πlogπ1(π₯)
πlogπ₯ + π2
π1+π2
πlogπ1(π₯)
πlogπ₯ .
We see that for reaction orders, or log derivatives, when terms are summed together, the sumβs reaction order is the convex combination of each termβs reaction order. In other words, terms compete for dominance in their orders. Ifππ is large, then the convex coefficient ππ
π is closer to1, so the order ofππ dominates in πβs order. As an application of this, the log derivative of polynomials, or ratios of polynomials, form a polytope from convex combination of each termβs exponents. For example, π(π₯) = π π0π₯π0
1π₯π1+π2π₯π2 has log derivativeπ0β(π1π1 +π2π2), whereππare convex coefficients defined byπ1 = π π1π₯π1
1π₯π1+π2π₯π2
andπ2 = π π2π₯π2
1π₯π1+π2π₯π2. Translated into the reaction order context, this means when multiple processes contribute to a flux, the reaction order of the flux is a convex combination of each processβs reaction order. Therefore, the flux reaction order takes value in the polytope with each processβs reaction order as vertices. The flux reaction order is close to theπth vertex when theπth process is dominant over the other processes.
The reader may have noticed that the rule of competition for order dominance, or convex combinations from sums, is discussed for sums in the quantity to be differentiated. However,
for reaction orders, the sum if in the coordinate variables. For example, in the simple binding network, the reaction order ofπΆto total substrate is πlogπΆ
πlogπ‘π = πlog(π+πΆ)πlogπΆ . So this is the situation of πlogπ
πlog(π₯1+π₯2), rather than πlog(π1+π2)
πlogπ₯ that we discussed above. Without going into details, here we simply state that this rule of competition for order dominance propagates to sums in the coordinate variables as well. To be specific, we have
πlogπ
πlogπ₯1+π₯2 =πΌ1 πlogπ
πlogπ₯1 +πΌ2 πlogπ
πlogπ₯2
for some convex coefficientsπΌ1 andπΌ2 withπΌ1, πΌ2 β₯0,πΌ1+πΌ2 = 1, andπΌ1 becomes close to1whenπ₯1 is dominant in the sumπ₯1+π₯2. Here in order for the log derivatives to make sense, the positive variablesπ₯1 andπ₯2 are related to each other so that there is only one degree of freedom, e.g. π₯1 +π₯2 = 1 orπ₯1π₯2 = 1, and the functionπ is a function of this one degree of freedom. This internal relation between the coordinate variables is indeed the case for binding network. For example, the simple binding network has (πΈ, π, πΆ) constrained by the steady state equation to have2degrees of freedom, rather than3. This rule of competition for dominance is the underlying reason for the polyhedral set of reaction orders. More importantly, this rule can be applied as a calculation procedure to directly obtain the reaction order polyhedra. For example, competition for dominance applied to reaction order ofπΆ to(π‘π, π‘πΈ)in the simple binding network yields the triangle straight-forwardly. WhenπΆis dominant inπ‘π =π+πΆ, thenπ‘π βπΆ, so the reaction order is approximately πlogπΆ
πlog(π‘π,π‘πΈ) β πlog(πΆ,π‘πlogπΆ
πΈ). SinceπΆappears as a coordinate variable, of course the fold change ofπΆis exactly the same asπΆ itself, independent ofπ‘πΈ whenπΆis held constant.
So the reaction order in this dominance condition is πlogπΆ
πlog(π‘π,π‘πΈ) β πlog(πΆ,π‘πlogπΆ
πΈ) = [οΈ1 0]οΈ. The left-over case is whenπis dominant inπ‘π, so πlogπΆ
πlog(π‘π,π‘πΈ) β πlog(π,π‘πlogπΆ
πΈ). Now we can consider the next dominance condition ofπ‘πΈ. IfπΆis dominant inπ‘πΈ, then πlogπΆ
πlog(π‘π,π‘πΈ) β ππlog(π,πΆ)logπΆ , so againπΆonly varies with theπ‘πΈ coordinate, resulting in a simple reaction order
[οΈ0 1]οΈ. Then the left over case isπΈis dominant inπ‘πΈ, which gives πlogπΆ
πlog(π‘π,π‘πΈ) β ππlog(π,πΈ)logπΆ . Now we recall the steady state expression that πΆ =πΎβ1ππΈ. So the reaction order for this dominance condition is
[οΈ
1 1]οΈ.
From this simple binding network example, we see that applying the rule of competition for dominance to each coordinate, and ask for which term is dominant, can obtain the vertices of the reaction order polyhedron. Then taking convex combination of the vertices, we obtain the polyhedron itself. This decomposing each coordinate into different dominance conditions reminds us of how the geometric hierarchy of the reaction order polyhedron corresponds to asymptotic conditions on concentrations. Indeed, this competition for
Figure 2.4Illustration of the dominance decomposition tree (DDT) procedure to obtain reaction order polyhedra directly. The example binding network of an induced activator is used. (1) The binding network for the induced activator. (2) The definitions of the totals in this binding network. (3) The minimal expressions of πΆπΊπ πin terms of other species through steady state relations. (4) The DDT procedure written out step-by-step for the reaction order ofπΆπΊπ πin(π‘πΊ, π‘π , π‘π). The forked lines denote a decomposition step. The variables in rectangles are coordinates with respect to which the log derivative is to be calculated. The downward black arrow means evaluation of log derivative. After decomposition has finished, all the resulting reaction orders are taken convex combination together to obtain the reaction order polyhedra.
dominance is the deeper reason for why this geometry-to-concentration correspondence can exist.
Putting together the coordinate decomposition steps using the rule of competition for dominance, we obtain the procedure called dominance decomposition tree (DDT) that can directly obtain the reaction order polyhedra. In Figure2.4, we write out the steps of DDT applied to a more complex example: the binding network of induced activators, whose reaction order polyhedron is also computationally sampled in Figure 2.3. In the DDT procedure, We first write out the binding network, how the totals are expressed in terms of all the species, and the steady state expressions for the target species (parts 1,2 and 3 in Figure2.4). This is for book-keeping reasons so we do not loose track when performing the decomposition steps. The expression for the totals can be obtained either by inspection or by calculating the conservation law matrix as discussed in the computational sampling procedure. The steady state expressions should include all the distinct ways that the target species can be expressed in the other species. These expressions are important in the DDT procedure to let us know when decomposed coordinates will give a constant reaction order. For example, from the steady state expression πΆπΊπ π = πΊπΆπ ππΎπΊπ πβ1 , we know if decomposed coordinates includeπΊandπΆπ π in them, then we the reaction order ofπΆπΊπ π is constant under this dominance condition.
With these preparations done, in part 4 of Figure2.4, we begin the decomposition steps for πΆπΊπ π in the induced activator binding network. In each step, we ask whether one of the
terms in a coordinate is dominant. For example, in the first step, we ask whetherπΆπΊπ π is dominant in total geneπ‘πΊ =πΆπΊπ π+πΊ. The branch thatπΆπΊπ π is dominant inπ‘πΊimmediately results in a constant reaction order, while the other branch thatπΊdominant inπΆπΊπ π need further decompositions. The decomposition steps continue untill all branches have reached a case where the reaction order is constant. Then taking the convex combination of all the resulting constant reaction orders gives us the reaction order polyhedra. Indeed, it can be checked with the visualization in Figure2.3that the constant vectors from DDT here are the vertices of the polytope.
We caution that the order in the sequence of decomposition steps taken could be very important. Wrong sequences could build DDT trees with spurious vertices that cannot be reached by reaction orders of the binding network. When performing the sequence of decomposition steps, given the coordinates at the current step, we should search for changes to the coordinates that require fewest number of steps to reach a constant reaction order. Like in Figure 2.4, we take out πΆπΊπ π from all the coordinates at the first three steps, because this is guaranteed to have a constant reaction order forπΆπΊπ π itself. We acknowledge that finding the right sequence of decomposition steps requires a bit of the art of problem solving, just like many powerful mathematical techniques. Nonetheless, we have guarantees that any polyhedron we obtain from DDT will always contain the reachable set of reaction orders in it. So we can only overshoot through DDT, and any polyhedron we obtain is an outer bound. This can be used to take a variational view to finding the right sequence of steps in DDT. No matter how we came up with a sequence of decomposition steps, the sequence that results in the smallest polyhedron is always closer to truth. Therefore, in the worse case, we can always search all possible sequences of steps or DDT trees by brute-force computer search. So DDT can be seen as a technique to derive the reaction order polyhedron directly for general binding networks. A generic computational complexity for deriving the reaction order polyhedra directly from DDT is therefore exponential in the number of species, π(ππ). This is perhaps not reducible, since just finding the finite vertices correspond to enumerating the vertices of a polytope given its facets (see Section3.7of Chapter3), which has complexity that is exponential inπ. However, DDT directly obtains the full reaction order polyhedra, so this compares favorably to numerical scans whenπis much smaller thanπ the number of points needed in the scan, either from reaction order formula, which has complexityπ(π π2.3), or from solving polynomial equations, which has complexityπ(π2π). In practice, it is often the case thatπ is several orders larger than2πso as to sufficiently cover the solution space.