S1 Additional free energy model details
S1.1 Strand association penalty for a complex
Based on dimensional analysis, we define the complex concentrations $x_\Psi$ for a test tube containing the set of complexes $\Psi$ as mole fractions rather than molarities (see (2.14)). Therefore, we adjust the strand association penalty
$$\Delta G^{\mathrm{assoc}} = \Delta G^{\mathrm{assoc}}_{\mathrm{pub}} - kT \log\left[\rho_{\mathrm{H_2O}}/(1\ \mathrm{mol/liter})\right] \tag{S1}$$
where $\Delta G^{\mathrm{assoc}}_{\mathrm{pub}}$ is the published value for two strands associating [21] and $\rho_{\mathrm{H_2O}}$ is the molarity of water (e.g., $\rho_{\mathrm{H_2O}} = 55.14$ mol/liter at 37 °C) [7]. The strand association penalty for a complex of $L$ strands (see (2.1)) is then
$$(L-1)\,\Delta G^{\mathrm{assoc}}. \tag{S2}$$
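As a minimal sketch of (S1) and (S2), the following Python functions evaluate the adjusted penalty. The function names, the placeholder published value of 2.0, the use of the natural logarithm, and the use of kcal/mol units (so that $k$ is taken per mole) are assumptions made for illustration and are not taken from the text.

```python
import math

K_B = 0.0019872  # Boltzmann constant per mole, kcal/(mol K); kcal/mol units are assumed


def strand_association_penalty(dg_assoc_pub, temperature_k, water_molarity):
    """Adjusted penalty of (S1): dG_pub - kT log(rho_H2O / (1 mol/liter)).

    The natural logarithm is assumed here.
    """
    return dg_assoc_pub - K_B * temperature_k * math.log(water_molarity)


def complex_association_penalty(n_strands, dg_assoc):
    """Total penalty of (S2) for a complex of L strands: (L - 1) * dG_assoc."""
    return (n_strands - 1) * dg_assoc


# dg_assoc_pub=2.0 is a hypothetical placeholder, not a published parameter.
# 55.14 mol/liter is the water molarity at 37 C quoted in the text.
dg_assoc = strand_association_penalty(2.0, 310.15, 55.14)
print(complex_association_penalty(3, dg_assoc))
```

A single strand ($L = 1$) incurs no penalty, and each additional strand contributes one copy of $\Delta G^{\mathrm{assoc}}$.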
S1.2 Salt corrections for DNA complexes
The default salt conditions for RNA [10, 14, 20, 22, 24, 26] and DNA [2, 17, 18, 26] parameter sets are [NaCl] = 1 M. Salt corrections are available for DNA parameters [9, 17–19] to permit calculations in user-specified sodium, potassium, ammonium, and magnesium ion concentrations. Following SantaLucia and co-workers, the free energy of a DNA duplex at 37 °C is augmented by
$$-0.114\,\frac{N}{2}\,\log[\mathrm{Na^+}], \tag{S3}$$
for user-specified 0.05 M ≤ $[\mathrm{Na^+}]$ ≤ 1.0 M, where $N$ is the number of phosphates in the duplex and it is assumed that $\Delta H$ is independent of $[\mathrm{Na^+}]$, which is valid for this salt regime [18, 19]. This salt correction was derived using duplexes of 16 bp or fewer, and its accuracy decreases as duplex length increases further [18, 19]. The expression can be generalized to monovalent potassium and ammonium ions [19] as well as to divalent magnesium cations [9, 17]:
$$-0.114\,\frac{N}{2}\,\log\left([\mathrm{Na^+}] + [\mathrm{K^+}] + [\mathrm{NH_4^+}] + 3.3\,[\mathrm{Mg^{++}}]^{1/2}\right), \tag{S4}$$
for user-specified 0.05 M ≤ $[\mathrm{Na^+}] + [\mathrm{K^+}] + [\mathrm{NH_4^+}]$ ≤ 1.0 M and 0.0 M ≤ $[\mathrm{Mg^{++}}]$ ≤ 0.2 M.
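The duplex-level correction (S4) can be sketched as a small Python function that also enforces the stated concentration ranges. The function name and argument names are hypothetical, and the natural logarithm is assumed.

```python
import math


def duplex_salt_correction(n_phosphates, na=1.0, k=0.0, nh4=0.0, mg=0.0):
    """Salt correction of (S4) for a DNA duplex at 37 C.

    Concentrations are in mol/liter; the natural logarithm is assumed.
    """
    monovalent = na + k + nh4
    if not 0.05 <= monovalent <= 1.0:
        raise ValueError("[Na+] + [K+] + [NH4+] must lie in 0.05 M to 1.0 M")
    if not 0.0 <= mg <= 0.2:
        raise ValueError("[Mg++] must lie in 0.0 M to 0.2 M")
    return -0.114 * (n_phosphates / 2) * math.log(monovalent + 3.3 * math.sqrt(mg))


# A 16 bp duplex with one phosphate per base has N = 32 phosphates.
print(duplex_salt_correction(32, na=0.05, mg=0.0125))
```

At the default 1 M sodium with no magnesium, the logarithm vanishes and the correction is zero, which matches the reference salt conditions of the parameter sets.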
To apply this salt correction to a complex of $L$ strands at temperature $T$, consider a secondary structure $s$ containing one or more duplexes. We assume that strands are synthesized with one phosphate per base so that $N/2 = N_{\mathrm{bp}}(s)$, where $N$ is the total number of phosphates in duplexes and $N_{\mathrm{bp}}(s)$ is the total number of base pairs in $s$. (If strands are synthesized without a 5′ terminal phosphate, then $N$ approximates the total number of phosphates in duplexes.) We further assume that $\Delta H$ is independent of cation concentration in this regime.
The secondary structure free energy $\Delta G(\phi, s)$ is then augmented by
$$N_{\mathrm{bp}}(s)\,\Delta G^{\mathrm{salt}} \tag{S5}$$
with
$$\Delta G^{\mathrm{salt}} = -0.114\,\log\left([\mathrm{Na^+}] + [\mathrm{K^+}] + [\mathrm{NH_4^+}] + 3.3\,[\mathrm{Mg^{++}}]^{1/2}\right)\frac{T}{T_{37}} \tag{S6}$$
for user-specified
$$0.05\ \mathrm{M} \le [\mathrm{Na^+}] + [\mathrm{K^+}] + [\mathrm{NH_4^+}] \le 1.0\ \mathrm{M}, \tag{S7}$$
$$0.0\ \mathrm{M} \le [\mathrm{Mg^{++}}] \le 0.2\ \mathrm{M}, \tag{S8}$$
with $T_{37} = 310.15$ K. In order to incorporate this salt correction in dynamic programs without explicitly calculating $N_{\mathrm{bp}}(s)$, note that for a complex of $L$ strands, the total number of loops in each secondary structure is
$$N_{\mathrm{loop}}(s) = N_{\mathrm{bp}}(s) + 1. \tag{S9}$$
This may be seen, for example, by starting with a single strand with no base pairs (corresponding to a single exterior loop). Each addition of a base pair adds one loop. Once all base pairs in $s$ have been added, each addition of a nick increases the number of strands by one without changing the number of loops (all secondary structures in the complex ensemble are connected, so introduction of each nick converts a loop from another type to an exterior loop). Let $N^{\mathrm{other}}_{\mathrm{loop}}$ denote the total number of non-exterior loops and $N^{\mathrm{exterior}}_{\mathrm{loop}}$ denote the total number of exterior loops, so we have
$$N_{\mathrm{loop}}(s) = N^{\mathrm{exterior}}_{\mathrm{loop}}(s) + N^{\mathrm{other}}_{\mathrm{loop}}(s). \tag{S10}$$
For a complex of $L$ strands, $N^{\mathrm{exterior}}_{\mathrm{loop}}(s) = L$. Thus, the salt correction (S5) becomes
$$N_{\mathrm{bp}}(s)\,\Delta G^{\mathrm{salt}} = (L-1)\,\Delta G^{\mathrm{salt}} + N^{\mathrm{other}}_{\mathrm{loop}}(s)\,\Delta G^{\mathrm{salt}}. \tag{S11}$$
Hence, the salt correction can be implemented by adding
$$\Delta G^{\mathrm{salt}} \tag{S12}$$
to every $\Delta G(\mathrm{loop})$ except for exterior loops as a pre-processing step, using our suite of dynamic programs without modification, and then treating the constant term $(L-1)\,\Delta G^{\mathrm{salt}}$ in a post-processing step (see Section S1.4).
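The bookkeeping in (S9) to (S12) can be checked numerically. The sketch below uses a hypothetical toy structure (a list of loop types with made-up energies), not NUPACK's data structures, and assumes the natural logarithm in (S6). It confirms that adding $\Delta G^{\mathrm{salt}}$ to every non-exterior loop and then adding $(L-1)\,\Delta G^{\mathrm{salt}}$ reproduces the direct correction (S5).

```python
import math

T37 = 310.15  # K


def dg_salt(temperature_k, na=1.0, k=0.0, nh4=0.0, mg=0.0):
    """Per-base-pair salt term of (S6); the natural logarithm is assumed."""
    return -0.114 * math.log(na + k + nh4 + 3.3 * math.sqrt(mg)) * temperature_k / T37


def preprocess_loops(loops, salt):
    """Add dG_salt to every non-exterior loop, as in (S12).

    `loops` is a list of (kind, energy) tuples; exterior loops are left unchanged.
    """
    return [(kind, e if kind == "exterior" else e + salt) for kind, e in loops]


# Hypothetical structure for L = 2 strands: 2 exterior loops plus 3 other loops,
# so N_loop = 5 and N_bp = 4 by (S9). Energies are made-up illustration values.
loops = [("exterior", 0.0), ("exterior", 0.0),
         ("stack", -2.1), ("stack", -1.5), ("hairpin", 4.0)]
n_strands = 2
n_bp = len(loops) - 1
salt = dg_salt(298.15, na=0.1, mg=0.01)

# Direct form (S5): add N_bp * dG_salt to the sum of loop energies.
direct = sum(e for _, e in loops) + n_bp * salt
# Split form (S11): pre-processed loops plus the constant (L - 1) * dG_salt.
split = sum(e for _, e in preprocess_loops(loops, salt)) + (n_strands - 1) * salt
print(math.isclose(direct, split))
```

The split form is what allows the dynamic programs to run unchanged, since the per-loop term is absorbed into the loop parameters and the remaining term is a constant for the complex.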
S1.3 Temperature dependence
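The loop-based free energy model (2.1) is temperature dependent. Each loop free energy is calculated using
$$\Delta G(\mathrm{loop}) = \Delta H(\mathrm{loop}) - T\,\Delta S(\mathrm{loop}) \tag{S13}$$
where $T$ is in Kelvin and $\Delta H(\mathrm{loop})$ and $\Delta S(\mathrm{loop})$ are assumed to be temperature independent [19]. Model parameters are provided for RNA [10, 14, 20, 22, 24, 26] and DNA [2, 17, 18, 26] in the form of $\Delta G_{37}(\mathrm{loop})$ and $\Delta H(\mathrm{loop})$, which can be used to calculate
$$\Delta S(\mathrm{loop}) = \frac{1}{T_{37}}\left[\Delta H(\mathrm{loop}) - \Delta G_{37}(\mathrm{loop})\right] \tag{S14}$$
with $T_{37} = 310.15$ K, so (S13) becomes
$$\Delta G(\mathrm{loop}) = \Delta H(\mathrm{loop}) - \frac{T}{T_{37}}\left[\Delta H(\mathrm{loop}) - \Delta G_{37}(\mathrm{loop})\right]. \tag{S15}$$
Similarly, for the strand association penalty (S1):
$$\Delta G^{\mathrm{assoc}}_{\mathrm{pub}} = \Delta H^{\mathrm{assoc}}_{\mathrm{pub}} - T\,\Delta S^{\mathrm{assoc}}_{\mathrm{pub}}, \tag{S16}$$
and the provided parameters $\Delta G^{\mathrm{assoc}}_{37,\mathrm{pub}}$ and $\Delta H^{\mathrm{assoc}}_{\mathrm{pub}}$ yield
$$\Delta G^{\mathrm{assoc}}_{\mathrm{pub}} = \Delta H^{\mathrm{assoc}}_{\mathrm{pub}} - \frac{T}{T_{37}}\left[\Delta H^{\mathrm{assoc}}_{\mathrm{pub}} - \Delta G^{\mathrm{assoc}}_{37,\mathrm{pub}}\right]. \tag{S17}$$
The temperature dependence is explicit in the form of the symmetry correction (2.5) and salt correction (S5).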
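A minimal sketch of the temperature extrapolation in (S15) and (S17) follows. The function name and the example enthalpy and free energy values are hypothetical.

```python
T37 = 310.15  # K


def dg_at_temperature(dh, dg37, temperature_k):
    """Evaluate (S15): dG(T) = dH - (T / T37) * (dH - dG37).

    The same form applies to the strand association penalty in (S17).
    """
    return dh - (temperature_k / T37) * (dh - dg37)


# Hypothetical loop parameters (kcal/mol): dH = -8.0, dG37 = -2.0.
print(dg_at_temperature(-8.0, -2.0, 298.15))
```

Evaluating at $T = T_{37}$ returns the tabulated $\Delta G_{37}$, which is a quick sanity check on any implementation of this step.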
S1.4 Treatment of constant free energy terms for complex ensembles
Consider a complex of $L$ strands containing a total of $N$ nucleotides. Suppose that free energy model terms have been pre-processed as described above for units [see (S1)], salt corrections [see (S12)], and temperature corrections [see (S15) and (S16)] prior to calculating any physical quantities. The secondary structure free energy (2.1) then becomes
$$\Delta G(\phi, s) = (L-1)\left[\Delta G^{\mathrm{assoc}} + \Delta G^{\mathrm{salt}}\right] + \sum_{\mathrm{loop} \in s} \Delta G(\mathrm{loop}). \tag{S18}$$
After running the partition function dynamic program to calculate $Q_{1,N}$, the partition function is then
$$Q(\phi) = \exp\left\{-(L-1)\left[\Delta G^{\mathrm{assoc}} + \Delta G^{\mathrm{salt}}\right]/kT\right\}\,Q_{1,N}, \tag{S19}$$
where this post-processing step accounts for the constant terms $\Delta G^{\mathrm{assoc}}$ and $\Delta G^{\mathrm{salt}}$ that affect all secondary structures in the complex ensemble. Likewise, after running the MFE dynamic program to calculate $F_{1,N}$, the free energy of the MFE stacking state is then
$$\Delta G(\phi, s^{\parallel}_{\mathrm{MFE}}) = (L-1)\left[\Delta G^{\mathrm{salt}} + \Delta G^{\mathrm{assoc}}\right] + F_{1,N}. \tag{S20}$$
The equilibrium base-pairing probability $P_{i,j}$ is calculated via (2.17) using the values of $Q^{b}_{i,j}(\phi)$, $Q^{b}_{j,i+N}(\phi')$ and $Q_{1,N}(\phi)$ returned by the dynamic program; the constant terms $\Delta G^{\mathrm{assoc}}$ and $\Delta G^{\mathrm{salt}}$ do not affect the calculation as they are omitted in both the numerator and the denominator of (2.17). The dynamic programs for calculating the MFE proxy structure, suboptimal structures, or sampled structures are unaffected by the constant terms $\Delta G^{\mathrm{assoc}}$ and $\Delta G^{\mathrm{salt}}$, so no post-processing is required for those quantities.
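The post-processing steps (S19) and (S20) amount to a constant shift applied after the dynamic programs finish. The sketch below assumes kcal/mol units, and the function names and example inputs are hypothetical; `q_1n` and `f_1n` stand for the values $Q_{1,N}$ and $F_{1,N}$ returned by the dynamic programs.

```python
import math

K_B = 0.0019872  # Boltzmann constant per mole, kcal/(mol K); kcal/mol units are assumed


def postprocess_partition_function(q_1n, n_strands, dg_assoc, dg_salt, temperature_k):
    """Apply (S19): Q = exp(-(L - 1) (dG_assoc + dG_salt) / kT) * Q_{1,N}."""
    constant = (n_strands - 1) * (dg_assoc + dg_salt)
    return math.exp(-constant / (K_B * temperature_k)) * q_1n


def postprocess_mfe(f_1n, n_strands, dg_assoc, dg_salt):
    """Apply (S20): dG_MFE = (L - 1) (dG_salt + dG_assoc) + F_{1,N}."""
    return (n_strands - 1) * (dg_salt + dg_assoc) + f_1n


# Hypothetical inputs for a 3-strand complex.
print(postprocess_partition_function(1.0e6, 3, 2.0, 0.5, 310.15))
print(postprocess_mfe(-12.0, 3, 2.0, 0.5))
```

Because the same constant multiplies every term in the ensemble, ratios of partition functions within one complex are unchanged, which is why the pair probabilities need no correction.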
S1.5 RNA and DNA parameter sets
NUPACK 4 algorithms perform calculations on the following complex ensembles:
• stacking: with coaxial and dangle stacking (ensemble $\Gamma^{\parallel}(\phi)$).
• nostacking: without coaxial and dangle stacking (ensemble $\Gamma(\phi)$).
These ensembles can be used for calculations in combination with the following temperature-dependent DNA and RNA parameter sets:
• rna95 based on Serra and Turner [20] with additional parameters [26] including coaxial stacking [14, 22] and dangle stacking [20, 22, 26] in 1 M Na+.
• dna04 based on SantaLucia [18] and SantaLucia and Hicks [19] with additional parameters [26] including coaxial stacking [17] and dangle stacking [2, 26] in user-specified concentrations of Na+, K+, NH4+, and Mg++ (see Section S1.2 for details on implementation of the salt corrections) [9, 17–19].
• rna06 based on Mathews et al. [14], Mathews et al. [15], and Lu et al. [10] with additional parameters [24, 26] including coaxial stacking [14, 22] and dangle stacking [20, 22, 26] in 1 M Na+.
• custom based on user-specified parameters representing nucleic acids or synthetic nucleic acid analogs in experimental conditions of choice.
Base pairs are either Watson-Crick pairs (G·C and A·U for RNA; G·C and A·T for DNA) or wobble pairs (G·U for RNA). Note that for DNA, G and T form a mismatch and not a wobble pair [19].
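The pairing rules stated above can be written as a small lookup. This is only an illustrative sketch; the names `WATSON_CRICK`, `WOBBLE`, and `can_pair` are hypothetical and do not refer to any NUPACK interface.

```python
# Unordered pairs, so that G-C and C-G are treated identically.
WATSON_CRICK = {
    "rna": {frozenset("GC"), frozenset("AU")},
    "dna": {frozenset("GC"), frozenset("AT")},
}
# Only RNA has a wobble pair in these ensembles; for DNA, G and T are a mismatch.
WOBBLE = {
    "rna": {frozenset("GU")},
    "dna": set(),
}


def can_pair(a, b, material):
    """Return True if nucleotides a and b may pair for the given material."""
    pair = frozenset((a, b))
    return pair in WATSON_CRICK[material] or pair in WOBBLE[material]


print(can_pair("G", "U", "rna"), can_pair("G", "T", "dna"))
```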
S1.6 Historical RNA and DNA parameter sets (for backwards compatibility with NUPACK 3)
For backwards compatibility, the following historical complex ensembles without coaxial stacking and with approximate dangle stacking are supported (see Section S2.5):
• none-nupack3: no dangle stacking and no coaxial stacking (dangles “none” option for NUPACK 3)
• some-nupack3: some dangle stacking and no coaxial stacking (dangles “some” option for NUPACK 3)
• all-nupack3: all dangle stacking and no coaxial stacking (dangles “all” option for NUPACK 3)
For these historical ensembles, base pairs are either Watson-Crick pairs (G·C and A·U for RNA; G·C and A·T for DNA) or wobble pairs (G·U for RNA; G·T for DNA). Note that for the historical ensembles, G·T is classified as a DNA wobble pair and not as a mismatch. The historical ensembles prohibit a wobble pair (G·U or G·T) as a terminal base pair in an exterior loop or a multiloop. As a result, an attempt to evaluate a free energy for a sequence $\phi$ and secondary structure $s$ that place a wobble pair as a terminal base pair in an exterior loop or multiloop will return $\Delta G(\phi, s) = \infty$. These historical ensembles can be used for calculations in combination with the following historical DNA and RNA parameter sets:
• rna95-nupack3 is the same as rna95 except that terminal mismatch free energies in exterior loops and multiloops are replaced by two dangle stacking free energies (see equation (S55)).
• dna04-nupack3 is the same as dna04 except that G·T is treated as a wobble pair (analogous to a G·U RNA wobble pair) instead of classifying G and T as a mismatch. Note that while terminal mismatch free energies in exterior loops and multiloops are replaced by two dangle stacking free energies (see equation (S55)), this is the same treatment as in dna04, as terminal mismatch parameters are not public for DNA [19].
• rna99-nupack3 based on Mathews et al. [14] with additional parameters [24, 26] including dangle stacking [20, 22, 26] in 1 M Na+. Terminal mismatch free energies in exterior loops and multiloops are replaced by two dangle stacking free energies (see equation (S55)). Parameters are provided only for 37 °C.
S1.7 Functional form of RNA and DNA free energy models
S1.7.1 Free energy model for hairpin loops
A hairpin loop is defined for a subsequence $\phi[i:j]$ by the single base pair $i \cdot j$ such that there are no nicks or additional base pairs in the range $[i:j]$. Let $n \equiv j - i - 1$ denote the number of unpaired nucleotides in the hairpin loop. Steric effects are assumed to prevent hairpin loops with $n < 3$ for both RNA [20, 22] and DNA [19]. The functional form of the hairpin free energy is as follows:
$$\Delta G^{\mathrm{hairpin}}(\phi[i:j]) = \Delta G^{\mathrm{hairpin}}_{\mathrm{size}}(n) + \Delta G^{\mathrm{hairpin}}_{\mathrm{seq}}(\phi[i:j]) \tag{S21}$$
For the size-dependent term [10, 14, 19, 20]:
$$\Delta G^{\mathrm{hairpin}}_{\mathrm{size}}(n) = \begin{cases} \infty, & n < 3 \\ \Delta G^{\mathrm{hairpinsize}}_{n}, & 3 \le n \le 30 \\ \Delta G^{\mathrm{hairpinsize}}_{30} + \log\left(\dfrac{n}{30}\right)\Delta G^{\mathrm{polymer}}_{\mathrm{entropy}}, & n > 30 \end{cases} \tag{S22}$$
• $\Delta G^{\mathrm{hairpinsize}}_{n}$: a lookup table up to $n = 30$. rna95 and rna06 populate the lookup table using empirical values of $\Delta G^{\mathrm{hairpinsize}}_{n}$ up to $n = 9$ and logarithmic extrapolation for larger $n$ [10, 14, 20]. dna04 populates the lookup table using empirical values of $\Delta G^{\mathrm{hairpinsize}}_{n}$ for a subset of $3 \le n \le 30$ and logarithmic interpolation for the other values [19].
• $\Delta G^{\mathrm{polymer}}_{\mathrm{entropy}}$: a logarithmic extrapolation parameter based on Jacobson-Stockmayer polymer theory for $n > 30$. rna95, dna04, and rna06 use previously published values [19, 20].
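The piecewise size term (S22) maps directly onto a short function. In this sketch the lookup table and the polymer entropy parameter are filled with hypothetical placeholder values rather than values from any parameter set, and the natural logarithm is assumed.

```python
import math


def hairpin_size_energy(n, size_table, polymer_entropy):
    """Size-dependent hairpin term of (S22) for n unpaired nucleotides."""
    if n < 3:
        return math.inf                # sterically prohibited
    if n <= 30:
        return size_table[n]           # lookup table for 3 <= n <= 30
    # Logarithmic extrapolation beyond the table (natural log assumed).
    return size_table[30] + math.log(n / 30) * polymer_entropy


# Placeholder table and parameter, for illustration only.
size_table = {n: 4.0 + 0.1 * (n - 3) for n in range(3, 31)}
polymer_entropy = 1.5

for n in (2, 4, 30, 60):
    print(n, hairpin_size_energy(n, size_table, polymer_entropy))
```

Returning infinity for $n < 3$ gives such hairpins zero Boltzmann weight, so they drop out of partition function sums without special-case logic elsewhere.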
For the sequence-dependent term [10, 14, 19]:
$$\Delta G^{\mathrm{hairpin}}_{\mathrm{seq}}(\phi[i:j]) = \begin{cases} \Delta G^{\mathrm{triloop}}_{\phi[i:j]} + \Delta G^{\mathrm{terminalbp}}_{\phi_i,\phi_j}, & n = 3 \\ \Delta G^{\mathrm{tetraloop}}_{\phi[i:j]}, & n = 4 \\ \Delta G^{\mathrm{hairpinmm}}_{\phi_{j-1},\phi_j,\phi_i,\phi_{i+1}}, & n \ge 5 \end{cases} \tag{S23}$$
• $\Delta G^{\mathrm{triloop}}_{\phi[i:j]}$: sequence-dependent penalty for hairpin loop of length $n = 3$. 0 kcal/mol for rna95 [20]. Empirical values for dna04 [19] and rna06 [15].
• $\Delta G^{\mathrm{terminalbp}}_{\phi_i,\phi_j}$: sequence-dependent penalty for non-GC terminal base pair at the end of a duplex. Empirical values for rna95 [20], dna04 [19], and rna06 [15].
• $\Delta G^{\mathrm{tetraloop}}_{\phi[i:j]}$: sequence-dependent penalty for hairpin loop of length $n = 4$. Empirical values for rna95 [20], dna04 [19], and rna06 [15].
• $\Delta G^{\mathrm{hairpinmm}}_{\phi_{j-1},\phi_j,\phi_i,\phi_{i+1}}$: sequence-dependent term for mismatched bases adjacent to base pair $i \cdot j$. Empirical values set equal to $\Delta G^{\mathrm{terminalmm}}_{\phi_{j-1},\phi_j,\phi_i,\phi_{i+1}}$ plus sequence-dependent modifications for rna95 [20] and rna06 [10, 14]. Empirical values for $\Delta G^{\mathrm{terminalmm}}_{\phi_{j-1},\phi_j,\phi_i,\phi_{i+1}}$ are not public for DNA [19], so $\Delta G^{\mathrm{hairpinmm}}_{\phi_{j-1},\phi_j,\phi_i,\phi_{i+1}}$ is set to unpublished values made available in the Mfold software [26] for dna04. (See multiloops and exterior loops for a description of $\Delta G^{\mathrm{terminalmm}}_{\phi_{j-1},\phi_j,\phi_i,\phi_{i+1}}$.)
S1.7.2 Free energy model for interior loops
An interior loop may be defined via a pair of subsequences $\phi[i:d]$ and $\phi[e:j]$ such that $i < d < e < j$ with base pairs $i \cdot j$ and $d \cdot e$, with no additional paired bases or nicks within the two subsequences.
Stacked pairs. Stacked pairs are the special case where $d = i+1$ and $j = e+1$.
$$\Delta G^{\mathrm{stackedpair}}(\phi[i:i+1], \phi[j-1:j]) = \Delta G^{\mathrm{stack}}_{\phi_i,\phi_j,\phi_{i+1},\phi_{j-1}} \tag{S24}$$
• $\Delta G^{\mathrm{stack}}_{\phi_i,\phi_j,\phi_{i+1},\phi_{j-1}}$: the stack free energy has been determined for all allowable base pair combinations from experimental results for rna95 [20], rna06 [10, 14, 24], and dna04 [19].
Bulge loops. A bulge loop is the special case with either $d = i+1$ or $j = e+1$ but not both. Here, we will outline the functional form when $d = i+1$. Let $n \equiv j - e - 1$ denote the number of unpaired nucleotides in the bulge loop.
$$\Delta G^{\mathrm{bulge}}(\phi[i:i+1], \phi[e:j]) = \Delta G^{\mathrm{bulge}}_{\mathrm{size}}(n) + \Delta G^{\mathrm{bulge}}_{\mathrm{seq}}(\phi[i:i+1], \phi[e:j]). \tag{S25}$$
For the size-dependent term:
$$\Delta G^{\mathrm{bulge}}_{\mathrm{size}}(n) = \begin{cases} \Delta G^{\mathrm{bulgesize}}_{n}, & n \le 30 \\ \Delta G^{\mathrm{bulgesize}}_{30} + \log\left(\dfrac{n}{30}\right)\Delta G^{\mathrm{polymer}}_{\mathrm{entropy}}, & n > 30 \end{cases} \tag{S26}$$
• $\Delta G^{\mathrm{bulgesize}}_{n}$: rna95 uses empirical values for $1 \le n \le 5$ [20]. rna06 uses empirical values for $1 \le n \le 6$ [10, 14]. dna04 uses empirical values for a subset of $1 \le n \le 30$ [19]. Each parameter set uses a logarithmic approximation for all other values of $n$.
For the sequence-dependent term:
$$\Delta G^{\mathrm{bulge}}_{\mathrm{seq}}(\phi[i:i+1], \phi[e:j]) = \begin{cases} \Delta G^{\mathrm{stack}}_{\phi_i,\phi_j,\phi_{i+1},\phi_e}, & e+2 = j \\ \Delta G^{\mathrm{terminalbp}}_{\phi_i,\phi_j} + \Delta G^{\mathrm{terminalbp}}_{\phi_{i+1},\phi_e}, & \text{otherwise.} \end{cases} \tag{S27}$$
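A sketch of how (S25) to (S27) combine for a bulge loop is given below. The sequence-dependent inputs are passed in as already-looked-up numbers, and all names and values are hypothetical placeholders; the natural logarithm is assumed.

```python
import math


def bulge_size_energy(n, size_table, polymer_entropy):
    """Size-dependent bulge term of (S26)."""
    if n <= 30:
        return size_table[n]
    return size_table[30] + math.log(n / 30) * polymer_entropy


def bulge_energy(n, size_table, polymer_entropy, stack, terminal_bp_outer, terminal_bp_inner):
    """Bulge free energy of (S25) for n unpaired nucleotides.

    A single-nucleotide bulge (n == 1, equivalent to e + 2 == j) uses the stack
    term; larger bulges use the two terminal base pair terms, as in (S27).
    """
    if n == 1:
        seq = stack
    else:
        seq = terminal_bp_outer + terminal_bp_inner
    return bulge_size_energy(n, size_table, polymer_entropy) + seq


# Placeholder parameters, for illustration only.
size_table = {n: 3.0 + 0.1 * n for n in range(1, 31)}
print(bulge_energy(1, size_table, 1.5, stack=-2.0, terminal_bp_outer=0.5, terminal_bp_inner=0.0))
print(bulge_energy(4, size_table, 1.5, stack=-2.0, terminal_bp_outer=0.5, terminal_bp_inner=0.0))
```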
Other small interior loops. The free energies for interior loops with $2 \le d - i \le 3$ and $2 \le j - e \le 3$ are kept in a lookup table.
• 1×1 interior loop. Corresponds to $d - i = 2$ and $j - e = 2$. rna95 assigns a sequence-independent $\Delta G$ [20]. rna06 uses unpublished parameters made available in the Mfold software [26]. dna04 models these loops using (S28) below [19]; a positive constant free energy is assigned for mismatches where the unpaired nucleotides are Watson-Crick complements [26].
• 1×2 interior loop. Corresponds to $d - i = 2$ and $j - e = 3$, or $d - i = 3$ and $j - e = 2$. rna95 and dna04 model these loops using (S28) below [19, 20]. For dna04, a positive constant free energy is assigned for mismatches where the unpaired nucleotides are Watson-Crick complements [26]. rna06 models these loops using a combination of tabulated data and averaging [10, 14].
• 2×2 interior loop. Corresponds to $d - i = 3$ and $j - e = 3$. rna95 and dna04 model these loops using (S28) below [19, 20]. For dna04, a positive constant free energy is assigned for mismatches where the unpaired nucleotides are Watson-Crick complements [26]. rna06 models these loops using tabulated symmetric tandem interior mismatches and averaging for asymmetric tandem interior mismatches [10, 14].
Other interior loops. Let $n_1 \equiv d - i - 1$ and $n_2 \equiv j - e - 1$ denote the number of unpaired nucleotides for the two sides of the interior loop. For the general case of interior loops not handled via special cases above, the following formula is used:
$$\Delta G^{\mathrm{interior}}(\phi[i:d], \phi[e:j]) = \Delta G^{\mathrm{interior}}_{\mathrm{size}}(n_1 + n_2) + \Delta G^{\mathrm{interior}}_{\mathrm{asymm}}(n_1, n_2) + \Delta G^{\mathrm{interior}}_{\mathrm{mm}}(\phi[i:d], \phi[e:j]). \tag{S28}$$
For the size-dependent term:
$$\Delta G^{\mathrm{interior}}_{\mathrm{size}}(n) = \begin{cases} \Delta G^{\mathrm{interiorsize}}_{n}, & n \le 30 \\ \Delta G^{\mathrm{interiorsize}}_{30} + \log\left(\dfrac{n}{30}\right)\Delta G^{\mathrm{polymer}}_{\mathrm{entropy}}, & n > 30 \end{cases} \tag{S29}$$
• $\Delta G^{\mathrm{interiorsize}}_{n}$: rna95 uses empirical values for $2 \le n \le 6$ [20]. rna06 uses empirical values for $4 \le n \le 6$ [10, 14]. dna04 uses empirical values for a subset of values in $3 \le n \le 30$ [19]. Each parameter set uses a logarithmic approximation for all other values of $n$.
For the asymmetry-based term:
$$\Delta G^{\mathrm{interior}}_{\mathrm{asymm}}(n_1, n_2) = \min\left(\Delta G^{\mathrm{interiorasymm}}_{4},\ |n_1 - n_2|\,\Delta G^{\mathrm{interiorasymm}}_{\min(4, n_2, n_1)}\right) \tag{S30}$$
• $\Delta G^{\mathrm{interiorasymm}}_{n}$: rna95 [20], rna06 [10, 14], and dna04 [19] use values regressed from empirical data.
For the mismatch-based term:
$$\Delta G^{\mathrm{interior}}_{\mathrm{mm}}(\phi[i:d], \phi[e:j]) = \begin{cases} \Delta G^{\mathrm{interiormm}'}_{\phi_{j-1},\phi_j,\phi_i,\phi_{i+1}} + \Delta G^{\mathrm{interiormm}'}_{\phi_{d-1},\phi_d,\phi_e,\phi_{e+1}}, & i+2 = d \text{ or } e+2 = j \\ \Delta G^{\mathrm{interiormm}}_{\phi_{j-1},\phi_j,\phi_i,\phi_{i+1}} + \Delta G^{\mathrm{interiormm}}_{\phi_{d-1},\phi_d,\phi_e,\phi_{e+1}}, & \text{otherwise} \end{cases} \tag{S31}$$
• $\Delta G^{\mathrm{interiormm}}_{\phi_{j-1},\phi_j,\phi_i,\phi_{i+1}}$: rna95 [20] and rna06 [10, 14] use independently determined values for loops without complementary unpaired bases. dna04 equates $\Delta G^{\mathrm{interiormm}}_{\phi_{j-1},\phi_j,\phi_i,\phi_{i+1}}$ with $\Delta G^{\mathrm{terminalmm}}_{\phi_{j-1},\phi_j,\phi_i,\phi_{i+1}}$ [19], which are assigned unpublished values made available in the Mfold software [26].
• $\Delta G^{\mathrm{interiormm}'}_{\phi_{j-1},\phi_j,\phi_i,\phi_{i+1}}$: rna95 [20], rna06 [10, 14], and dna04 [19] use different parameters for the case when one side of the interior loop has only one unpaired nucleotide [15].
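The general interior loop formula (S28) can be sketched as the sum of its three terms. In this sketch the cap in the asymmetry term (the first argument of the min in (S30)) is passed separately from the per-nucleotide table, so the code does not depend on how a parameter file indexes these values; that separation, the function names, and all numeric values are assumptions for illustration. The mismatch term of (S31) is passed in as an already-evaluated number, and the natural logarithm is assumed.

```python
import math


def interior_size_energy(n, size_table, polymer_entropy):
    """Size-dependent term of (S29) for n = n1 + n2 unpaired nucleotides."""
    if n <= 30:
        return size_table[n]
    return size_table[30] + math.log(n / 30) * polymer_entropy


def interior_asymmetry(n1, n2, per_nt, cap):
    """Asymmetry term of (S30): a per-nucleotide penalty, capped at `cap`."""
    return min(cap, abs(n1 - n2) * per_nt[min(4, n1, n2)])


def interior_energy(n1, n2, size_table, polymer_entropy, per_nt, cap, mismatch):
    """General interior loop free energy of (S28)."""
    return (interior_size_energy(n1 + n2, size_table, polymer_entropy)
            + interior_asymmetry(n1, n2, per_nt, cap)
            + mismatch)


# Placeholder parameters, for illustration only.
size_table = {n: 1.0 + 0.1 * n for n in range(2, 31)}
per_nt = {1: 0.5, 2: 0.5, 3: 0.5, 4: 0.5}
print(interior_energy(2, 5, size_table, 1.5, per_nt, cap=3.0, mismatch=-0.8))
```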
S1.7.3 Free energy model for multiloops
A multiloop contains 3 or more terminal base pairs and no nicks. It may be defined as a series of bounding subsequences $[\phi]$. If the number of terminal base pairs is $N_{\mathrm{bp}}$ and the number of unpaired nucleotides is $N_{\mathrm{nt}}$, the free energy for a multiloop in a specified stacking state, $\omega$, is modeled as follows:
$$\Delta G^{\mathrm{multi}}([\phi], \omega) = \Delta G^{\mathrm{multi}}_{\mathrm{init}} + N_{\mathrm{bp}}\,\Delta G^{\mathrm{multi}}_{\mathrm{bp}} + N_{\mathrm{nt}}\,\Delta G^{\mathrm{multi}}_{\mathrm{nt}} + \Delta G^{\mathrm{allterminalbp}}([\phi]) + \Delta G^{\mathrm{allcoax}}([\phi], \omega) + \Delta G^{\mathrm{alldangle}}([\phi], \omega) \tag{S32}$$
where $\Delta G^{\mathrm{multi}}_{\mathrm{init}}$ denotes the penalty for formation of a multiloop, $\Delta G^{\mathrm{multi}}_{\mathrm{bp}}$ denotes the sequence-independent penalty for a terminal base pair in a multiloop, and $\Delta G^{\mathrm{multi}}_{\mathrm{nt}}$ denotes the penalty per unpaired nucleotide in a multiloop. Note that in contrast to interior loops and hairpin loops, the free energy of a multiloop is assumed to scale linearly, not logarithmically, with the number of unpaired nucleotides; the linear simplification facilitates the derivation of $O(N^3)$ multiloop recursions.
• $\Delta G^{\mathrm{multi}}_{\mathrm{init}}$: empirical values for rna95 [20]; newly regressed values (Table A.1) for rna06 [13]; unpublished values for dna04 [26].
• $\Delta G^{\mathrm{multi}}_{\mathrm{bp}}$: empirical values for rna95 [20]; newly regressed values (Table A.1) for rna06 [13]; unpublished values for dna04 [26].
• $\Delta G^{\mathrm{multi}}_{\mathrm{nt}}$: empirical values for rna95 [20]; newly regressed values (Table A.1) for rna06 [13]; unpublished values for dna04 [26].
Note that for rna06, previously published parameter regressions [10, 14] use a functional form incompatible with the definition of $\Delta G^{\mathrm{multi}}([\phi], \omega)$ above [20, 26]. Using literature source data for multiloops [13], we regressed the values of $\Delta G^{\mathrm{multi}}_{\mathrm{init}}$, $\Delta G^{\mathrm{multi}}_{\mathrm{bp}}$, and $\Delta G^{\mathrm{multi}}_{\mathrm{nt}}$ via a least-squares fit of the regressed loop free energies, observing comparable mean absolute error (Table A.1).
$\Delta G^{\mathrm{allterminalbp}}([\phi])$ is a sum of the sequence-dependent free energy $\Delta G^{\mathrm{terminalbp}}_{\phi_i,\phi_j}$ for each terminal base pair $i \cdot j$ in the multiloop (see definition above under hairpin loops).
$\Delta G^{\mathrm{allcoax}}([\phi], \omega)$ is a sum over each coaxial stack present in the multiloop stacking state $\omega$. Owing to a lack of parameters, only coaxial stacks between adjacent terminal base pairs (with no intervening unpaired bases) are considered. Each coaxial stack between base pairs $i \cdot d$ and $d+1 \cdot j$ contributes a free energy of $\Delta G^{\mathrm{coax}}_{\phi_i,\phi_d,\phi_{d+1},\phi_j}$. For the recursions with coaxial stacking (Section S2.6), we use $\Delta G^{\mathrm{coax}}_{i,d,j}(\phi)$ to denote $\Delta G^{\mathrm{coax}}_{\phi_i,\phi_d,\phi_{d+1},\phi_j}$ since only three indices may vary freely. For the recursions without coaxial stacking (Section S2.3), the term $\Delta G^{\mathrm{allcoax}}([\phi], \omega)$ is neglected.
• $\Delta G^{\mathrm{coax}}_{\phi_i,\phi_d,\phi_{d+1},\phi_j}$: rna95 [20] and rna06 [10, 14] set $\Delta G^{\mathrm{coax}}_{\phi_i,\phi_d,\phi_{d+1},\phi_j}$ equal to $\Delta G^{\mathrm{stack}}_{\phi_i,\phi_d,\phi_{d+1},\phi_j}$. dna04 uses independently estimated values [17].
$\Delta G^{\mathrm{alldangle}}([\phi], \omega)$ is a sum of the sequence-dependent free energy, $\Delta G^{\mathrm{dangle}}_{i,j}(\omega)$, for each terminal base pair $i \cdot j$ that is not in a coaxial stack in stacking state $\omega$. For a given terminal base pair $i \cdot j$, $\Delta G^{\mathrm{dangle}}_{i,j}(\omega)$ takes one of four values to match the dangle stacking state for a given $\omega$:
$$\Delta G^{\mathrm{dangle}}_{i,j}(\omega) = \begin{cases} 0, & \text{no dangles} \\ \Delta G^{5'\mathrm{dangle}}_{\phi_i,\phi_{i+1},\phi_j}, & 5' \text{ dangle} \\ \Delta G^{3'\mathrm{dangle}}_{\phi_i,\phi_{j-1},\phi_j}, & 3' \text{ dangle} \\ \Delta G^{\mathrm{terminalmm}}_{\phi_i,\phi_{i+1},\phi_{j-1},\phi_j}, & \text{terminal mismatch} \end{cases} \tag{S33}$$
Note that the state where both 5′ and 3′ dangles stack on terminal base pair $i \cdot j$ is classified as a terminal mismatch. For the recursions without dangle stacking (Section S2.3), the term $\Delta G^{\mathrm{alldangle}}([\phi], \omega)$ is neglected.
• $\Delta G^{5'\mathrm{dangle}}_{\phi_i,\phi_j,\phi_k}$: 5′ dangle free energy parametrized for rna95 [20], rna06 [10, 14], and dna04 [19].
• $\Delta G^{3'\mathrm{dangle}}_{\phi_i,\phi_j,\phi_k}$: 3′ dangle free energy parametrized for rna95 [20], rna06 [10, 14], and dna04 [19].
• $\Delta G^{\mathrm{terminalmm}}_{\phi_i,\phi_{i+1},\phi_{j-1},\phi_j}$: rna95 [20] and rna06 [10, 14] use empirical parameters for $\Delta G^{\mathrm{terminalmm}}$; dna04 assigns $\Delta G^{\mathrm{terminalmm}}_{\phi_i,\phi_{i+1},\phi_{j-1},\phi_j}$ to be the sum of $\Delta G^{5'\mathrm{dangle}}_{\phi_i,\phi_{i+1},\phi_j}$ and $\Delta G^{3'\mathrm{dangle}}_{\phi_i,\phi_{j-1},\phi_j}$ as empirical values of $\Delta G^{\mathrm{terminalmm}}_{\phi_i,\phi_{i+1},\phi_{j-1},\phi_j}$ are not publicly available [19].

Quantity      Δ[G/H]^multi_init   Δ[G/H]^multi_bp   Δ[G/H]^multi_nt   MAE    MAE [13]
ΔG            +12.91              -1.28             -0.0880           1.01   1.01
ΔH            +81.06              -6.84             +2.22             11.5   12.1
Table A.1: Regression of multiloop parameters for rna06 (kcal/mol). MAE denotes the mean absolute error of the least-squares regression of the loop free energies from Reference [13] using formulation (S32) for $\Delta G^{\mathrm{multi}}([\phi], \omega)$. MAE [13] refers to the mean absolute error of the regression performed in Reference [13] using a different formulation of $\Delta G^{\mathrm{multi}}([\phi], \omega)$.
S1.7.4 Free energy model for exterior loops
An exterior loop is a loop containing one nick and zero or more terminal base pairs. An exterior loop may be defined as a series of bounding subsequences $[\phi]$ with a given nick location. An unpaired strand is an exterior loop with a free energy of zero, corresponding to the reference state [7]. The free energy of an exterior loop in a specified stacking state $\omega$ is modeled as follows:
$$\Delta G^{\mathrm{exterior}}([\phi], \omega) = 0 + \Delta G^{\mathrm{allterminalbp}}([\phi]) + \Delta G^{\mathrm{allcoax}}([\phi], \omega) + \Delta G^{\mathrm{alldangle}}([\phi], \omega). \tag{S34}$$
The functions $\Delta G^{\mathrm{allterminalbp}}([\phi])$, $\Delta G^{\mathrm{allcoax}}([\phi], \omega)$, and $\Delta G^{\mathrm{alldangle}}([\phi], \omega)$ are defined as above for multiloops. For the recursions without coaxial and dangle stacking (Section S2.3), the terms $\Delta G^{\mathrm{allcoax}}([\phi], \omega)$ and $\Delta G^{\mathrm{alldangle}}([\phi], \omega)$ are neglected.
S2 Recursions for the complex ensemble with or without coaxial and dangle stacking
Recursions specify the dependencies between subproblems and incorporate the details of the complex structural ensemble and the free energy model. The recursions described here can be combined with a quantity-specific evaluation algebra (Section S3) and a quantity-specific operation order (Section S4) to calculate diverse physical quantities. Each recursion corresponds to an efficient iteration through a conditional ensemble of substructures within a given subsequence that are compatible with a specified set of constraints. For a given recursion, a conditional ensemble might include an explicit structural element, which can be considered the base case of the recursion, or a reference to the result of another recursion.
S2.1 Separate recursions for intrastrand and interstrand blocks
Reference [7] described dynamic programming recursions for the complex ensemble that checked for a nick next to each nucleotide. This approach enabled treatment of complexes containing an arbitrary number of strands, but caused unnecessary complications in the program flow and eliminated any possibility of vectorization due to the conditional checks within each “for” loop. By contrast, here we employ separate sets of recursions for triangular intrastrand blocks and rectangular interstrand blocks (Figure S1). As a result, each intrastrand and interstrand recursion is kept as simple as possible and both types of recursions can be efficiently vectorized.
[Figure S1 image. Panels: (a) complex ABC with intrastrand blocks A, B, C and interstrand blocks AB, BC, ABC; (b) vector operation for element i, j in intrastrand block B (row i, column j); (c) vector operation for element i, j in interstrand block BC (row i, column j).]
Figure S1: Separate recursions for intrastrand and interstrand blocks. (a) Triangular intrastrand blocks (A, B, C) and rectangular interstrand blocks (AB, BC, ABC) for complex ABC. Element $i, j$ corresponds to a conditional ensemble for subsequence $[i, j]$ which contains no nicks if $i, j$ is in an intrastrand block and one or more nicks if $i, j$ is in an interstrand block. (b) Each recursion operation for calculation of element $i, j$ in an intrastrand block (e.g., $Q_{i,j} \leftarrow \sum_{i \le d < j} Q_{i,d}\,Q_{d+1,j}$) can be implemented as a vectorized dot product between a subvector of row $i$ (brown) and a subvector of column $j$ (gray) to obtain element $i, j$ (purple). Note that calculation of an element $i, j$ in an intrastrand block uses elements in the same intrastrand block (calculated using intrastrand recursions). (c) Each recursion operation for calculation of element $i, j$ in an interstrand block (e.g., $Q_{i,j} \leftarrow \sum_{i \le d < j,\ \mathrm{strand}(d) = \mathrm{strand}(d+1)} Q_{i,d}\,Q_{d+1,j}$) can be implemented as multiple vectorized dot products between valid subvectors of row $i$ (brown) and valid subvectors of column $j$ (gray) to obtain element $i, j$ (purple), where valid positions are those that avoid introducing disconnected structures into the complex ensemble (see Algorithm S2). Note that calculation of element $i, j$ in an interstrand block uses elements in one or more interstrand blocks (calculated with interstrand recursions) and two intrastrand blocks (calculated with intrastrand recursions).
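The intrastrand operation in panel (b) can be sketched in plain Python as a dot product between a slice of row $i$ and a slice of column $j$. This is an illustration of the indexing only, using a hypothetical list-of-lists matrix `Q`; it is not the vectorized implementation itself.

```python
def intrastrand_element(Q, i, j):
    """Dot product form of sum over i <= d < j of Q[i][d] * Q[d + 1][j]."""
    row = Q[i][i:j]                              # Q[i][d] for d = i, ..., j - 1
    col = [Q[d + 1][j] for d in range(i, j)]     # Q[d + 1][j] for the same d
    return sum(r * c for r, c in zip(row, col))


def intrastrand_element_loop(Q, i, j):
    """Reference version with an explicit loop, for comparison."""
    total = 0.0
    for d in range(i, j):
        total += Q[i][d] * Q[d + 1][j]
    return total


# Toy matrix with made-up entries.
Q = [[1.0, 2.0, 3.0, 4.0],
     [0.0, 1.0, 5.0, 6.0],
     [0.0, 0.0, 1.0, 7.0],
     [0.0, 0.0, 0.0, 1.0]]
print(intrastrand_element(Q, 0, 3), intrastrand_element_loop(Q, 0, 3))
```

Because the row slice and the column slice are contiguous index ranges with no conditional checks, the same operation maps onto a single dot-product call in an array library, which is the property the separate intrastrand blocks are designed to expose.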
S2.2 Conventions for recursion diagrams and equations
In the following sections we will describe recursions corresponding to the complex ensemble without stacking terms (Section S2.3) and with coaxial and dangle stacking subensembles (Section S2.6). Each recursion iterates over all conditional ensembles compatible with the constraints defined for a given recursion type. For a complex of $N$ nucleotides, each full set of recursions is $O(N^3)$ in time and $O(N^2)$ in space. For interior loop recursions, we