Runaway Transition in Irreversible Polymer Condensation with cyclisation (2024)

Footnotes: two of the authors are joint first authors. Corresponding author: davide.michieletto@ed.ac.uk

Maria Panoukidou: School of Physics and Astronomy, University of Edinburgh, Peter Guthrie Tait Road, Edinburgh, EH9 3FD, UK
Simon Weir: School of Physics and Astronomy, University of Edinburgh, Peter Guthrie Tait Road, Edinburgh, EH9 3FD, UK
Valerio Sorichetti: Institute of Science and Technology Austria, 3400 Klosterneuburg, Austria; Laboratoire de Physique Théorique et Modèles Statistiques (LPTMS), CNRS, Université Paris-Saclay, F-91405 Orsay, France
Yair Gutierrez Fosado: School of Physics and Astronomy, University of Edinburgh, Peter Guthrie Tait Road, Edinburgh, EH9 3FD, UK
Martin Lenz: Laboratoire de Physique Théorique et Modèles Statistiques (LPTMS), CNRS, Université Paris-Saclay, F-91405 Orsay, France; PMMH, CNRS, ESPCI Paris, PSL University, Sorbonne Université, Université de Paris, F-75005, Paris, France
Davide Michieletto: School of Physics and Astronomy, University of Edinburgh, Peter Guthrie Tait Road, Edinburgh, EH9 3FD, UK; MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK

Abstract

The process of polymer condensation, i.e. the formation of bonds between reactive end-groups, is ubiquitous in both industry and biology. Here we study generic systems undergoing polymer condensation in competition with cyclisation. Using a generalised Smoluchowski theory, molecular dynamics simulations and experiments with DNA and ATP-consuming T4 ligase, we find that this system displays a transition, from a ring-dominated regime with finite-length chains at an infinite time to a linear-polymers-dominated one with chains that keep growing in time. Finally, we show that fluids prepared close to the transition may have widely different compositions and rheology at large condensation times.

I Introduction

Linear polymer condensation is the process by which two polymeric end groups react to form a bond. Beyond its relevance to industry[1] and biotechnology[2], it underpins the biophysics of DNA repair and cloning[3]. In the absence of loop formation, polymer condensation yields linear chains with average length $\langle l\rangle=1/(1-p)$, where $p$ is the extent of the condensation reaction[1, 4]. However, looping, or cyclisation, is expected to be favourable in certain conditions[5, 6, 7]. Over the last decades, several theories and experiments on reversible polymer condensation have tried to establish whether the polymers in such systems will all eventually convert into rings or whether there will always be a linear population at long times[8, 9, 10, 11, 12, 13, 14, 15]. Despite this, the polymer physics and chemistry communities have not yet reached a consensus[15, 16, 17]. Additionally, there is little literature on irreversible polymer condensation, which we henceforth also refer to as “ligation”, in analogy with the biological process in which the enzyme ligase connects DNA segments.

Here we study irreversible linear polymer condensation using a combination of theory, simulations, and experiments. First, we show that irreversible polymer condensation is well captured by a modified Smoluchowski coagulation equation[18, 19] with an additional sink term that captures ring formation. By spanning a range of monomer concentrations $c$, we discover that above a critical concentration $c^{\dagger}\simeq 0.1c^{*}$ there is a “runaway” transition characterised by a population of chains that permanently escape cyclisation. Here $c^{*}=3l_{0}/(4\pi R_{g}^{3})$ denotes the overlap concentration of the polymers, with $l_{0}$ and $R_{g}$ the initial polymer length and radius of gyration, respectively. This transition separates a regime ($c<c^{\dagger}$) in which all the chains are converted into rings at infinite time from one ($c>c^{\dagger}$) in which the length of the linear chains diverges in time. The consequence of this runaway transition is that systems prepared close to $c^{\dagger}$ and driven out of equilibrium by irreversible condensation will display markedly different architectural and rheological features at large enough times.

Our work differs from classic and also more recent papers on polymer condensation and cyclisation[8, 11, 20, 11, 16] because it deals with irreversible condensation while implementing subdiffusive search and cyclisation in a Smoluchowski framework, and because it suggests, through theory, simulations and experiments, that a runaway transition is expected beyond a critical concentration. We also argue that DNA is particularly suitable to test these theories, as we can readily visualise the products of ligation reactions by gel electrophoresis and distinguish linear and circular forms by treating the samples with exonuclease, as described below. We conclude our paper by discussing the implications of our findings for the design of soft materials and DNA cloning.

II Methods

II.1 Molecular Dynamics Simulations


We model a 6,500 bp-long linear DNA molecule as a bead-spring polymer made of $l_{0}=174$ beads. The total number of polymer chains is $N_{c}=200$. The polymers are described by the Kremer-Grest model[21]. Each bead has a diameter $\sigma=13$ nm (or $\sim 38$ bp), and the excluded volume between beads is modelled by a truncated and shifted Lennard-Jones (WCA) potential

U_{\text{LJ}}(r)=4\epsilon\left[(\sigma/r)^{12}-(\sigma/r)^{6}+1/4\right],

(1)

for $r<r_{c}=2^{1/6}\sigma$ and 0 otherwise. Here $r$ represents the distance between beads and $\epsilon=1.0$ (in LJ units) parametrises the strength of the potential. The diameter of the bead, $\sigma$ , defines the length units in our system. Consecutive beads are connected through a permanent Finite Extensible Non-linear Elastic (FENE) bond

U_{\text{FENE}}(r)=-0.5KR_{0}^{2}\log{\left[1-\left(r/R_{0}\right)^{2}\right]}

(2)

with $K=30\epsilon/\sigma^{2}$ and $R_{0}=1.5\sigma$, which, added to the WCA potential, yields an equilibrium bond length around $0.9\sigma$. The bending stiffness of the polymer is controlled by a Kratky–Porod interaction

U_{b}(\theta)=\dfrac{k_{B}Tl_{p}}{\sigma}(1-\cos{\theta}),

(3)

which constrains the angle $\theta$ between the two tangent vectors defined by three consecutive beads along the polymer. Here, $l_{p}=4\sigma\simeq 150$ bp is the persistence length of DNA. We note that, as $l_{0}\gg l_{p}$, we are always in the flexible chain regime. The solvent is simulated implicitly using a Langevin thermostat, so that the time evolution of our system is governed by the stochastic differential equation

m\ddot{\bm{r}}=-\zeta\dot{\bm{r}}-\bm{\nabla}U+\sqrt{2k_{B}T\zeta}\bm{\delta}

(4)

where $\bm{r}$ is the position of a particle, $\zeta$ its friction, $m$ its mass, $U$ the sum of the interaction potentials discussed above and $\bm{\delta}$ white noise with unit variance. The diffusion timescale is $\tau_{B}=\zeta\sigma^{2}/k_{B}T$ . The integration of the Langevin equation is done with a velocity-Verlet algorithm, using a time step $\Delta t=0.01\tau_{B}$ in LAMMPS[22].
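As a quick consistency check on the interaction parameters above, the following minimal Python sketch evaluates the potentials of Eqs.(1)-(3) in LJ units and numerically locates the minimum of the sum of the FENE bond and the bead-bead repulsion of Eq.(1). The function names and the golden-section search are illustrative choices and are not taken from the simulation code used in this work.

```python
import math

EPS = 1.0                      # LJ energy scale epsilon (from the text)
SIGMA = 1.0                    # bead diameter, the unit of length
K_FENE = 30.0                  # FENE constant in units of epsilon/sigma^2
R0 = 1.5                       # FENE maximum extension in units of sigma
R_CUT = 2 ** (1 / 6) * SIGMA   # WCA cutoff


def u_wca(r):
    """Truncated and shifted Lennard-Jones (WCA) potential, Eq. (1)."""
    if r >= R_CUT:
        return 0.0
    sr6 = (SIGMA / r) ** 6
    return 4.0 * EPS * (sr6 * sr6 - sr6 + 0.25)


def u_fene(r):
    """FENE bond potential, Eq. (2); diverges as r approaches R0."""
    return -0.5 * K_FENE * R0 ** 2 * math.log(1.0 - (r / R0) ** 2)


def u_bend(theta, lp_over_sigma=4.0, kT=1.0):
    """Kratky-Porod bending energy, Eq. (3), with l_p = 4 sigma."""
    return kT * lp_over_sigma * (1.0 - math.cos(theta))


def equilibrium_bond_length(lo=0.7, hi=1.3, tol=1e-9):
    """Golden-section search for the minimum of U_FENE + U_WCA on [lo, hi]."""
    def total(r):
        return u_fene(r) + u_wca(r)

    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if total(c) < total(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)
```

Calling `equilibrium_bond_length()` returns the bond length (in units of $\sigma$) at which the bonded energy is minimal at zero temperature; thermal fluctuations are not included in this estimate.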

Various monomer densities were considered, ranging from $10^{-2}c^{*}$ to $1c^{*}$ , where $c^{*}=0.012\sigma^{-3}$ is the monomer concentration at which the polymers start to overlap. The overlap concentration $c^{*}$ was measured by computing the radius of gyration $R_{g}$ of the polymers in equilibrium at infinite dilution. All the systems were equilibrated for a sufficient amount of time to ensure that the polymer chains have moved at least a distance equal to $R_{g}$ .

After the equilibration step, 40 replicas of production runs were started for each number density considered. Ligation is performed stochastically: every $t_{l}=\tau_{B}$ a bond is attempted between any two end beads that are closer than $R_{c}=1.1\sigma$, using the fix bond/create LAMMPS command. The time between ligation attempts, $t_{l}$, was chosen to be much shorter than the relaxation time of the chains, so that the condensation process is diffusion-limited. The distance threshold $R_{c}$ was chosen so that the newly created bond, a FENE bond with cutoff $1.5\sigma$, does not make the simulation unstable. The probability of successful ligation (i.e., bond formation) is set to $p_{l}=0.1$. This value was chosen to avoid “granularity” in the stochastic condensation reaction: if this parameter were set to 1, all the ends that can react would do so in a single time step, introducing granular events in our simulations. Setting $p_{l}<1$ introduces some randomness that simply maps to a smaller average condensation rate. We have tested slightly different choices of these parameters and found that our main results and their qualitative behaviour are not affected. In particular, we have checked that the reactions remain diffusion-limited even with our choice of $p_{l}$. A schematic representation of the simulation process is shown in Fig. 1.

Once formed, the bond between the polymers is irreversible and cannot be broken, thereby mimicking the formation of a covalent bond between the DNA fragments. During the ligation process we save, every $10^{6}$ time steps, a snapshot containing both the 3D coordinates of the beads and the bond list. From the bond list we can later reconstruct the topology of the individual polymers, i.e. whether they have fused with others to form linear chains or have circularised.

For the topology reconstruction, the trajectories and bond lists were analysed using our Python code (https://git.ecdf.ed.ac.uk/taplab/dna-ligation.git). The description of the algorithm can be found in Appendix A.
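To illustrate the idea of reconstructing topology from a bond list, the sketch below classifies each polymer as linear or circular using a plain graph traversal. It is not the algorithm of Appendix A or the code in the repository above; the function name and the input format (a bead count and a list of bead-index pairs) are assumptions made for this example.

```python
from collections import defaultdict


def classify_polymers(n_beads, bonds):
    """Return a list of (number_of_beads, 'ring' or 'linear'), one per polymer.

    bonds is a list of (a, b) bead-index pairs; every bead is assumed to have
    at most two bonds, as in an unbranched polymer.
    """
    adjacency = defaultdict(list)
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    seen = [False] * n_beads
    polymers = []
    for start in range(n_beads):
        if seen[start]:
            continue
        # depth-first traversal collects all beads bonded to `start`
        stack, component = [start], []
        seen[start] = True
        while stack:
            v = stack.pop()
            component.append(v)
            for w in adjacency[v]:
                if not seen[w]:
                    seen[w] = True
                    stack.append(w)
        # a ring has no free ends: every bead carries exactly two bonds
        is_ring = len(component) > 1 and all(
            len(adjacency[v]) == 2 for v in component
        )
        polymers.append((len(component), "ring" if is_ring else "linear"))
    return polymers
```

For instance, a bond list describing one open three-bead chain and one closed three-bead loop yields one `'linear'` and one `'ring'` entry, each of size 3.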

II.2 The DSMC algorithm

The modified Smoluchowski equation proposed here (see Eq.(5) below) can only be solved analytically for certain forms of the condensation rate $k_{1}(i,j)$ and of the cyclisation rate $k_{0}(l)$. As our Molecular Dynamics simulations are practically limited to systems of hundreds of chains, we characterise the behaviour of larger systems by solving the Smoluchowski equation numerically with the Direct Simulation Monte Carlo (DSMC) algorithm[23, 24, 25, 26]. DSMC is a stochastic method to solve differential equations such as Eq.(5), which samples the correct ligation kinetics in the limit of large system sizes. The algorithm employed here is similar to the one described in Ref.[26], with the difference that we do not include fragmentation but instead include ring formation. The description of the algorithm also follows Ref.[26]. The starting point for the Monte Carlo algorithm is an array $\mathbf{m}$ of length $N_{c}$, each element $i$ of which contains a number $m_{i}$ representing the mass/length of chain $i$:

\mathbf{m}=(m_{1},m_{2},\dots,m_{N_{c}})\,.

A value of $0$ corresponds to the absence of a certain chain. Moreover, to satisfy mass conservation we ensure that $\sum_{i=1}^{N_{c}}m_{i}=N_{c}$ is true at any time during the simulation. Here, $N_{c}$ denotes the total number of polymer chains. We will also consider an analogous array $\mathbf{r}$ of length $N_{c}$ (initially empty), where we save the masses of the rings.

For an initial monodisperse condition, we set $\mathbf{m}_{0}=(1,1,\dots,1)$. After the array $\mathbf{m}$ is initialised, we run the DSMC simulation, which consists of repeating a Monte Carlo step (described in detail in Appendix B) a large number of times. The execution is terminated when the system has reached a state with a single linear chain and several non-reactive rings, where the only possible reaction is the cyclisation of the remaining linear chain.
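The stochastic kinetics described above can be sketched with a Gillespie-type simulation. This is a simplified stand-in, not the DSMC step of Appendix B or Ref.[26]: it enumerates all events at every step, which is only practical for small systems. The rate forms follow the expressions for $k_{1}(i,j)$ and $k_{0}(q)$ given in the Results; the function names, the `volume` argument and the per-pair rate $k_{1}/V$ are assumptions of this example.

```python
import random


def k1(i, j, kappa1=1.0, alpha=1.0, nu=0.588):
    """Condensation rate between linear chains of masses i and j."""
    return kappa1 * (i ** -alpha + j ** -alpha) * (i ** nu + j ** nu)


def k0(q, kappa0=1.0, nu=0.588):
    """Cyclisation rate of a linear chain of mass q, with mu = -4 nu."""
    return kappa0 * q ** (-4.0 * nu)


def simulate(n_chains, volume, kappa0, kappa1, seed=0):
    """Run until a single linear chain is left; return (time, linear, rings)."""
    rng = random.Random(seed)
    linear = [1] * n_chains   # masses of linear chains (array m in the text)
    rings = []                # masses of rings (array r in the text)
    t = 0.0
    while len(linear) > 1:
        events, rates = [], []
        for a in range(len(linear)):
            events.append(("ring", a, a))
            rates.append(k0(linear[a], kappa0))
            for b in range(a + 1, len(linear)):
                events.append(("join", a, b))
                rates.append(k1(linear[a], linear[b], kappa1) / volume)
        t += rng.expovariate(sum(rates))          # waiting time to next event
        kind, a, b = rng.choices(events, weights=rates)[0]
        if kind == "ring":
            rings.append(linear.pop(a))           # chain a becomes inert
        else:
            linear[a] += linear[b]                # chains a and b fuse
            linear.pop(b)
    return t, linear, rings
```

The total mass `sum(linear) + sum(rings)` stays equal to `n_chains` throughout, mirroring the conservation constraint stated above.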

II.3 Experiments

II.3.1 Ligation Reactions with DNA

We perform irreversible condensation on linear DNA using T4 ligase from New England Biolabs (NEB). This enzyme consumes ATP to form a covalent bond between two proximal and complementary double-stranded DNA ends. More specifically, we perform irreversible condensation on a monodisperse solution of an $l_{0}=6,500$ bp-long plasmid (referred to as “1288” plasmid here), which is converted into a linear form using a restriction enzyme (XhoI). This linearisation step is checked by gel electrophoresis. The equilibrium radius of gyration of this linear DNA molecule is about $R_{g}\simeq l_{p}\sqrt{l_{0}/(3l_{p})}\simeq 0.2$ $\mu$m (in agreement with diffusion data from Ref.[27]). This yields an overlap concentration $c^{*}=3l_{0}M_{w}/(4N_{A}\pi R_{g}^{3})\simeq 0.2$ $\mu$g/$\mu$l, with $M_{w}=650$ g/mol the molecular weight of a DNA basepair and $N_{A}$ the Avogadro number. For the low DNA concentration experiments we set the sample at $0.01c^{*}$, i.e. $c=2$ $\mu$g/ml. To perform ligation we use T4 ligase (NEB, M0202L, 1U corresponds to 0.5 ng or 0.00735 pmoles of protein according to Ref.[28]), and work at 1x T4 ligase reaction buffer concentration, which contains 1 mM ATP. To classify the topology of the DNA under ligation, we perform time-resolved gel electrophoresis. We prepare a master solution of DNA at the desired concentration, 1x ligase buffer and 2 U/$\mu$l T4 ligase.
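The arithmetic behind the quoted values of $R_{g}$ and $c^{*}$ can be reproduced with a few lines of Python. The rise per base pair (0.34 nm) and the persistence length in nanometres (50 nm, i.e. about 150 bp) are standard B-DNA values assumed here, since the text quotes $l_{p}$ only in base pairs; the function names are illustrative.

```python
import math

L0_BP = 6500          # contour length in base pairs (from the text)
MW_BP = 650.0         # g/mol per base pair (from the text)
BP_NM = 0.34          # nm per base pair, assumed B-DNA value
LP_NM = 50.0          # persistence length in nm, assumed (about 150 bp)
N_A = 6.02214076e23   # Avogadro number, 1/mol


def radius_of_gyration_nm(l0_bp=L0_BP):
    """R_g = l_p * sqrt(l_0 / (3 l_p)) with both lengths in nm."""
    contour_nm = l0_bp * BP_NM
    return LP_NM * math.sqrt(contour_nm / (3.0 * LP_NM))


def overlap_concentration_ug_per_ul(l0_bp=L0_BP):
    """c* = 3 l_0 M_w / (4 pi N_A R_g^3), converted to ug/ul."""
    rg_cm = radius_of_gyration_nm(l0_bp) * 1e-7       # nm -> cm
    mass_g = l0_bp * MW_BP / N_A                      # mass of one molecule
    volume_ml = 4.0 / 3.0 * math.pi * rg_cm ** 3      # 1 cm^3 = 1 ml
    return mass_g / volume_ml * 1e3                   # g/ml -> ug/ul
```

With these inputs both quantities come out close to the rounded values quoted above ($R_{g}$ of about 0.2 $\mu$m and $c^{*}$ of order 0.2 $\mu$g/$\mu$l); the exact figure for $c^{*}$ is sensitive to the value of $R_{g}$ because of the cubic dependence.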

After adding T4 ligase, we draw aliquots at time intervals and heat-inactivate the reaction by heating each aliquot at 65 $^{\circ}$C for 15 minutes. We then split the aliquot and treat one of the two sub-aliquots with exonuclease (RecBCD, Lucigen), an enzyme that digests linear, but not circular, DNA. Finally, we treat all aliquots with Nb.BbvCI Nickase (NEB, R0631L) to relax the supercoiled population[29]. The resulting aliquots are run on a gel: we load 20 ng of DNA from each aliquot onto a 1% agarose gel prepared using 1x TAE buffer. A standard $\lambda$ DNA - HindIII digest (NEB, N3012S) marker is also loaded. The gel is run at $\sim$2.5 V/cm for 5 hours and post-stained with SybrGold (ThermoFisher) for 30 minutes. A Syngene G-box and Genesys software are used to image the gels.

The combination of nickase (relaxing the DNA supercoiling) and exonuclease (fully digesting linear DNA molecules) allowed the topology of the DNA in each band to be unambiguously identified. Further, the $\lambda$ DNA - HindIII digest marker confirmed that the bands were of the correct size for monomer and dimer lengths. Here the terms “monomer” and “dimer” refer to a single DNA molecule and to two ligated molecules, respectively. To extract the relative amount of molecules in each band we compute its intensity using ImageJ and account for the fact that the chains in the dimer band are twice as long. We then normalise against the sum of the three bands to obtain the relative fraction of chains in each population.
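The normalisation just described can be written compactly as follows. The sketch assumes that the stain intensity of a band is proportional to the DNA mass it contains, so that dividing by the chain length (in units of monomers) gives a quantity proportional to the number of chains. The band labels and the intensities in the usage note are hypothetical.

```python
def chain_fractions(intensity, length_in_monomers):
    """Convert band intensities into number fractions of chains.

    intensity: dict band -> integrated intensity (arbitrary units)
    length_in_monomers: dict band -> chain length in units of l_0
    """
    counts = {
        band: intensity[band] / length_in_monomers[band] for band in intensity
    }
    total = sum(counts.values())
    return {band: count / total for band, count in counts.items()}


def mean_length_bp(fractions, length_in_monomers, l0_bp=6500):
    """Number-average chain length in bp from the chain fractions."""
    return l0_bp * sum(
        fractions[band] * length_in_monomers[band] for band in fractions
    )
```

For example, hypothetical intensities of 40, 20 and 40 for the linear-monomer, ring-monomer and linear-dimer bands correspond to chain fractions of 0.5, 0.25 and 0.25, because the dimer band holds half as many chains per unit intensity.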

II.3.2 Microrheology

The viscosity of the systems is measured using particle tracking microrheology. Solutions are made by mixing 8 $\mu$l of 1288 linearised plasmid, at concentrations chosen to give a final concentration in the range 2 ng/$\mu$l-500 ng/$\mu$l, with 1 $\mu$l of 40 U/$\mu$l T4 ligase and 1 $\mu$l of T4 ligase reaction buffer. Control solutions are prepared at the same time and in the same manner, substituting additional TE for the T4 ligase. The samples are kept at room temperature on a roller for several days. The samples are then spiked with $a=800$ nm PVP-coated polystyrene beads, pipetted and sealed onto a slide and imaged using an inverted microscope. We take a 30-minute movie, analyse it using a particle tracking algorithm (trackpy[30]) and extract the trajectories and mean squared displacements (MSD) of the tracers, $\langle\Delta r^{2}(\tau)\rangle=\langle\left[\bm{r}(t+\tau)-\bm{r}(t)\right]^{2}\rangle$, where $\tau$ is the lag time. Diffusion coefficients are extracted by fitting the MSDs to MSD $=2D\tau$. The viscosity is obtained using the Stokes-Einstein relation[31], $\eta=k_{B}T/(3\pi Da)$.
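The last two steps of this analysis (a linear fit of the MSD followed by the Stokes-Einstein relation) can be sketched as below. The prefactor 2 in the MSD fit is the one quoted in the text; for an MSD accumulated over $d$ spatial dimensions the prefactor would be $2d$. The sketch treats $a$ as the bead diameter, which is what the form $3\pi Da$ implies, and uses a least-squares fit through the origin; both the function names and the default temperature are assumptions.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K


def diffusion_coefficient(lag_times, msd):
    """Least-squares slope through the origin for MSD = 2 * D * lag."""
    numerator = sum(t * m for t, m in zip(lag_times, msd))
    denominator = sum(t * t for t in lag_times)
    return numerator / denominator / 2.0


def viscosity(diffusion, bead_diameter, temperature=298.15):
    """Stokes-Einstein relation eta = k_B T / (3 pi D a), SI units."""
    return K_B * temperature / (3.0 * math.pi * diffusion * bead_diameter)
```

Both functions expect SI units (seconds, square metres, metres, kelvin), so tracer displacements measured in pixels or micrometres need to be converted before fitting.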

III Results


III.1 Smoluchowski equation with cyclisation

In this section, we first propose a modified Smoluchowski equation[18, 19] describing polymers undergoing irreversible condensation (ligation) and cyclisation. Linear polymers undergo irreversible ligation with rate $k_{1}(i,j)$, with $i,j$ the polymerisation indices of the reactants, and cyclisation with rate $k_{0}(q)$. The concentrations of linear polymers of polymerisation index $q$ at time $t$, $n_{q}(t)$, and of rings, $n_{q}^{r}(t)$, are thus governed by the following equations:


\dot{n}_{q}(t)=\frac{1}{2}\sum_{i+j=q}k_{1}(i,j)n_{i}(t)n_{j}(t)-n_{q}(t)\sum_{i=1}^{\infty}k_{1}(q,i)n_{i}(t)-k_{0}(q)n_{q}(t)\,,

(5a)

\dot{n}_{q}^{r}(t)=k_{0}(q)n_{q}(t)\,.

(5b)

Once a linear chain undergoes cyclisation, it becomes a ring and cannot undergo ligation anymore, as the reactions are assumed to be irreversible. The kinetics is also constrained by the requirement that the total mass is conserved:

\sum_{q=1}^{\infty}q[n_{q}(t)+n_{q}^{r}(t)]=M/V=n\quad\forall t\,,

(6)

where $M$ is the total number of monomers and $V$ is the system’s volume. Assuming that the reaction takes place on a time scale larger than the Rouse relaxation time, the length-dependence of the annealing rate is[32, 33]

k_{1}(i,j)=\tilde{\kappa}_{1}(D_{i}+D_{j})(R_{i}+R_{j})

(7)

k_{1}(i,j)=\kappa_{1}\left(i^{-\alpha}+j^{-\alpha}\right)\left(i^{\nu}+j^{\nu}\right)\,,

(8)

where $l=il_{0}$ is the length of a polymer with a degree of polymerisation $i$ and $l_{0}$ is the initial polymer length, so that the chain’s radius of gyration is $R_{i}=l_{0}i^{\nu}$. In Eq.(7), $\tilde{\kappa}_{1}$ is a dimensionless constant, while $\kappa_{1}$ in Eq.(8) is a constant that depends on temperature and on the viscous friction of the solvent $\zeta$. For example, in the Rouse model[34] $D_{i}=k_{B}T/(\zeta l_{0}i)$ (i.e. $\alpha=1$) and thus $\kappa_{1}=\tilde{\kappa}_{1}k_{B}T/\zeta$.

This condensation rate captures the diffusion-controlled search process[32, 33]. The cyclisation rate is taken to be $k_{0}(q)=\kappa_{0}q^{\mu}$, where $\mu=-4\nu$. Note that this differs from the classic Shimada-Yamakawa theory[20, 35], which predicts $\mu=-3\nu$ at lengths larger than $l_{p}$; the difference arises because we (i) are out of equilibrium and (ii) account for the subdiffusion of the polymer end within the volume of the coil.

In equilibrium, the looping probability of a chain is given by the Shimada-Yamakawa formula[20, 36]. For $l\gg l_{p}$ the looping probability of a polymer decays as $P(l)\sim l^{\mu}$ with $\mu=-3\nu$. This looping probability also holds for an irreversible, non-equilibrium scenario if the process is reaction-limited. This is because the chain ends would have the time to explore many conformations and to diffuse throughout the whole volume of the chain, $V\sim l^{3\nu}$, before reacting (as would happen in equilibrium). In a diffusion-limited, irreversible ligation process, one should instead compute the time it takes for an end to diffuse over a certain distance $\xi$. The dynamics of the end is described by the Rouse model[34], so that $\xi=b[k_{B}Tt/(\zeta b^{2})]^{1/4}$, where $b$ is the size of a Kuhn monomer. Then, setting $\xi=R$ (the size of the polymer coil) one obtains $(R/b)^{4}=k_{B}Tt/(\zeta b^{2})$, which implies $k_{0}\sim t^{-1}\sim R^{-4}\sim l^{-4\nu}$. Taking $\mu=-4\nu$ thus effectively accounts for the fact that the chain ends perform a sub-diffusive search within the polymer coil, as expected for Rouse dynamics.

We have verified that the cyclisation rate scales as the chain length to the power $-4\nu$ by measuring the rate at which rings are produced for different lengths of the linear chains (Fig.2a-b). To do so, we varied the initial length $l_{0}$, ran short simulations (short enough that the system has had no time to create dimers, trimers, etc.) and measured the number of rings formed. We observed that the rate of ring formation at early times scales as $\dot{N}_{\text{rings}}\sim l_{0}^{-2.6}$, which is close to the expected exponent $\mu=-4\nu\simeq-2.4$ with $\nu=0.588$. Thus, both theory and simulations suggest that the diffusion-limited, irreversible looping probability of a polymer scales with its length as $l^{-4\nu}$.

To validate the functional form used for the condensation rate $k_{1}(i,j)$ (Eq.(8)), we solve the Smoluchowski equation in the limit of small concentration and short times, where only monomer, dimer and monomer ring populations are assumed to be present (see next Section). In Fig.2c we plot the condensation rate $k_{1}$ as a function of different initial polymer lengths obtained by fitting the analytical solution of Eq.(10b) (see below) to the monomer chains population omitting the second term since no rings were present in these conditions and at early times. From this quantity, we fit a power law $l^{\nu-\alpha}$ with $\nu=0.588$ and find $\alpha\simeq 1$ yields a good fit to the simulated data. This validates de Gennes’ hypothesis for the functional form of the condensation rate (Eq.(8)) and our choices for $\nu$ and $\alpha$ .

III.1.1 Time-dependence of the mean length: dilute regime

At short times and in the dilute regime, we can assume that the formation of rings and short $n$-mers is more favourable. This assumption is valid in the experiments whenever only linear monomers, dimers and monomer rings are visible in the gel electrophoresis after ligation. In denser solutions, rings consisting of more than two monomer chains are also present, as observed in our simulations. Under very dilute conditions, we can thus assume that only monomers, dimers and monomer rings are present. Denoting the number density of monomer rings, linear monomers and dimers as $n_{1}^{r}$, $n_{1}$ and $n_{2}$, respectively, the Smoluchowski equations describing the system take the form


\dfrac{dn_{1}^{r}(t)}{dt}=k_{0}(1)n_{1}(t)

(10a)

\dfrac{dn_{1}(t)}{dt}=-k_{1}(1,1)n_{1}^{2}(t)-k_{0}(1)n_{1}(t)

(10b)

\dfrac{dn_{2}(t)}{dt}=\dfrac{1}{2}k_{1}(1,1)n_{1}^{2}(t)\,.

(10c)

We solve Eq.(10b) neglecting the quadratic (first) term on its right-hand side, which is subdominant in the infinite-dilution limit:

n_{1}(t)=n_{1}(0)e^{-k_{0}(1)t}

(11)

The concentration of monomer rings is thus

\dfrac{dn_{1}^{r}(t)}{dt}=k_{0}(1)n_{1}(0)e^{-k_{0}(1)t}\,,

(12)

which yields

n_{1}^{r}(t)=n_{1}(0)[1-e^{-k_{0}(1)t}]\,.

(13)

Substituting in Eq.(10c), we get

\dfrac{dn_{2}(t)}{dt}=\dfrac{1}{2}k_{1}(1,1)n_{1}^{2}(t)=\dfrac{1}{2}k_{1}(1,1)\left[n_{1}(0)e^{-k_{0}(1)t}\right]^{2}\,,

(14)

from which one obtains

n_{2}(t)=\dfrac{k_{1}(1,1)}{4k_{0}(1)}n_{1}^{2}(0)\left[1-e^{-2k_{0}(1)t}\right]\,.

(15)

Assuming these three are the only contributions to the system, the mean length is then given by the following relation

\langle l(t)\rangle=\dfrac{l_{0}n_{1}(t)+l_{0}n_{1}^{r}(t)+2l_{0}n_{2}(t)}{n_{1}(t)+n_{1}^{r}(t)+n_{2}(t)}=l_{0}\dfrac{n_{1}(t)+n_{1}^{r}(t)+2n_{2}(t)}{n_{1}(t)+n_{1}^{r}(t)+n_{2}(t)}\,.

(16)

In denser solutions, where the population is more polydisperse, the Smoluchowski equation cannot be solved analytically and we refer to the next Section for a scaling prediction and to Sec. III.3.1 for a perturbative approach in the limit of small cyclisation rate.
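For reference, the closed-form dilute-limit expressions of Eqs.(11), (13), (15) and (16) can be evaluated directly, as in the short sketch below. The function and argument names are illustrative; `k0` and `k1` stand for $k_{0}(1)$ and $k_{1}(1,1)$.

```python
import math


def dilute_populations(t, n10, k0, k1):
    """Linear monomers, monomer rings and linear dimers at time t."""
    decay = math.exp(-k0 * t)
    n1 = n10 * decay                                         # Eq. (11)
    n1r = n10 * (1.0 - decay)                                # Eq. (13)
    n2 = k1 * n10 ** 2 / (4.0 * k0) * (1.0 - decay ** 2)     # Eq. (15)
    return n1, n1r, n2


def mean_length(t, n10, k0, k1, l0=1.0):
    """Number-average length of Eq. (16), rings included."""
    n1, n1r, n2 = dilute_populations(t, n10, k0, k1)
    return l0 * (n1 + n1r + 2.0 * n2) / (n1 + n1r + n2)
```

Because the quadratic term was dropped when solving Eq.(10b), the monomer mass $n_{1}+n_{1}^{r}$ stays equal to $n_{1}(0)$ and the dimers appear as a small excess; the expressions are therefore only meaningful when $k_{1}(1,1)n_{1}(0)/k_{0}(1)\ll 1$.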

As mentioned above, we validate de Gennes’ expression for the condensation rate (Eq.(8)) by running short simulations at very high dilution. We fit the change in the number of ring monomers with the closed-form solution, Eq.(13), for different values of the initial polymer length $l_{0}$. Similarly, we fit the solution of Eq.(10b) without the ring term to the population of monomers. From these data, we validate the scaling of the rates $k_{0}(l_{0})\sim l_{0}^{\mu}$ and $k_{1}(l_{0},l_{0})\sim l_{0}^{\nu-\alpha}$ as a function of length $l_{0}$ (Fig.2b-c).

III.1.2 Time-dependence of the mean length: concentrated regime

Here we give scaling arguments for the solution of the Smoluchowski equation in the concentrated limit, with the assumption that ring formation is negligible. At the mean-field level, we can make the simplifying assumption that the system can be described by a single characteristic length scale $l$ [37]. Under this assumption, the annealing rate scales as

k_{1}(l)\sim DR\sim l^{\nu-\alpha}\,.

(17)

The total polymer density $n$ thus follows $\dot{n}=-k_{1}(l)n^{2}$ , so that from the dimensional analysis the time evolution of the characteristic length is[38, 39]

l(t)\sim t^{1/(1+\alpha-\nu)}\sim t^{1/(1-\lambda)}\equiv t^{\gamma}\,,

(18)

with $\lambda=\nu-\alpha$. For Rouse dynamics, one has $\alpha=1$, whereas $\alpha=2$ for reptation[40]. The Flory exponent has value $\nu=1/2$ for ideal chains and $\nu=0.588$ for self-avoiding chains[40]. Assuming concentrations above overlap but still far from the melt concentration (for which one would have ideal chain statistics, $\nu=1/2$), we can take $\nu=0.588$, so that $\gamma\simeq 0.7$ if the system is unentangled and $\gamma\simeq 0.4$ in the presence of entanglements. We note, however, that using Eq.(17) in the presence of entanglements is only valid for times longer than the reptation time $\tau_{R}\sim l^{3}$[41].
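The dimensional argument leading to Eq.(18) can be spelled out in a few steps. The sketch below assumes that rings are negligible and that the monomer density is fixed, so that the number density of chains scales as the inverse of the characteristic length, $n\sim 1/l$; the numerical values of $\gamma$ follow from the exponents quoted above.

```latex
% Assumption: fixed monomer density and negligible rings, hence n ~ 1/l.
n \sim \frac{1}{l}, \qquad \dot{n} = -k_{1}(l)\,n^{2} \sim -l^{\lambda}n^{2}
\;\Rightarrow\; -\frac{\dot{l}}{l^{2}} \sim -\frac{l^{\lambda}}{l^{2}}
\;\Rightarrow\; \dot{l} \sim l^{\lambda}
\;\Rightarrow\; l^{1-\lambda} \sim t
\;\Rightarrow\; l(t) \sim t^{1/(1-\lambda)} = t^{1/(1+\alpha-\nu)} .

% Numerical values of the growth exponent gamma = 1/(1+alpha-nu):
\alpha=1,\ \nu=0.588:\quad \gamma = 1/1.412 \simeq 0.71 , \qquad
\alpha=2,\ \nu=0.588:\quad \gamma = 1/2.412 \simeq 0.41 .
```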


III.2 Linear DNA condensation

III.2.1 Simulations

We first simulate linear condensation using Molecular Dynamics. As detailed in the Methods section, we simulate polymers with $l_{0}=174$ beads of size $\sigma\sim 38$ bp and persistence length $l_{p}=4\sigma\simeq 150$ bp. These polymers are thus designed to coarse-grain the 6.5 kb-long DNA plasmids employed in the experiments (see next section). During the simulation, we take snapshots of the system and record the list of bonds to reconstruct the topology of the polymers (see Fig.1). Over the simulation time, the number of initial linear chains decreases due to the formation of (i) longer linear polymers or (ii) circular chains (Fig.3a). Additionally, lower monomer concentrations $c$ promote the formation of more rings at large times and a slower decrease of the linear species. We also note that (i) the number fraction of rings converges to a finite value at large times, and that (ii) while the number of linear chains appears to go to zero, their mean length increases (Fig. 3b). Accordingly, the (number) average length of the polymers grows more quickly for larger $c$ (Fig.3b). Thus, we conclude that loop formation competes with the growth of the chains, and that cyclisation is dominant in dilute systems. Interestingly, the curves of the mean length $\langle l(t)\rangle$ can be fitted extremely well by the numerical solution of the Smoluchowski equation, Eq.(5) (Fig.3b).

III.2.2 Experiments

As described in the Methods Section, we can perform DNA condensation using solutions of linearised DNA plasmids, mixed with ATP and DNA T4 ligase. We then perform a time-resolved experiment, where we draw aliquots from a master reaction at given time points from the addition of the T4 ligase. By running the aliquots on agarose gels we can visualise and compute the fraction of molecules in the linear and ring, monomeric, dimeric, etc. states. Fig.4a reports a picture of one such gel, displaying a single band of monomeric linear DNA (as it disappears after exonuclease treatment) at $t=0$ , evolving into three bands, one of which is exonuclease resistant (a monomer ring) at larger times. In Fig.4b we plot the relative abundance of these populations, from which we obtain the number average molecular length $\langle l(t)\rangle$ (Fig.4c).


III.2.3 Dimensionless topological parameter

Since we initialise our simulations and experiments below entanglement conditions, we fix $\alpha=1$, as expected for Rouse dynamics, and $\nu=0.588$, as expected for self-avoiding polymers[34] (we verified these exponents through direct MD simulations in Fig.2a-c). The Smoluchowski coagulation equation (Eq.(5)) is then solved numerically to fit the mean length versus time, $\langle l(t)\rangle$, obtained in simulations and experiments, using $\kappa_{1}$ and $\kappa_{0}$ as free parameters. A key quantity in our system is the ratio between the rate at which rings are formed, $\kappa_{0}$, and the rate at which polymers are condensed, $\kappa_{1}$. We thus define a dimensionless “topological parameter” $\kappa\equiv 2\kappa_{0}/(n_{0}\kappa_{1})$, where $n_{0}$ is the number density of monomeric chains of length $l_{0}$ at the start of the simulation or experiment.
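A minimal way to see how $\kappa$ enters the kinetics is to integrate a truncated version of Eq.(5) in dimensionless form. The sketch below uses the rescaled variables $x_{q}=n_{q}/n_{0}$ and $\tau=\kappa_{1}n_{0}t$, in which the cyclisation term becomes $(\kappa/2)q^{\mu}x_{q}$. The explicit Euler scheme, the cutoff `q_max` and all function names are choices made for this example and are not the solver used to produce the fits in this work.

```python
def kernel(i, j, alpha=1.0, nu=0.588):
    """Dimensionless condensation kernel k1 / kappa1 of Eq. (8)."""
    return (i ** -alpha + j ** -alpha) * (i ** nu + j ** nu)


def integrate(kappa, q_max=40, dt=2e-3, steps=500, nu=0.588):
    """Explicit Euler integration; returns (x, r) with x[q] = n_q / n_0."""
    mu = -4.0 * nu
    K = [[kernel(i, j, nu=nu) for j in range(1, q_max + 1)]
         for i in range(1, q_max + 1)]
    x = [0.0] * (q_max + 1)   # linear chains; index 0 is unused
    r = [0.0] * (q_max + 1)   # rings; index 0 is unused
    x[1] = 1.0                # monodisperse initial condition
    for _ in range(steps):
        dx = [0.0] * (q_max + 1)
        for q in range(1, q_max + 1):
            gain = 0.5 * sum(K[i - 1][q - i - 1] * x[i] * x[q - i]
                             for i in range(1, q))
            loss = x[q] * sum(K[q - 1][i - 1] * x[i]
                              for i in range(1, q_max + 1))
            ring = 0.5 * kappa * q ** mu * x[q]
            dx[q] = gain - loss - ring
            r[q] += dt * ring          # rings only accumulate
        for q in range(1, q_max + 1):
            x[q] += dt * dx[q]
    return x, r


def mean_length(x, r):
    """Number-average length (in units of l_0) over linear chains and rings."""
    number = sum(x) + sum(r)
    mass = sum(q * (x[q] + r[q]) for q in range(len(x)))
    return mass / number
```

The truncation silently discards chains longer than `q_max`, so the total mass `sum(q * (x[q] + r[q]))` should be monitored and `q_max` increased whenever it drifts appreciably from 1.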

Albeit related to the classic j-factor employed in DNA looping[20, 6], our topological parameter is more naturally interpreted as the number of rings formed for every two linear chains that are fused together. Intuitively, this number determines the final topological composition of the system. At $\kappa\gg 1$ , we expect the final state of the system to be dominated by rings, while for $\kappa\ll 1$ to be dominated by linear chains. Importantly, since $k_{0}\sim\langle l(t)\rangle^{-4\nu}$ the probability of ring formation decreases in time as the average length of the linear chains increases. Accordingly, and even though our system has a ring-only irreversible absorbing state, we conjecture that the strongly decreasing looping probability may effectively yield a very long time-transient in which the system is dominated by entangled linear chains with circular contaminants (see below for more simulations on this).

Importantly, we expect the Smoluchowski equation to be valid only in the limit in which three-body interactions are negligible; accordingly, the values of $\kappa_{0}$ and $\kappa_{1}$ should be independent of concentration only when $c$ is small enough. By plotting $\kappa\equiv 2\kappa_{0}/(n_{0}\kappa_{1})$ as a function of $c/c^{*}$ (where $c^{*}$ is computed at the beginning of the simulation or experiment) we show that $\kappa$ scales as $n_{0}^{-1}\sim(c/c^{*})^{-1}$ in both simulations and experiments until $c\simeq c^{*}$, where it starts to deviate (Fig.5); this confirms that the Smoluchowski approximation is valid in this range of concentrations. In Fig.5 we also identify the crossover value $\kappa=1$ (above which the initial cyclisation rate is larger than the dimerisation rate) around $c/c^{*}\simeq 0.1$-$0.2$. We note that the agreement between simulations and experiments is excellent for small $c/c^{*}$. However, quantitative analysis of gel electrophoresis images at larger $c/c^{*}$ is challenging due to the poor separation of multimeric bands.


III.3 Runaway Transition

The results in Fig.3 suggest that at large $c/c^{*}$ the chains tend to grow longer, and cyclisation is suppressed; at the same time, the density of reactive ends and the speed of spatial exploration of the chains become smaller, thus suppressing dimerisation. Due to this kinetic competition, we ask whether the system can truly display a “runaway” phase, defined as a regime where at least one chain permanently escapes cyclisation and its length diverges in time. One way to address this question is to look at the number of monomer chains that belong to the longest chain in the system, and at how this quantity changes in time. By using a graph representation of our simulations (see Fig.6a-b) we can compute the fraction of chains (nodes) that belong to the giant connected component (GCC), i.e. the largest cluster of connected monomer chains (Fig.6b). In Fig.6a one can visually appreciate that at large reaction times, rings (blue) are abundant at low $c/c^{*}$ while linear chains (grey) are more abundant at large $c/c^{*}$. These systems display a qualitatively different graph topology (Fig.6b). At small $c/c^{*}$ (large $\kappa$) the network of monomers is mostly disconnected; accordingly, even when the fraction of unreacted bonds goes to 0 for $t\to\infty$, the average length of the polymers does not diverge. On the contrary, at larger $c/c^{*}$ we observe only a few rings and some very long chains that connect most of the nodes in the system. Overall, the graph appears much more connected and approaches percolation, i.e. a state in which most of the nodes belong to the GCC, whose size grows with the size of the system (see Fig.6c).
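The GCC fraction used here can be computed from a list of ligation events with a standard union-find structure, as in the sketch below. Nodes are the initial monomer chains and each edge is a ligation bond between two of them; the function name and the edge-list input format are assumptions for this example.

```python
def gcc_fraction(n_nodes, edges):
    """Fraction of nodes in the largest connected component.

    edges is a list of (a, b) pairs of monomer-chain indices; a pair with
    a == b represents a monomer that has cyclised onto itself.
    """
    parent = list(range(n_nodes))
    size = [1] * n_nodes

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue                        # bond closes a ring: no merge
        if size[root_a] < size[root_b]:
            root_a, root_b = root_b, root_a
        parent[root_b] = root_a             # union by size
        size[root_a] += size[root_b]

    return max(size[find(v)] for v in range(n_nodes)) / n_nodes
```

Tracking this quantity over successive snapshots shows whether a single cluster progressively absorbs most of the monomer chains or whether the graph remains fragmented.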

III.3.1 Calculation of mass converted into rings at infinite time

Although our simulations support the notion that small values of $\kappa$ will result in linear chains of increasing lengths and vanishing cyclisation rate, they are fundamentally limited to finite-size systems where the cyclisation rate of the largest chain never rigorously goes to 0. To estimate the amount of mass that is converted into rings at long times, we do a perturbative calculation valid in the limit of small $\kappa$ . We start from the continuum Smoluchowski equation:

\dfrac{dn_{l}(t)}{dt}=\dfrac{1}{2}\int_{0}^{l}K(y,l-y)n_{y}(t)n_{l-y}(t)dy-\int_{0}^{\infty}K(y,l)n_{l}(t)n_{y}(t)dy-\dfrac{1}{2}\kappa l^{\mu}n_{l}(t)\,.

(19)

We define $K\equiv k_{1}/\kappa_{1}$ which is thus a scaling function such that $K(ai,al)\sim a^{\lambda}k_{1}(i,l)$ where $\lambda=\nu-\alpha$ [42]. We now treat $\kappa_{0}$ perturbatively, starting with $\kappa_{0}=0$ . In this case, there is no mass lost into rings and we can thus write a conservation law

\int_{0}^{\infty}ln_{l}(t)dl=1\quad\forall t\,.

(20)

Even for $\kappa$ non-zero, we assume the loss of mass to cyclisation remains finite and of order $\kappa$ . We will check the self-consistency of this assumption below. Using the mass conservation and Eq.(19) we can write the following scaling relations: $l^{2}n=1$ , $nt^{-1}=l^{1+\lambda}n^{2}$ . We therefore obtain:

l\sim t^{1/(1-\lambda)}\,.

(21)

which is the same as Eq.(18). Note that we must have $\lambda<1$ for the average length of polymers to increase over time. We can also write the density distribution as

n\sim t^{-2/(1-\lambda)}

(22)

which in the limit of long times or large lengths may be written as

n_{l}(t)\sim t^{-2/(1-\lambda)}\mathcal{G}\left(\dfrac{l}{t^{1/(1-\lambda)}}\right)

(23)

where $\mathcal{G}$ is a scaling function that only depends on the ratio $l/t^{1/(1-\lambda)}$ .
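For clarity, the power counting behind Eqs.(21)-(22) can be written out step by step. The block below is only a restatement of the two scaling relations quoted above, with $l$ the typical chain length and $n$ the corresponding number density at time $t$; no additional assumption is introduced.

```latex
\begin{align}
  % mass conservation, Eq.(20), evaluated at the typical length l
  l^{2}n \sim 1 \quad &\Rightarrow \quad n \sim l^{-2}\,, \\
  % time derivative balanced against the aggregation term of Eq.(19)
  \frac{n}{t} \sim l^{1+\lambda}n^{2} \quad &\Rightarrow \quad
  t^{-1} \sim l^{1+\lambda}n \sim l^{\lambda-1}\,, \\
  % solving for l and substituting back into n ~ l^{-2}
  l \sim t^{1/(1-\lambda)}\,, \quad & \quad n \sim l^{-2} \sim t^{-2/(1-\lambda)}\,.
\end{align}
```

The last line makes explicit why $\lambda<1$ is needed: only then is the exponent $1/(1-\lambda)$ positive, so that the typical length grows in time.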

We now introduce the ring length distribution $n_{l}^{r}(t)$ and its evolution equation as

\dfrac{dn_{l}^{r}(t)}{dt}=2\kappa_{0}l^{\mu}n_{l}(t)\,.

(24)

Since at time $t=0$ there are no rings, we can then write

n_{l}^{r}(t\to\infty)=2\kappa_{0}l^{\mu}\int_{0}^{\infty}n_{l}(t)dt\,.

(25)

We can plug in the result we obtained for the distribution of length of linear chains Eq.(23) to yield

\begin{aligned}n_{l}^{r}(t\to\infty)&=2\kappa_{0}l^{\mu}\int_{0}^{\infty}t^{-2/(1-\lambda)}\mathcal{G}\left(\dfrac{l}{t^{1/(1-\lambda)}}\right)dt\\&=2\kappa_{0}l^{\mu}(1-\lambda)l^{-(1+\lambda)}\int_{0}^{\infty}x^{-\lambda}\mathcal{G}(x)dx\end{aligned}

(26)

where we defined $x=l/t^{1/(1-\lambda)}$ . Thus, the number density of polymers that are converted into rings over infinite time is

n_{l}^{r\infty}=2\kappa_{0}(1-\lambda)l^{\mu-1-\lambda}\int_{0}^{\infty}x^{-\lambda}\mathcal{G}(x)dx\,.

(27)

Since $\lambda<1$, and assuming that $\mathcal{G}(x)=\mathcal{O}(1)$ when $x\to 0$, the integral converges at 0. For convergence of this integral at $\infty$ we also require that the scaling function decays faster than $x^{\lambda-1}$.

Assuming this functional form for the distribution of ring lengths at infinite time, we now compute the total average mass transformed into rings at infinite time as

\begin{aligned}M_{\text{rings}}^{\infty}&=\int_{1}^{\infty}ln_{l}^{r\infty}dl\\&=2\kappa_{0}(1-\lambda)\int_{0}^{\infty}x^{-\lambda}\mathcal{G}(x)dx\int_{1}^{\infty}l^{\mu-\lambda}dl\,.\end{aligned}

(28)

The convergence of this integral requires that $\lambda-\mu>1$ and in this case we get

M_{\text{rings}}^{\infty}=2\kappa_{0}\dfrac{1-\lambda}{\lambda-\mu-1}\int_{0}^{\infty}x^{-\lambda}\mathcal{G}(x)dx\,.

(29)

From this equation, we see that the fraction of mass in rings at infinite time $M_{\text{rings}}^{\infty}/M_{0}$ converges to a finite value proportional to $\kappa_{0}$ (and hence $<1$ at small $\kappa_{0}$ ).

With this calculation, we have thus shown that at small enough but non-zero $\kappa_{0}$, the fraction of mass turning into rings is finite if $\lambda-\mu=\nu-\alpha+4\nu>1$, i.e. $\nu>(1+\alpha)/5$, which holds in the non-entangled ( $\alpha=1$ ) regime for any polymer with $\nu>2/5$, e.g. ideal and self-avoiding chains. This implies that in this regime the cyclisation probability decays fast enough that it cannot prevent the runaway of the $M_{0}-M_{\text{rings}}^{\infty}$ mass into linear chains that keep growing in time.
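The convergence condition and the prefactor of Eq.(29) are easy to evaluate numerically. The sketch below uses only the relations stated in the text, $\lambda=\nu-\alpha$ and $\mu=-4\nu$; the function name `ring_mass_prefactor` is hypothetical, and the remaining factor $2\kappa_{0}\int_{0}^{\infty}x^{-\lambda}\mathcal{G}(x)dx$ is deliberately left out because it depends on the unknown scaling function.

```python
def ring_mass_prefactor(nu, alpha):
    """Return (1 - lambda) / (lambda - mu - 1) from Eq.(29), or None if the
    integral over ring lengths diverges (lambda - mu <= 1).

    nu    : metric exponent of the polymer
    alpha : dynamic exponent (1 for Rouse dynamics)
    """
    lam = nu - alpha      # exponent of the ligation kernel
    mu = -4.0 * nu        # exponent of the cyclisation rate
    gap = lam - mu - 1.0  # must be positive for Eq.(28) to converge
    if gap <= 0.0:
        return None
    return (1.0 - lam) / gap


# Rouse dynamics (alpha = 1) for ideal, self-avoiding and fully collapsed chains.
for nu in (0.5, 0.588, 1.0 / 3.0):
    print(nu, ring_mass_prefactor(nu, alpha=1.0))
```

For ideal and self-avoiding chains the prefactor is finite, whereas for fully collapsed chains ($\nu=1/3$) the function returns `None`, signalling that the mass converted into rings is no longer guaranteed to be finite by this argument.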

Consistently with this, in both MD and MC simulations, we never observe the formation of rings larger than 10 initial monomers. As shown here using asymptotic theory, the mass fraction of linear polymers goes to a finite limit at $t\rightarrow\infty$ in a thermodynamic system. We find that the key condition to ensure the existence of a runaway transition is that the cyclisation rate $k_{0}=\kappa_{0}l^{\mu}$ decays strongly enough. More specifically, we require the exponent $\mu=-4\nu$ to satisfy $\mu<-4(1+\alpha)/5$, i.e. $\nu>(1+\alpha)/5$. This condition is always met in the Rouse unentangled ( $\alpha=1$ ) regime, provided that the polymers are not fully collapsed ( $\nu=1/3$ ). This argument establishes the existence of a runaway transition in the limit of large time and at large enough concentrations $c/c^{*}$.

III.3.2 Direct Simulation Monte Carlo simulations of irreversible condensation

To formally address the existence of a true runaway transition in the thermodynamic limit, we compute the fraction of monomers belonging to linear species in systems of increasing size. To perform this calculation, we employ Direct Simulation Monte Carlo (DSMC)[23, 24, 25, 26] to solve the Smoluchowski equation in systems with up to $10^{5}$ chains. We run the DSMC code until all reactive ends apart from 2 have reacted and compute the average length of the linear population of chains, $\langle l_{lin}(t\gg 1)\rangle$. As shown in Fig.6c, our MD simulations show that at $\kappa=1$ the GCC displays a change in scaling, growing as $GCC\sim\kappa^{-1}\sim c/c^{*}$ as $\kappa\to 0$, suggesting that a qualitative change in behaviour takes place around $\kappa\simeq 1$. In Fig.6d, we also plot the number-averaged chain length at an arbitrarily large time, when the DSMC code has evolved the system as long as possible and has generated only a single linear chain. Fig.6d suggests that the linear-dominated regime ( $\kappa<1$ ) displays an average polymer length at a large reaction time that scales as $\langle l(t\gg 1)\rangle/l_{0}\sim\kappa^{-1}\sim c/c^{*}$. Additionally, the fraction of mass “lost” in forming rings grows as $M_{rings}\sim\kappa$ and is thus negligible for small enough $\kappa$ (Fig.6e). Finally, as shown in Fig.6f, the mean length of the linear chains $\langle l_{lin}(t\gg 1)\rangle$ displays a plateau for $\kappa\lesssim 1$, which grows with the system size, strongly indicating a true runaway transition at the critical value $\kappa\simeq 1$ or $c/c^{*}\simeq 0.1-0.2$.

III.4 Dynamics and Rheology

To test the consequences of the runaway transition on the dynamics and rheology of the system, we perform microrheology experiments and compute dynamics in MD simulations. DNA microrheology is well established and the effects of DNA concentration, length and topology on microrheology have been studied in the past[43, 44, 45, 46, 47, 48, 49]. Here, we perform microrheology by tracking $800$ nm PVP-coated polystyrene beads added to a solution of DNA that has been treated with either 40 U T4 ligase for a week (and thus to full extent of reaction) or with buffer for a week (control) at different initial concentrations. We ran a small aliquot of the samples in a gel and observed that indeed at $c/c^{*}\simeq 0.1$ the fraction of linear chains overcomes that of rings at large times (Fig.7a-b).

At low concentrations, our microrheology shows that the MSD of the tracer particles is unaffected by DNA ligation (Fig.7c). On the contrary, for $c/c^{*}\geq 0.1$, we find that the MSDs of the tracers in the ligated systems are much slower and display a stronger subdiffusive behaviour than the control (Fig.7c). From the MSD, we extract the large-time diffusion coefficient $D$ of the tracers and the effective viscosity of the sample via the Stokes-Einstein equation[31]. The plot of the normalised viscosity (Fig.7d) suggests that a dynamical transition takes place around $c/c^{*}=0.1-0.2$ (or $\kappa\simeq 1-2$ ) which matches the structural runaway transition seen before (Fig.6). After the transition, the viscosity increases exponentially with the concentration (see inset of Fig.7d). This suggests a relaxation process dominated by end-retraction[34], possibly due to the threading of very long linear chains through small rings[50, 51, 52, 53, 54] or pseudo-knotted parts of their own extremely long contour[55, 56]. We note that, especially at large $c/c^{*}$, the ligated solution is extremely elastic and the passive tracers do not display a freely diffusive behaviour even after a lag time of ten minutes. We thus argue that the reported $\eta/\eta_{0}$ may be lower bounds at large $c/c^{*}$, which would render the transition even more dramatic. All this implies that, intriguingly, near the transition $c/c^{*}\simeq 0.1$, systems prepared at similar concentrations may display extremely different rheology at large condensation times. To further support the existence of a qualitative change in the dynamics, we compute the values of viscosity obtained in MD simulations through the diffusion coefficient of the centre of mass of chains that have been ligated for a long time at different initial concentrations (see red circles in Fig.7d). One can appreciate that our simulations also suggest a qualitative difference in dynamics for $c/c^{*}\geq 0.1$, although the transition appears less dramatic than in experiments; we argue that this may be due to finite-size effects present in MD simulations.
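The conversion from tracer diffusion coefficient to effective viscosity mentioned above can be sketched as follows. This is a minimal illustration of the Stokes-Einstein relation for a sphere of diameter $d$, $\eta=k_{B}T/(3\pi dD)$; the temperature of 298.15 K, the function names and the choice of normalising by a control sample are assumptions made for the example and are not taken from our analysis pipeline.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def viscosity_from_diffusion(D, bead_diameter=800e-9, temperature=298.15):
    """Effective viscosity (Pa s) from the large-time tracer diffusion
    coefficient D (m^2/s) via Stokes-Einstein for a sphere of given diameter.
    The default temperature is an assumed room-temperature value."""
    return K_B * temperature / (3.0 * math.pi * bead_diameter * D)


def normalised_viscosity(D_sample, D_reference):
    """Viscosity ratio eta/eta_0 for the same bead and temperature; the bead
    size and k_B T cancel, leaving the inverse ratio of diffusion coefficients."""
    return D_reference / D_sample


# Hypothetical numbers: a tracer that diffuses 50 times slower after ligation.
D_control = 6.0e-13   # m^2/s, assumed value for illustration only
D_ligated = D_control / 50.0
print(viscosity_from_diffusion(D_control), normalised_viscosity(D_ligated, D_control))
```

Since the relation assumes free diffusion at large lag times, a tracer that is still subdiffusive at the longest accessible lag time yields an overestimate of $D$ and therefore a lower bound on the viscosity, in line with the remark above.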

IV Conclusion

We have studied a system of linear polymers undergoing irreversible condensation in competition with cyclisation. We have shown that the key dimensionless parameter controlling growth kinetics is $\kappa=2\kappa_{0}/(n_{0}\kappa_{1})$, naturally interpreted as the number of rings formed for any one dimerisation. At large concentrations (or $\kappa<1$ ) dimerisation is kinetically favoured and drives the growth of linear chains. While growth disfavours cyclisation, it also reduces the number of available reactive ends and the annealing rate of the chains (see Eq.(8)), disfavouring further growth. Despite this, we discover that the net result of this kinetic competition is a runaway transition for $\kappa<1$ if the cyclisation rate decays strongly enough with polymer length, i.e. with $\nu>(1+\alpha)/5$, with $\nu$ the metric exponent (typically 1/2 for random walks and 0.588 for self-avoiding walks) and $\alpha$ the dynamics exponent (typically 1 for Rouse and 2 for reptative dynamics). In these conditions, the fraction of monomers transformed into rings is finite, thus leaving the rest of the monomers available to form a permanently growing linear chain which then drives a runaway reaction.

We also discover that the runaway transition has deep consequences on the rheology, and triggers an exponential increase in viscosity for $\kappa<1$ (or $c/c^{*}>0.1$ ). Our results suggest that it may be possible to tune the final topological composition of ligated systems by judiciously choosing $c/c^{*}$. For instance, the most likely regime to form large rings and ring-linear blends[51, 52] is near the transition $c/c^{*}\simeq 0.1$. Mixing polymer families with different reactive ends further enhances the designability as it introduces a different $c^{*}$ for each family. Our results can be used to optimise the conditions for DNA engineering, e.g., transfection vectors[2] ought to be ligated at $c/c^{*}<0.1$ whereas synthetic chromosome assemblies[57] ought to be ligated at large $c/c^{*}$. Finally, it may be possible to couple dissipative DNA breakage reactions[48, 58, 59] with ATP-consuming ligation to create dense solutions of self-sustained, topologically active viscoelastic fluids, which would be interesting to investigate in the future.

Acknowledgements.

DM acknowledges the support of the Royal Society via a University Research Fellowship. This project has received support from European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 947918 to DM and No 677532 to ML). The authors acknowledge insightful discussions with Daan Noordermeer and Antonio Valdes who also kindly gifted us with the 1288 plasmid. Source codes are available at https://git.ecdf.ed.ac.uk/taplab/dna-ligation.git. For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising from this submission.

Appendix A Topology reconstruction algorithm

We will refer to the Topology Reconstruction code as TR from now on. The code, written in Python, takes as input the instantaneous trajectory and bond list from LAMMPS and checks for newly formed linear and ring chains. Its output is a file containing the number and length of the linear chains that have formed at a given simulation time step; a similar file is produced for the ring chains. These files are then used to calculate the average lengths and the numbers of linear and ring chains shown in the figures.

The starting point of the TR algorithm is an array $\mathbf{b}$ of size $N_{b}\times 2$ ; each row $b_{i}=(id_{1},id_{2})$ represents the IDs of atoms that are bonded within the system:

\mathbf{b}=\begin{bmatrix}id_{1}&id_{2}\\id_{3}&id_{4}\\\vdots&\vdots\\id_{N_{n-1}}&id_{N_{n}}\end{bmatrix}\,,

Since not all particles are linked together, some do not appear in the array $\mathbf{b}$. To avoid operations with large sparse arrays, the matrix $\mathbf{b}$ is mapped to $\mathbf{\tilde{b}}$, which contains only indices from $1$ to the number of connected atoms, via the map $M:\mathbf{b}\to\mathbf{\tilde{b}}$.

The next step of the TR algorithm is to create a connectivity matrix $\mathbf{C}$ based on the list $\mathbf{\tilde{b}}$. Each row of $\mathbf{C}$ represents an atom index and consists of three components $\mathbf{C}(id_{i},:)=(id_{i-1},id_{i+1},flag)$. Since in our case a particle can be linked with at most two other particles, the first two components of each row, $id_{i-1},id_{i+1}$, represent the connections of particle $id_{i}$ (note that $id_{i-1}$ and $id_{i+1}$ are not necessarily consecutive in 1D but can be any other particle bonded to particle $i$ ). The third component, $flag$, takes only the values $\{0,1\}$ and marks the particles $id_{i}$ that already belong to a polymer chain. The flag column of $\mathbf{C}$ is initialised to zeros. While reading the connectivity matrix, the algorithm switches to 1 the flags of the particles that have already been assigned to a chain. Rings are extracted in the same manner, and a ring is found if the current atom index is the same as the starting atom index. This reading process outputs $N_{c}$ arrays of different lengths, each containing the (mapped) ids of the particles that are connected in a polymer chain. The final step of the TR algorithm is to map the atom indices back to the original ones: $M^{-1}:\mathbf{\tilde{b}}\to\mathbf{b}$. This algorithm is very generic and can also be applied in cases where the atoms are not initially in polymers, as in our case, but are rather individual atoms that can connect during the simulation.
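The reading process described above can be sketched in a few lines of Python. This is a simplified illustration rather than the TR code itself: it uses a dictionary of neighbours in place of the connectivity matrix $\mathbf{C}$ (which makes the index compaction $M$ unnecessary), a set of visited atoms in place of the flag column, and it assumes that every atom has at most two bonds and that no two atoms are joined by more than one bond. The function name `reconstruct_topology` is hypothetical.

```python
from collections import defaultdict


def reconstruct_topology(bonds):
    """Split a bond list [(id1, id2), ...] into linear chains and rings.

    Returns (linear, rings), each a list of atom-id lists in bonded order."""
    neighbours = defaultdict(list)  # plays the role of the first two columns of C
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    visited = set()                 # plays the role of the flag column of C

    def walk(start):
        """Follow bonds from `start`; report whether the walk closes on itself."""
        chain, prev, cur = [start], None, start
        visited.add(start)
        while True:
            candidates = [x for x in neighbours[cur] if x != prev]
            if not candidates:
                return chain, False       # reached a free end: linear chain
            nxt = candidates[0]
            if nxt == start:
                return chain, True        # back at the starting atom: ring
            chain.append(nxt)
            visited.add(nxt)
            prev, cur = cur, nxt

    linear, rings = [], []
    # First pass: linear chains are walked from their free ends (one bond only).
    for atom in list(neighbours):
        if len(neighbours[atom]) == 1 and atom not in visited:
            linear.append(walk(atom)[0])
    # Second pass: any atom still unvisited must belong to a closed ring.
    for atom in list(neighbours):
        if atom not in visited:
            chain, is_ring = walk(atom)
            (rings if is_ring else linear).append(chain)
    return linear, rings


# A three-bead linear chain (atoms 1-2-3) and a three-bead ring (atoms 10-11-12).
print(reconstruct_topology([(1, 2), (2, 3), (10, 11), (11, 12), (12, 10)]))
```

Starting the first pass from free ends guarantees that each linear chain is traversed once from end to end, so that whatever remains unvisited afterwards can only be a ring.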

Appendix B Description of the Monte Carlo step

In this appendix, we describe the single Monte Carlo (MC) step, which is repeated a large number of times during the numerical solution of Eq.(5) performed using the DSMC algorithm. With reference to Eq.(5), we define $n_{f}\equiv V\sum_{i=1}^{\infty}n_{i}$ (total number of chains). Before the start of the simulation, we give an estimate of the maximum annealing rate $k_{1}^{\text{max}}$ and of the maximum cyclisation rate $k_{0}^{\text{max}}$. The exactness of the algorithm does not depend on this initial choice; however, choosing values that are too far from the actual maximum rates can lead to a reduced efficiency[23].

During every MC step, we either attempt to perform a ligation reaction (with probability $p$ ) or a cyclisation one (with probability $1-p$ ). The value of $p$ is calculated initially and then updated during the simulation in such a way that the average number densities $n(l)$ satisfy Eq.(5). At the beginning of each MC step, $p$ is evaluated as

p^{-1}=1+\frac{2Nk_{0}^{\text{max}}}{(n_{f}-1)nk_{1}^{\text{max}}}\,.

(30)

We will show below that this choice also guarantees that the simulation samples the correct number of cyclisation and ligation events per unit volume and unit time as required by Eq.(5).

We define a waiting time variable that is set to zero at the beginning of the simulation. After each reaction, a waiting time increment is generated. These increments are also chosen to guarantee the correct number of ligation and cyclisation reactions per unit of time/volume, as detailed below. We can now describe the MC step, during which the following actions are performed:

1. We evaluate the probability of ligation $p$ according to Eq.(30). The explicit form of $p$ will be discussed in detail below.
2. We pick a random number $0\leq r\leq 1$ from a uniform distribution. If $r\leq p$, we attempt a ligation event:
   (a) We pick a pair of elements of the array $\mathbf{m}$, denoted $\alpha,\beta$, at random. Since there are $n_{f}(n_{f}-1)$ ordered pairs of chains, the probability of picking a specific pair is $[n_{f}(n_{f}-1)]^{-1}$. Let the lengths associated with these elements be $m_{\alpha}=i$ and $m_{\beta}=j$.
   (b) We evaluate the ligation rate $k_{1}(i,j)$ for the two chains. If $k_{1}(i,j)>k_{1}^{\text{max}}$, we set $k_{1}^{\text{max}}=k_{1}(i,j)$ and return to (1). Otherwise, we continue.
   (c) We pick another random number $0\leq r^{\prime}\leq 1$ from a uniform distribution, and perform the ligation if $r^{\prime}\leq k_{1}(i,j)/k_{1}^{\text{max}}$. If ligation is unsuccessful, we return to (1). Otherwise, we continue.
   (d) We increment the waiting time by $\Delta t^{\text{lig}}_{i,j}=\frac{2AN}{n_{f}(n_{f}-1)nk_{1}(i,j)}$. Here $A$ is a parameter, the only condition on which is that it must be between $0$ and $1$, as we will discuss in more detail below.
   (e) After incrementing the waiting time, we update $\mathbf{m}$ by setting $m_{\alpha}=0$ and $m_{\beta}=i+j$.
3. If $r>p$, we attempt a cyclisation event:
   (a) We pick a chain $\gamma$ at random with probability $n_{f}^{-1}$. Let $m_{\gamma}=l$.
   (b) We evaluate the cyclisation rate $k_{0}(l)$. If $k_{0}(l)>k_{0}^{\text{max}}$, we set $k_{0}^{\text{max}}=k_{0}(l)$ and return to (1). Otherwise, we continue.
   (c) We extract another random number $0\leq r^{\prime}\leq 1$ from a uniform distribution, and perform cyclisation if $r^{\prime}\leq k_{0}(l)/k_{0}^{\text{max}}$. If cyclisation is unsuccessful, we return to (1). Otherwise, we continue.
   (d) We increment the waiting time by $\Delta t^{\text{cyc}}_{l}=\frac{1-A}{n_{f}k_{0}(l)}$, with $A$ defined above in step (2).
   (e) We record the value of $l$ in $\mathbf{r}$ and set $m_{\gamma}=0$.
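The MC step listed above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not our DSMC code: the function name `mc_step` and its signature are hypothetical, the rates `k1(i, j)` and `k0(l)` are passed in as user-supplied functions assumed to be strictly positive, and consumed chains are removed from the list `m` instead of being set to zero, so that `len(m)` always equals $n_{f}$. The symbols `N` and `n` are those of Eq.(30).

```python
import random


def mc_step(m, rings, k1, k0, k1_max, k0_max, N, n, A=1.0, rng=random):
    """Perform one successful reaction and return (dt, k1_max, k0_max).

    m     : list of lengths of the linear chains (modified in place)
    rings : list of lengths of the rings formed so far (modified in place)
    k1    : callable k1(i, j), ligation rate (assumed form supplied by the user)
    k0    : callable k0(l), cyclisation rate
    """
    if not m:
        raise ValueError("no linear chains left to react")
    while True:  # every 'return to (1)' in the text restarts this loop
        n_f = len(m)
        # Step 1: ligation probability, Eq.(30); a single chain can only cyclise.
        if n_f > 1:
            p = 1.0 / (1.0 + 2.0 * N * k0_max / ((n_f - 1) * n * k1_max))
        else:
            p = 0.0
        if rng.random() < p:
            # Step 2: attempt ligation of a random ordered pair of chains.
            a, b = rng.sample(range(n_f), 2)
            i, j = m[a], m[b]
            rate = k1(i, j)
            if rate > k1_max:               # update the rate bound and restart
                k1_max = rate
                continue
            if rng.random() > rate / k1_max:
                continue                    # rejected attempt
            dt = 2.0 * A * N / (n_f * (n_f - 1) * n * rate)
            m[b] = i + j                    # chain beta now holds the product
            m[a] = m[-1]                    # remove chain alpha by swap-and-pop
            m.pop()
        else:
            # Step 3: attempt cyclisation of a random chain.
            g = rng.randrange(n_f)
            length = m[g]
            rate = k0(length)
            if rate > k0_max:
                k0_max = rate
                continue
            if rng.random() > rate / k0_max:
                continue
            dt = (1.0 - A) / (n_f * rate)
            rings.append(length)            # record the ring length
            m[g] = m[-1]
            m.pop()
        return dt, k1_max, k0_max


# Usage with a hypothetical constant ligation kernel and k0(l) = kappa0 * l**(-4 nu).
random.seed(0)
chains, rings, t = [1] * 50, [], 0.0
k1_max, k0_max = 1.0, 0.1
while chains:
    dt, k1_max, k0_max = mc_step(chains, rings, lambda i, j: 1.0,
                                 lambda l: 0.1 * l ** (-4 * 0.588),
                                 k1_max, k0_max, N=50, n=1.0)
    t += dt
print(sum(rings), len(rings), t)
```

Removing consumed chains from the list keeps the uniform sampling over the $n_{f}$ surviving chains trivial. With `A=1.0` the waiting time only advances on ligation events, which mirrors the choice $A=1$ discussed at the end of this appendix.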

We now prove that the definitions of $p$ (Eq.(30)) and of the waiting time increments $\Delta t^{\text{lig}}_{i,j}$ (for ligation) and $\Delta t^{\text{cyc}}_{l}$ (for cyclisation) give a number of ligation and cyclisation events per unit time that is consistent with the Smoluchowski equation, Eq.(5). Over a single MC step, the mean number of ligation events involving the ordered pair of chains $(\alpha,\beta)$ is

\langle\#L_{\alpha,\beta}\rangle\equiv\frac{p}{n_{f}(n_{f}-1)}\frac{k_{1}(m_{\alpha},m_{\beta})}{k_{1}^{\text{max}}}\,.

(31)

We note that in the algorithm we consider $(m_{\alpha},m_{\beta})$ as an ordered pair, and thus in Eq.(31) we consider the reaction $(i,j)\to l$ as distinct from $(j,i)\to l$. The mean number of ligation events involving any two chains with lengths $i,j$ can be obtained by multiplying the above quantity by $2(1-\delta_{ij}/2)V^{2}n_{i}n_{j}$. The factor $2(1-\delta_{ij}/2)$ takes into account the fact that, as mentioned above, for $i\neq j$ there are two ways to perform the ligation, whereas for $i=j$ there is only one. The factor $V^{2}n_{i}n_{j}$ is the product of the numbers of chains of lengths $i$ and $j$. We thus have

2V^{2}n_{i}n_{j}\left(1-\frac{\delta_{ij}}{2}\right)\times\frac{p}{n_{f}(n_{f}-1)}\frac{k_{1}(i,j)}{k_{1}^{\text{max}}}=Vk_{1}(i,j)n_{i}n_{j}\left(1-\frac{\delta_{ij}}{2}\right)\Delta t\,,

(32)

where we have equated the mean number of ligation events involving any two chains with lengths $i,j$ to the value required by the Smoluchowski equation. Recalling that $n=N/V$ , we thus find

\Delta t=\frac{2pN}{n_{f}(n_{f}-1)nk_{1}^{\text{max}}}\,.

(33)

Eq.(33) relates the time interval $\Delta t$ to the probability of ligation. We will now obtain a second equality involving $p$ and $\Delta t$ , which will allow us to prove that the expression Eq.(30) for $p$ guarantees the correct number of ligation and cyclisation events per unit time.

The mean number of cyclisation events involving chain $\gamma$ is

\langle\#C_{\gamma}\rangle\equiv\frac{(1-p)k_{0}(m_{\gamma})}{n_{f}k_{0}^{\text{max}}}\,.

(34)

To obtain the mean number of cyclisations of a generic $l$-mer we need to multiply this quantity by $Vn_{l}$, i.e., the number of chains of length $l$. Equating this quantity to the expected number of rings formed in a time interval $\Delta t$ we obtain

Vn_{l}\times\frac{(1-p)k_{0}(l)}{n_{f}k_{0}^{\text{max}}}=k_{0}(l)n_{l}V\Delta t\,,

(35)

and hence

\Delta t=\frac{1-p}{n_{f}k_{0}^{\text{max}}}\,.

(36)

By equating the two expressions for $\Delta t$ , Eq.(33) and Eq.(36), we find Eq.(30). We have thus proven that the latter is the correct expression of $p$ , which gives the correct number of cyclisation and ligation events per unit time and unit volume, as required by the Smoluchowski equation.
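The consistency between Eq.(30), Eq.(33) and Eq.(36) can also be checked numerically. The short sketch below evaluates $p$ from Eq.(30) and then the two expressions for $\Delta t$; the function names and the parameter values are arbitrary and purely illustrative.

```python
def ligation_probability(n_f, N, n, k1_max, k0_max):
    """Probability of attempting a ligation, Eq.(30)."""
    return 1.0 / (1.0 + 2.0 * N * k0_max / ((n_f - 1) * n * k1_max))


def dt_from_ligation(p, n_f, N, n, k1_max):
    """Time interval implied by the ligation balance, Eq.(33)."""
    return 2.0 * p * N / (n_f * (n_f - 1) * n * k1_max)


def dt_from_cyclisation(p, n_f, k0_max):
    """Time interval implied by the cyclisation balance, Eq.(36)."""
    return (1.0 - p) / (n_f * k0_max)


# Arbitrary illustrative values: 100 chains, N = 200, n = 0.5, rate bounds 2.0 and 0.3.
p = ligation_probability(100, 200, 0.5, 2.0, 0.3)
print(p, dt_from_ligation(p, 100, 200, 0.5, 2.0), dt_from_cyclisation(p, 100, 0.3))
```

The two time intervals coincide up to floating-point rounding for any choice of positive parameters, which is the content of the statement that Eq.(30) follows from equating Eq.(33) and Eq.(36).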

Finally, we will prove below that the constants $A$ and $1-A$ introduced when calculating the waiting time increments are consistent with Eq.(33) and Eq.(36). To show this, it is sufficient to observe that the total time increment during an MC step is:

\begin{aligned}\Delta t&=\sum_{\alpha\neq\beta}\langle\#L_{\alpha,\beta}\rangle\Delta t^{\text{lig}}_{m_{\alpha},m_{\beta}}+\sum_{\gamma}\langle\#C_{\gamma}\rangle\Delta t^{\text{cyc}}_{m_{\gamma}}\\&=\sum_{\alpha\neq\beta}\left[\frac{pk_{1}(m_{\alpha},m_{\beta})}{n_{f}(n_{f}-1)k_{1}^{\text{max}}}\right]\left[\frac{2AN}{n_{f}(n_{f}-1)nk_{1}(m_{\alpha},m_{\beta})}\right]+\sum_{\gamma}\left[\frac{(1-p)k_{0}(m_{\gamma})}{n_{f}k_{0}^{\text{max}}}\right]\frac{1-A}{n_{f}k_{0}(m_{\gamma})}\\&=\frac{2ApN}{n_{f}(n_{f}-1)nk_{1}^{\text{max}}}+\frac{(1-A)(1-p)}{n_{f}k_{0}^{\text{max}}}\,,\end{aligned}

where the first sum runs over the $n_{f}(n_{f}-1)$ ordered pairs of chains and the second over the $n_{f}$ chains.

One can see that this equality is consistent with Eq.(33) and Eq.(36). We note that the algorithm samples on average the correct kinetics independently of the value of $A$ , as long as $0\leq A\leq 1$ . Here we take $A=1$ , meaning that the waiting time increment is calculated only after a successful ligation reaction, but not after a successful cyclisation reaction.

Appendix C Numerical integration of modified Smoluchowski

Solving the Smoluchowski equation to fit the data from MD simulations consists of two main parts:

1. We create an objective function for lsqcurvefit (called Obj_smoluchowski) that takes as input the array of initial coefficient guesses $K_{0}=(\kappa_{1},\kappa_{0})$ and the time data array $xdata$. It returns the average length as a function of time, array $ydata$. In the objective function:

(a) An array $\mathbf{L}=\mathbf{n}\cdot l_{0}$ is initialised, where $\mathbf{n}=\{1,2,\dots,N_{c}=200\}$ and $l_{0}=174$. This represents the set of lengths that can be found in the system (recall that we initialise our MD simulations with 200 chains of 174 beads each). Also, the arrays with the number density of linear and ring chains are initialised as follows, $\mathbf{n_{l_{0}}}=(N_{c}/vol,0,\dots,0)_{1\times N_{c}}$ and $\mathbf{n_{r_{0}}}=(0,\dots,0)_{1\times N_{c}}$, since initially all the molecules are linear chains. Here, $vol$ denotes the volume of the simulation box.

(b) For $t$ = {1 to simulation final step time}, call the function $(\mathbf{n_{L_{new}}},\mathbf{n_{R_{new}}})=exEuler\_smoluchowski\left(\mathbf{n_{L}},\mathbf{n_{R}},K\right)$ (see point 2 below).

(c) Update the arrays $\mathbf{n_{L}}=\mathbf{n_{L_{new}}}$ and $\mathbf{n_{R}}=\mathbf{n_{R_{new}}}$. Calculate the total average length $\mathbf{l_{total}}$ as

\mathbf{l_{total}(t)}=\frac{\mathbf{n_{L_{new}}}\cdot\mathbf{L}+\mathbf{n_{R_{new}}}\cdot\mathbf{L}}{\sum_{i}^{N_{m}}{n_{L_{new}}^{i}}+\sum_{i}^{N_{m}}{n_{R_{new}}^{i}}}

(d) Exit the for loop and pass $\mathbf{l_{total}(t)}$ to $ydata$.

2. The exEuler_smoluchowski function takes as input the current number densities of linear and ring chains and the reaction rates, $\mathbf{n_{L}},\mathbf{n_{R}},K=(\kappa_{1},\kappa_{0})$. Based on the given rates $K$, it outputs the updated number density arrays $\mathbf{n_{L_{new}}},\mathbf{n_{R_{new}}}$ after the reactions have taken place. When this function is called, the number densities of linear and ring chains of each population are updated according to Eq.(5). The populations of monomers, dimers, and so on are updated according to the first two terms of Eq.(5), while the number of them that is converted into rings is subtracted from the $\mathbf{n_{L_{new}}}$ array and added to the $\mathbf{n_{R_{new}}}$ array. In the first two terms of Eq.(5a) the rate $k_{1}(i,j)$ is not a scalar quantity but rather a matrix that follows the relation Eq.(8). The extracted coefficient against which the fitting is optimised is the scalar $\kappa_{1}$. Similarly, for the sink term of Eq.(5a)-(5b), the equation $k_{0}(l)=\kappa_{0}l^{-4\nu}$ is used and the fitting coefficient exported is the scalar $\kappa_{0}$.

The coefficients $K$ are updated iteratively by the lsqcurvefit algorithm to best fit the data. Once the optimum values are obtained the algorithm terminates.
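The explicit Euler update at the core of this procedure can be sketched as follows. This is a minimal pure-Python illustration, not the code used for the fits: the function names, the explicit time step `dt`, the truncation at the largest length in the array and the prefactor of the cyclisation term are assumptions, and the ligation kernel `k1(i, j)` is passed in as a user-supplied function because its form, Eq.(8), is not reproduced here.

```python
def ex_euler_smoluchowski(n_lin, n_ring, k1, k0, dt):
    """One explicit Euler step for the densities of linear chains and rings.

    n_lin, n_ring : lists where entry i is the density of chains made of i+1 units
    k1            : callable k1(i, j), symmetric ligation rate (assumed form)
    k0            : callable k0(l), cyclisation rate
    """
    n_max = len(n_lin)
    rate = [0.0] * n_max
    for i in range(n_max):
        # Only products that fit in the array are allowed, which conserves mass.
        for j in range(n_max - 1 - i):
            flux = k1(i + 1, j + 1) * n_lin[i] * n_lin[j]
            rate[i + j + 1] += 0.5 * flux   # gain of chains of length (i+1)+(j+1)
            rate[i] -= flux                 # loss of chains of length i+1
    new_lin, new_ring = [], []
    for i in range(n_max):
        cyc = k0(i + 1) * n_lin[i]          # linear chains converted into rings
        new_lin.append(n_lin[i] + dt * (rate[i] - cyc))
        new_ring.append(n_ring[i] + dt * cyc)
    return new_lin, new_ring


def mean_length(n_lin, n_ring, l0=1.0):
    """Average length over linear chains and rings, as in step (c) above."""
    mass = sum((i + 1) * l0 * (a + b) for i, (a, b) in enumerate(zip(n_lin, n_ring)))
    return mass / (sum(n_lin) + sum(n_ring))


# Usage with a hypothetical constant kernel and k0(l) = kappa0 * l**(-4 nu).
n_lin, n_ring = [1.0] + [0.0] * 19, [0.0] * 20
for _ in range(200):
    n_lin, n_ring = ex_euler_smoluchowski(
        n_lin, n_ring, lambda i, j: 1.0, lambda l: 0.05 * l ** (-4 * 0.588), dt=0.01)
print(mean_length(n_lin, n_ring))
```

In a Python workflow, a least-squares routine such as `scipy.optimize.least_squares` could play the role of lsqcurvefit, with the residual function wrapping the loop above; this is mentioned only as a possible equivalent and is not what was used here.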

References

Rubinstein and Colby [2003] M. Rubinstein and R. H. Colby, Polymer Physics (Oxford University Press, 2003).
Oliynyk and Church [2022] R. T. Oliynyk and G. M. Church, Communications Biology 5 (2022).
Alberts et al. [2014] B. Alberts, A. Johnson, J. Lewis, D. Morgan, and M. Raff, Molecular Biology of the Cell (Taylor & Francis, 2014) p. 1464.
Flory [1936] P. J. Flory, Journal of the American Chemical Society 58, 1877 (1936).
Cates and Candau [2001] M. E. Cates and S. J. Candau, EPL 55, 887 (2001).
Vafabakhsh and Ha [2012] R. Vafabakhsh and T. Ha, Science 337, 1097 (2012).
Zhou et al. [2012] H. Zhou, J. Woo, A. M. Cook, M. Wang, B. D. Olsen, and J. A. Johnson, Proc. Natl. Acad. Sci. USA 109, 19119 (2012).
Jacobson and Stockmayer [1950] H. Jacobson and W. H. Stockmayer, The Journal of Chemical Physics 18, 1600 (1950).
Flory and Semlyen [1966] P. J. Flory and J. A. Semlyen, Journal of the American Chemical Society 88, 3209 (1966).
Suematsu and Okamoto [1992] K. Suematsu and T. Okamoto, Colloid & Polymer Science 270, 421 (1992).
Chen and Dormidontova [2004] C. C. Chen and E. E. Dormidontova, Macromolecules 37, 3905 (2004).
Ercolani and Stefano [2008] G. Ercolani and D. Stefano, Journal of Physical Chemistry B 112, 4662 (2008).
Madeleine-Perdrillat et al. [2014] C. Madeleine-Perdrillat, F. Delor-Jestin, and P. De Sainte Claire, Journal of Physical Chemistry B 118, 330 (2014).
Di Stefano and Mandolini [2019] S. Di Stefano and L. Mandolini, Physical Chemistry Chemical Physics 21, 955 (2019).
Kricheldorf et al. [2020] H. R. Kricheldorf, S. M. Weidner, and F. Scheliga, Polymer Chemistry 11, 2595 (2020).
Lang and Kumar [2021] M. Lang and K. S. Kumar, Macromolecules 54, 7021 (2021).
Kricheldorf et al. [2022] H. R. Kricheldorf, S. M. Weidner, and J. Falkenhagen, Polymer Chemistry 13, 1177 (2022).
Smoluchowski [1918] M. v. Smoluchowski, Zeitschrift für physikalische Chemie 92, 129 (1918).
Ziff [1980] R. M. Ziff, Journal of Statistical Physics 23, 241 (1980).
Shimada and Yamakawa [1984] J. Shimada and H. Yamakawa, Macromolecules 17, 689 (1984).
Kremer and Grest [1990] K. Kremer and G. S. Grest, The Journal of Chemical Physics 92, 5057 (1990).
Plimpton [1995] S. Plimpton, J. Comp. Phys. 117, 1 (1995).
Garcia et al. [1987] A. L. Garcia, C. Van Den Broeck, M. Aertsens, and R. Serneels, Physica A 143, 535 (1987).
Liffman [1992] K. Liffman, J. Comput. Phys. 100, 116 (1992).
Kruis et al. [2000] F. E. Kruis, A. Maisels, and H. Fissan, AIChE J. 46, 1735 (2000).
Tran et al. [2023] Q. D. Tran, V. Sorichetti, G. Pehau-Arnaudet, M. Lenz, and C. Leduc, Physical Review X 13, 011014 (2023).
Robertson et al. [2006] R. M. Robertson, S. Laib, and D. E. Smith, Proc. Natl. Acad. Sci. USA 103, 7310 (2006).
Taylor and Hagerman [1990] W. H. Taylor and P. J. Hagerman, Journal of Molecular Biology 212, 363 (1990).
Bates and Maxwell [2005] A. Bates and A. Maxwell, DNA Topology (Oxford University Press, 2005).
Crocker et al. [2000] J. C. Crocker, M. T. Valentine, E. R. Weeks, T. Gisler, P. D. Kaplan, A. G. Yodh, and D. A. Weitz, Phys. Rev. Lett. 85, 888 (2000).
Hansen and McDonald [2013] J.-P. Hansen and I. R. McDonald, Theory of Simple Liquids: with Applications to Soft Matter (Academic Press, 2013).
De Gennes [1982a] P. G. De Gennes, The Journal of Chemical Physics 76, 3316 (1982a).
Grosberg et al. [1982] A. Y. Grosberg, P. G. Khalatur, and A. R. Khokhlov, Die Makromolekulare Chemie, Rapid Communications 3, 709 (1982).
Doi and Edwards [1988] M. Doi and S. Edwards, The Theory of Polymer Dynamics (Oxford University Press, 1988).
Rosa et al. [2010] A. Rosa, N. B. Becker, and R. Everaers, Biophys. J. 98, 2410 (2010).
Rosa and Everaers [2008] A. Rosa and R. Everaers, PLoS Computational Biology 4, 1 (2008).
Sorichetti and Lenz [2023] V. Sorichetti and M. Lenz, Phys. Rev. Lett. 131, 228401 (2023).
van Dongen and Ernst [1984] P. G. J. van Dongen and M. H. Ernst, J. Stat. Phys. 37, 301 (1984).
Van Dongen and Ernst [1985] P. Van Dongen and M. Ernst, Physical Review Letters 54, 1396 (1985).
Doi and Edwards [1988] M. Doi and S. Edwards, The Theory of Polymer Dynamics (1988).
De Gennes [1982b] P. G. De Gennes, The Journal of Chemical Physics 76, 3322 (1982b).
Meakin and Ernst [1988] P. Meakin and M. H. Ernst, Phys. Rev. Lett. 60, 2503 (1988).
Mason et al. [1997] T. G. Mason, K. Ganesan, J. H. Van Zanten, D. Wirtz, and S. C. Kuo, Physical Review Letters 79, 3282 (1997).
Zhu et al. [2008] X. Zhu, B. Kundukad, and J. R. Van Der Maarel, J. Chem. Phys. 129, 1 (2008).
Krajina et al. [2017] B. A. Krajina, C. Tropini, A. Zhu, P. Digiacomo, J. L. Sonnenburg, S. C. Heilshorn, and A. J. Spakowitz, ACS Central Science 3, 1294 (2017).
Tanoguchi and Murayama [2018] M. Tanoguchi and Y. Murayama, AIP Advances 8 (2018).
Smrek et al. [2021] J. Smrek, J. Garamella, R. Robertson-Anderson, and D. Michieletto, Science Advances 7, 1 (2021).
Michieletto et al. [2022] D. Michieletto, P. Neill, S. Weir, D. Evans, N. Crist, V. A. Martinez, and R. M. Robertson-Anderson, Nature Communications 13 (2022).
Fosado et al. [2023] Y. A. Fosado, J. Howard, S. Weir, A. Noy, M. C. Leake, and D. Michieletto, Physical Review Letters 130, 58203 (2023).
Roovers [1988] J. Roovers, Macromolecules 21, 1517 (1988).
Kapnistos et al. [2008] M. Kapnistos, M. Lang, D. Vlassopoulos, W. Pyckhout-Hintzen, D. Richter, D. Cho, T. Chang, and M. Rubinstein, Nature Materials 7, 997 (2008).
Halverson et al. [2012] J. D. Halverson, G. S. Grest, A. Y. Grosberg, and K. Kremer, Phys. Rev. Lett. 108, 038301 (2012).
Zhou et al. [2021] Y. Zhou, C. D. Young, M. Lee, S. Banik, D. Kong, G. B. McKenna, R. M. Robertson-Anderson, C. E. Sing, and C. M. Schroeder, Journal of Rheology 65, 729 (2021).
Parisi et al. [2020] D. Parisi, J. Ahn, T. Chang, D. Vlassopoulos, and M. Rubinstein, Macromolecules 53, 1685 (2020).
Michieletto et al. [2014] D. Michieletto, D. Marenduzzo, E. Orlandini, G. P. Alexander, and M. S. Turner, Soft Matter 10, 5936 (2014).
Soh et al. [2019] B. W. Soh, A. R. Klotz, R. M. Robertson-Anderson, and P. S. Doyle, Physical Review Letters 123, 1 (2019).
Annaluru et al. [2014] N. Annaluru, H. Muller, L. A. Mitchell, S. Ramalingam, G. Stracquadanio, S. M. Richardson, J. S. Dymond, Z. Kuang, L. Z. Scheifele, E. M. Cooper, Y. Cai, K. Zeller, N. Agmon, and J. S. Han, Science (New York, N.Y.) 344, 55 (2014), arXiv:24674868.
Del Grosso et al. [2022] E. Del Grosso, E. Franco, L. J. Prins, and F. Ricci, Nature Chemistry 14, 600 (2022).
Heinen and Walther [2019] L. Heinen and A. Walther, Science Advances 5, 32 (2019).