††thanks: joint first author††thanks: joint first author††thanks: corresponding author, davide.michieletto@ed.ac.uk
Maria PanoukidouSchool of Physics and Astronomy, University of Edinburgh, Peter Guthrie Tait Road, Edinburgh, EH9 3FD, UKSimon WeirSchool of Physics and Astronomy, University of Edinburgh, Peter Guthrie Tait Road, Edinburgh, EH9 3FD, UKValerio SorichettiInstitute of Science and Technology Austria, 3400 Klosterneuburg, AustriaLaboratoire de Physique Théorique et Modèles Statistiques (LPTMS), CNRS, Université Paris-Saclay, F-91405 Orsay, FranceYair Gutierrez FosadoSchool of Physics and Astronomy, University of Edinburgh, Peter Guthrie Tait Road, Edinburgh, EH9 3FD, UKMartin LenzLaboratoire de Physique Théorique et Modèles Statistiques (LPTMS), CNRS, Université Paris-Saclay, F-91405 Orsay, FrancePMMH, CNRS, ESPCI Paris, PSL University, Sorbonne Université,Université de Paris, F-75005, Paris, FranceDavide MichielettoSchool of Physics and Astronomy, University of Edinburgh, Peter Guthrie Tait Road, Edinburgh, EH9 3FD, UKMRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK
Abstract
The process of polymer condensation, i.e. the formation of bonds between reactive end-groups, is ubiquitous in both industry and biology. Here we study generic systems undergoing polymer condensation in competition with cyclisation. Using a generalised Smoluchowski theory, molecular dynamics simulations and experiments with DNA and ATP-consuming T4 ligase, we find that this system displays a transition, from a ring-dominated regime with finite-length chains at an infinite time to a linear-polymers-dominated one with chains that keep growing in time. Finally, we show that fluids prepared close to the transition may have widely different compositions and rheology at large condensation times.
I Introduction
Linear polymer condensation is the process by which two polymeric end groups react to form a bond. Beyond its relevance to industry[1], and biotechnology[2], it underpins the biophysics of DNA repair and cloning[3]. In the absence of loop formation, polymer condensation will yield linear chains with average length where is the extent of the condensation reaction[1, 4]. However, looping, or cyclisation, is expected to be favourable in certain conditions[5, 6, 7]. Several theories on reversible polymer condensation and experiments have, over the last decades, attempted to reach a consensus on whether the polymers in such systems will all eventually convert into rings or whether there always be a linear population at a large-time scale [8, 9, 10, 11, 12, 13, 14, 15]. Despite this, the polymer physics and chemistry communities have not yet reached a consensus [15, 16, 17]. Additionally, there is little literature on irreversible polymer condensation, which we also refer to as “ligation” henceforth in analogy with the biological process of connecting DNA segments by the enzyme ligase.
Here we study irreversible linear polymer condensation using a combination of theory, simulations, and experiments. First, we show that irreversible polymer condensation is well captured by a modified Smoluchowski coagulation equation[18, 19] with an additional sink term that captures ring formation. By spanning a range of monomer concentrations , we discover that above a critical there is a “runaway” transition characterised by a population of chains that permanently escape cyclisation. Here ) denotes the overlap concentration of polymers with and the initial polymer length and radius of gyration respectively. This transition separates a regime in which all the chains are converted into rings at infinite time, from one () in which the length of the linear chains diverges in time. The consequence of this runaway transition is that systems prepared close to and driven out-of-equilibrium by irreversible condensation will display markedly different architectural and rheological features at large enough times.
Our work differs from classic and also more recent papers on polymer condensation and cyclisation[8, 11, 20, 11, 16] because it deals with irreversible condensation while implementing subdiffusive search and cyclisation in a Smoluchowski framework and because it suggests through theory, simulations and experiments, that a runaway transition is expected beyond a critical concentration.We also argue that DNA is particularly suitable to test these theories as we can readily visualise the products of ligation reactions by gel electrophoresis and distinguish linear and circular forms by treating the samples with exonuclease, as described below.We conclude our paper by discussing the implications of our findings in the design of soft materials and DNA cloning.
II Methods
II.1 Molecular Dynamics Simulations
We model a 6,500 bp-long linear DNA molecule as a bead-spring polymer made of beads. The total number of polymer chains is . The polymers are modelled via the Kremer-Grest model[21]. Each bead has a diameter nm (or bp), modelled as a truncated and shifted Lennard-Jones potential (WCA)
(1)
for and 0 otherwise. Here represents the distance between beads and (in LJ units) parametrises the strength of the potential. The diameter of the bead, , defines the length units in our system. Consecutive beads are connected through a permanent Finite Extensible Non-linear Elastic (FENE) bond
(2)
with and , which is summed to a WLC potential to yield an equilibrium bond length around . The bending stiffness of the polymer is controlled by a Kratky–Porod interaction
(3)
which constrains the angle () defined by the two tangent vectors connecting three consecutive beads along the polymer. Here, bp is the persistence length of DNA. We note that as , we are always in the flexible chain regime. The solvent is simulated implicitly using a Langevin thermostat so that the time evolution of our system is governed by the stochastic partial differential equations
(4)
where is the position of a particle, its friction, its mass, the sum of the interaction potentials discussed above and white noise with unit variance. The diffusion timescale is . The integration of the Langevin equation is done with a velocity-Verlet algorithm, using a time step in LAMMPS[22].
Various monomer densities were considered, ranging from to , where is the monomer concentration at which the polymers start to overlap. The overlap concentration was measured by computing the radius of gyration of the polymers in equilibrium at infinite dilution. All the systems were equilibrated for a sufficient amount of time to ensure that the polymer chains have moved at least a distance equal to .
After the equilibration step, 40 replicas of production runs were started for each number density considered. The ligation is performed stochastically and is attempted every between two end beads that are closer than using the fix bond/create LAMMPS command. The choice of the time in between ligation attempts, , was made so that it was much shorter than the relaxation time of the chains; in this way, the condensation process is diffusion-limited. The distance threshold was chosen so that the new bond created is a FENE with cutoff 1.5 and to avoid unstable simulations. The probability of successful ligation (i.e., bond formation) is set to . This value was chosen to avoid “granularity” in the stochastic condensation reaction. If this parameter was set to 1, all the ends that can react would do so in a single time step introducing granular events in our simulations. Setting introduces some randomness that simply maps to a smaller average condensation rate. We have tested slightly different choices of these parameters and we found that the main results and qualitative behaviour of our results are not affected.In particular, we have tested that the reactions remain diffusion-limited even with our choice of . A schematic representation of the simulation process is shown in Fig. 1.
Once ligated, the bond formed between the polymers is irreversible and cannot be broken, therefore accounting for the formation of a covalent bond between the DNA fragments. During the ligation process snapshots of the system are taken every time steps on both the 3D coordinates of the beads and the bond list at those time steps. From the bond list we can, later on, reconstruct the topology of the individual polymers, i.e. if fused with others to form linear chains or if circularised.
For the topology reconstruction, the trajectories and bond lists were analysed using our Python code (https://git.ecdf.ed.ac.uk/taplab/dna-ligation.git). The description of the algorithm can be found in Appendix A.
II.2 The DSMC algorithm
The modified Smoluchowski equation proposed here, (see below Eq.(5)), can only be solved analytically for certain forms of the condensation rate and of the cyclisation rate . As our Molecular Dynamics simulations are practically limited to systems of hundreds of chains, to characterize the behaviour of larger systems we solve the Smoluchowski equation numerically employing the Direct Simulation Monte Carlo (DSMC) algorithm[23, 24, 25, 26]. DSMC is a powerful stochastic method to solve differential equations such as Eq.(5), and which samples the correct ligation kinetics in the limit of large system sizes. The algorithm employed here is similar to the one described in Ref.[26], with the difference that here we do not include fragmentation, but instead, we include ring formation. The description of the algorithm also follows Ref.[26]. The starting point for the Monte Carlo algorithm is an array of length , each element of which contains a number which represents the mass/length of the chain :
A value of corresponds to the absence of a certain chain. Moreover, to satisfy mass conservation we ensure that is true at any time during the simulation. Here, denotes the total number of polymer chains. We will also consider an analogous array of length (initially empty), where we save the masses of the rings.
For an initial monodisperse condition, we set . After the array m is initialized, we run the DSMC simulation, which consists of repeating a large number of times a Monte Carlo step (described in detail in Appendix B). The execution is terminated when the system has reached a state in which there is a single linear chain and several non-reactive rings, where the only possible reaction is the cyclisation of the remaining linear chain.
II.3 Experiments
II.3.1 Ligation Reactions with DNA
We perform irreversible condensation on linear DNA using T4 ligase New England Biolabs (NEB). This enzyme consumes ATP to form a covalent bond between two proximal and complementary double-stranded DNA ends. More specifically, we perform irreversible condensation on a monodisperse solution of linear, bp-long plasmid (referred to as “1288” plasmid here) which is converted into a linear form by using a restriction enzyme (XhoI). This linearisation step is checked on gel electrophoresis. The equilibrium radius of gyration of this linear DNA molecule is about m (in agreement with diffusion data from Ref.[27]). This yields an overlap concentration g/l with g/mol the molecular weight of a DNA basepair and the Avogadro number. For the low DNA concentration experiments we set the sample at , i.e. g/ml. To perform ligation we use T4 ligase (NEB, M0202L, 1U corresponds to 0.5 ng or 0.00735 pmoles of protein according to Ref.[28]), and work at 1x T4 ligase reaction buffer concentration, which contains 1 mM ATP. To classify the topology of the DNA under ligation, we perform time-resolved gel electrophoresis. We prepare a master solution of DNA at the desired concentration, 1x ligase buffer and 2 U/l T4 ligase.
After adding T4 ligase, we draw aliquots at time intervals and heat-inactivate the reaction by heating the aliquot at 65∘C for 15 minutes. We then split the aliquot and treat one of the two sub-aliquots using exonuclease (RecBCD, Lucigen), an enzyme that digests linear, but not circular, DNA. Finally, we treat all aliquots with Nb.BbvCI Nickase (NEB, R0631L) to relax the supercoiled population [29]. The resulting aliquots are run on a gel: we load 20ng of DNA from each aliquot onto a 1% agarose gel prepared using 1x TAE buffer. A standard DNA - HindIII digest (NEB, N3012S) marker is also loaded. The gel is run at 2.5V/cm for 5 hours and post-stained with SybrGold (ThermoFisher) for 30 minutes. A Syngene G-box and Genesys software is used to image the gels.
The combination of nickase (relaxing the DNA supercoiling) and exonuclease (fully digesting linear DNA molecules) allowed the topology of the DNA in each band to be unambiguously identified. Further, the DNA - HindIII digest marker confirmed the bands were of the correct size for monomer and dimer lengths. Here the terms “monomer” and “dimer” refer to a single DNA molecule and two molecules ligated, respectively. To extract the relative amount of molecules in each lane we compute, using ImageJ, the intensity of each lane and account for the fact that the band with dimers has chains that are twice as long. We then normalise against the sum of the three bands to obtain the relative fraction of chains in each population.
II.3.2 Microrheology
The viscosity of the systems is measured using particle tracking microrheology. Solutions are made by mixing 8 l of 1288 linearised plasmid at different concentrations to a final concentration in the range 2ng/l-500ng/l with 1 l of 40 U/ul T4 ligase and 1 l of T4 ligase reaction buffer. Control solutions are prepared at the same time and in the same manner substituting additional TE for the T4 ligase. The samples are kept at room temperature on a roller for several days. The samples are then spiked with nm PVP-coated polystyrene beads, pipetted and sealed onto a slide and imaged using an inverted microscope. We take a 30-minute movie and we analyse the movies using a particle tracking algorithm (trackpy[30]) and extract the trajectories and mean squared displacements(MSD) of the tracers . Diffusion coefficients are extracted by fitting to the MSDs via MSD. The viscosity is obtained using the Stokes-Einstein relation[31], .
III Results
III.1 Smoluchowski equation with cyclisation
In this result section, we first propose a modified Smoluchowski equation[18, 19] describing polymers undergoing irreversible condensation (ligation) and cyclisation. Linear polymers undergo irreversible ligation with rate , with the polymerisation indexes of the reactants, and cyclisation with rate . The concentrations of linear polymers of polymerisation index at time , , and of rings, , are thus governed by the following equations:
(5a)
(5b)
Once a linear chain undergoes cyclisation, it becomes a ring and cannot undergo ligation anymore, as the reactions are assumed to be irreversible. The kinetics is also constrained by the requirement that the total mass is conserved:
(6)
where is the total number of monomers and is the system’s volume. Assuming that the reaction takes place on a time scale larger than the Rouse relaxation time, the length-dependence of the annealing rate is[32, 33]
(7)
(8)
where is the length of a polymer with a degree of polymerisation and is the initial polymer length, so that the chain’s radius of gyration is . In Eq.(8), is a dimensionless constant and is a constant that depends on temperature and the viscous friction of the solvent . For example, in the Rouse model[34] () and thus .
This condensation rate captures the diffusion-controlled search process[32, 33]. The cyclisation rate is taken to be , where . Note that this is different from the classic Shimada-Yamakawa theory[20, 35] which would predict at lengths larger than because we (i) are out-of-equilibrium and (ii) account for the subdiffusion of the polymer end within the volume of the coil.
In equilibrium, the looping probability of a chain is given by the Shimada-Yamakawa formula[20, 36]. For the looping probability of a polymer decays as with . This looping probability also holds for an irreversible, non-equilibrium scenario if the process is reaction-limited. This is because the chain ends would have the time to explore many conformations and to diffuse the whole volume of the chain, , before reacting (as it would happen in equilibrium).In a diffusion-limited, irreversible ligation process, one should instead compute the time it takes for an end to diffuse over a certain distance . The dynamics of the end is described by the Rouse model[34] so that , where is the size of a Kuhn monomer. Then, setting (the size of the polymer coil) one obtains , which implies . So considering effectively takes into account the fact that the chain ends are performing a sub-diffusive search process within the polymer coil, as expected for Rouse dynamics.
We have verified that the rate of cyclisation scales as the length of the chain to the power by measuring the rate at which rings are produced for different lengths of the linear chains (Fig.2a-b). We have done this by changing the initial length and by running short simulations, in turn assuming that the system has had no time to create dimers, trimers, etc. and by measuring the number of rings formed. We have observed that the rate of ring formation at early times which is close to the expected with . Thus, both theory and simulations suggest that the diffusion limited, irreversible looping probability of a polymer scales with its length as .
To validate the functional form used for the condensation rate (Eq.(8)), we solve the Smoluchowski equation in the limit of small concentration and short times, where only monomer, dimer and monomer ring populations are assumed to be present (see next Section). In Fig.2c we plot the condensation rate as a function of different initial polymer lengths obtained by fitting the analytical solution of Eq.(10b) (see below) to the monomer chains population omitting the second term since no rings were present in these conditions and at early times. From this quantity, we fit a power law with and find yields a good fit to the simulated data. This validates de Gennes’ hypothesis for the functional form of the condensation rate (Eq.(8)) and our choices for and .
In our experiments and simulations, we typically track the mean length of the polymers as a function of time
(9)
and we fit this observable with the numerical solution of the full Smoluchowski equation, Eqs.(5a)-(5b)). This is practically implemented in a MATLAB code. The numerical evaluation of the system is iterated to find the best free parameters and that fit the mean length obtained from simulations or experiments.The fit is done using the nonlinear least squares MATLAB function lsqcurvefit. The rate of ring formation and the rate of linear chains formation are extracted from this fit by considering 40 independent replicas, allowing us to obtain the error on the fit parameters. For experiments, we typically average over 3 independent replicas. The numerical and fitting algorithms are described in detail in Appendix C.
III.1.1 Time-dependence of the mean length: dilute regime
At short times and in the dilute regime, we can assume that the formation of rings and short -mers is more favourable. This assumption is valid in the experiments whenever only linear monomers, dimers and monomer rings are visible in the gel electrophoresis after ligation. In more dense solutions the presence of rings consisting of more than two monomer chains will be present and is observed in our simulations.Under very dilute conditions, we can thus assume that only monomers, dimers and monomer rings are present. Denoting the number density of monomer rings, linear monomers and dimers as and , respectively, the Smoluchowski equations describing the system take the form
(10a)
(10b)
(10c)
We solve Eq.(10b) neglecting the second term as in the infinite dilution limit:
(11)
The concentration of monomer rings is thus
(12)
which yields
(13)
Substituting in Eq.(10c), we get
(14)
from which one obtains
(15)
Assuming these three are the only contributions to the system, the mean length is then given by the following relation
(16)
In denser solutions, where the population is more polydisperse, the Smoluchowski equation cannot be solved analytically and we refer to the next Section for a scaling prediction and to Sec. III.3.1 for a perturbative approach in the limit of small cyclisation rate.
As mentioned above, we validate de Gennes’ equation for the condensation rate (Eq.(8)) by running short simulations at very high dilution. We then fitted the change in number of ring monomers with the closed solutions Eqs.((13)) for different values of initial polymer length . Similarly, we fit the solution of Eq.(10b) without the ring term to the population of monomers. From these data, we validate the scaling of the rates and as a function of length (Fig.2b-c).
III.1.2 Time-dependence of the mean length: concentrated regime
Here we give scaling arguments for the solution of the Smoluchowski equation in the concentrated limit, with the assumption that ring formation is negligible. At the mean-field level, we can make the simplifying assumption that the system can be described by a single characteristic length scale [37]. Under this assumption, the annealing rate scales as
(17)
The total polymer density thus follows , so that from the dimensional analysis the time evolution of the characteristic length is[38, 39]
(18)
with . For Rouse dynamics, one has , whereas for reptation[40]. The Flory exponent has value for ideal chains and for self-avoiding chains[40]. Assuming concentrations above overlap but still far from the melt concentration (for which one would have ideal chain statistics and ), we can assume , so that if the system is unentangled and in the presence of entanglement. We note, however, that using Eq.(17) in the presence of entanglements is only valid for times longer than the reptation time [41].
III.2 Linear DNA condensation
III.2.1 Simulations
We first simulate linear condensation using Molecular Dynamics. As detailed in the Methods section, we simulate polymers with beads of size bp and persistence length bp. These polymers are thus designed to coarse-grain 6.5 kb-long DNA plasmids which will be employed in experiments (see next section).During the simulation, we take snapshots of the system and record the list of bonds to reconstruct the topology of the polymers (see Fig.1). Over the simulation time, the number of initial linear chains decreases due to the formation of (i) longer linear polymers or (ii) circular chains (Fig.3a). Additionally, lower monomer concentrations promote the formation of more rings at large times and a slower decrease of the linear species. We also note that (i) the number fraction of rings converges to a finite value at large time, and that (ii) while the number of linear chains appears to go to zero, their mean length increases (Fig. 3b). Accordingly, the (number) average length of polymers grows more quickly for larger (Fig.3b). Thus, we conclude that loop formation competes with the growth of the chains, and that cyclisation is dominant in dilute systems. Interestingly, the curves of the mean length can be fitted extremely well by the numerical solution of the Smoluchowski equation Eq.(5) (Fig.3b).
III.2.2 Experiments
As described in the Methods Section, we can perform DNA condensation using solutions of linearised DNA plasmids, mixed with ATP and DNA T4 ligase. We then perform a time-resolved experiment, where we draw aliquots from a master reaction at given time points from the addition of the T4 ligase. By running the aliquots on agarose gels we can visualise and compute the fraction of molecules in the linear and ring, monomeric, dimeric, etc. states. Fig.4a reports a picture of one such gel, displaying a single band of monomeric linear DNA (as it disappears after exonuclease treatment) at , evolving into three bands, one of which is exonuclease resistant (a monomer ring) at larger times. In Fig.4b we plot the relative abundance of these populations, from which we obtain the number average molecular length (Fig.4c).
III.2.3 Dimensionless topological parameter
Since we initialise our simulations and experiments below entanglement conditions we fix as expected for Rouse dynamics and as expected for self-avoiding polymers[34] (we verified these exponents through direct MD simulations in Fig.2a-c). In general, the Smoluchowski coagulation equation (Eq.(5)) is then solved numerically to fit the data of mean length versus time, , obtained in simulations and experiments via the free parameters and . A key number in our system is the ratio of the rates at which polymers are condensed , and the one at which rings are formed . We thus define a dimensionless “topological parameter” , where is the number density of monomeric chains of length at the start of the simulation or experiment.
Albeit related to the classic j-factor employed in DNA looping[20, 6], our topological parameter is more naturally interpreted as the number of rings formed for every two linear chains that are fused together. Intuitively, this number determines the final topological composition of the system. At , we expect the final state of the system to be dominated by rings, while for to be dominated by linear chains. Importantly, since the probability of ring formation decreases in time as the average length of the linear chains increases. Accordingly, and even though our system has a ring-only irreversible absorbing state, we conjecture that the strongly decreasing looping probability may effectively yield a very long time-transient in which the system is dominated by entangled linear chains with circular contaminants (see below for more simulations on this).
Importantly, we expect the Smoluchowski equation to be valid only in the limit in which three-body interactions are negligible, the values of and should be independent on concentration only when is small enough. By plotting as a function of (where is computed at the beginning of the simulation or experiment) we show that scales as in both simulations and experiments until where it starts to deviate (Fig.5); this confirms that the Smoluchowski approximation is valid in this range of concentrations. Importantly, in Fig.5 we also identify the crossover value (at which the initial cyclisation rate is larger than the dimerisation rate) around . We note that the agreement between simulations and experiments is excellent for small . However, quantitative analysis of gel electrophoresis images at larger is challenging due to the poor separation of multimeric bands.
III.3 Runaway Transition
The results in Fig.3 suggest that at large the chains tend to grow longer, and cyclisation is suppressed; at the same time, the density of reactive ends and the speed of spatial exploration of the chains become smaller, thus suppressing dimerisation. Due to this kinetic competition, we ask whether the system can truly display a “runaway” phase, defined as a regime where at least one chain permanently escapes cyclisation and its length diverges in time. One way to address this question is to look at the number of chains that belong to the longest chain in the system, and how this quantity changes in time.By using a graph representation of our simulations (see Fig.6a-b) we can compute the fraction of chains (nodes) that belong to the giant connected component (GCC), i.e. the largest cluster of connected monomer chains (Fig.6b). In Fig.6a one can visually appreciate that at large reaction time, rings (blue) are abundant at low while linear chains (grey) are more abundant at large . These systems display a qualitatively different graph topology (Fig.6b). At small (large ) the network of monomers is mostly disconnected; accordingly, even when the fraction of unreacted bonds goes to 0 at , the average length of the polymers does not diverge. On the contrary, at larger we observe only a few rings and some very long chains that are connecting most of the nodes in the system. Overall, the graph appears much more connected and approaching percolation, i.e. where most of the nodes belong to the GCC, whose size grows with the size of the system (see Fig.6c).
III.3.1 Calculation of mass converted into rings at infinite time
Although our simulations support the notion that small values of will result in linear chains of increasing lengths and vanishing cyclisation rate, they are fundamentally limited to finite-size systems where the cyclisation rate of the largest chain never rigorously goes to 0. To estimate the amount of mass that is converted into rings at long times, we do a perturbative calculation valid in the limit of small . We start from the continuum Smoluchowski equation:
(19)
We define which is thus a scaling function such that where [42]. We now treat perturbatively, starting with . In this case, there is no mass lost into rings and we can thus write a conservation law
(20)
Even for non-zero, we assume the loss of mass to cyclisation remains finite and of order . We will check the self-consistency of this assumption below. Using the mass conservation and Eq.(19) we can write the following scaling relations: , . We therefore obtain:
(21)
which is the same as Eq.(18). Note that we must have for the average length of polymers to increase over time. We can also write the density distribution as
(22)
which in the limit of long times or large lengths may be written as
(23)
where is a scaling function that only depends on the ratio .
We now introduce the ring length distribution and its evolution equation as
(24)
Since at time there are no rings, we can then write
(25)
We can plug in the result we obtained for the distribution of length of linear chains Eq.(23) to yield
(26)
where we defined . Thus, the number density of polymers that are converted into rings over infinite time is
(27)
Since and assuming the when , the integral converges at 0. For convergence of this integral at we also require that the scaling function decays faster than .
Assuming this functional form for the distribution of ring lengths at infinite time, we now compute the total average mass transformed into rings at infinite time as
(28)
The convergence of this integral requires that and in this case we get
(29)
From this equation, we see that the fraction of mass in rings at infinite time converges to a finite value proportional to (and hence at small ).
With this calculation, we have thus shown that at small enough but non-zero , the fraction of mass turning into rings is finite if or which is valid for any type of polymer in the non-entangled () regime.This implies that in this regime we expect the cyclisation probability to decay fast enough and cannot prevent the runaway of the mass into linear chains that keep growing in time.
Consistently with this, in both MD and MC simulations, we never observe the formation of rings larger than 10 initial monomers. As shown here using asymptotic theory the mass fraction of linear polymers goes to a finite limit at in a thermodynamic system. We find that the key condition to ensure the existence of runaway transition is that the cyclisation rate decays strongly enough. More specifically, we require the exponent to be or . This condition is always met in the Rouse unentangled () regime,provided that the polymers are not fully collapsed (). This argument establishes the existence of a runaway transition in the limit of large time and at large enough concentrations .
III.3.2 Direct Simulation Monte Carlo simulations of irreversible condensation
To formally address the existence of a true runaway transition in the thermodynamic limit, we compute the fraction of monomers belonging to linear species in systems of increasing size. To perform this calculation, we employ Direct Simulation Monte Carlo[23, 24, 25, 26] to solve the Smoluchowski equation in systems with up to chains. We run the DSMC code until it has reacted all ends apart from 2 and compute the average length of the linear population of chains, . As shown in Fig.6c, our MD simulations show that at the GCC displays a change in scaling, growing as as suggesting that a qualitative change in behaviour takes place around . In Fig.6d, we also plot the number averaged chain length at an arbitrarily large time when the DSMC code has evolved the system as long as possible and has generated only a single linear chain. Fig.6d suggests that the linear-dominated regime () displays an average polymer length at a large reaction time that scales as . Additionally, the fraction of mass “lost” in forming rings grows as and is thus negligible for small enough (Fig.6e). Finally, as shown in Fig.6f, the mean length of the linear chains displays a plateau for , which grows with the system size, strongly indicating a true runaway transition at the critical value or .
III.4 Dynamics and Rheology
To test the consequences of the runaway transition on the dynamics and rheology of the system, we perform microrheology experiments and compute dynamics in MD simulations. DNA microrheology is well established and the effects of DNA concentration, length and topology on microrheology have been studied in the past[43, 44, 45, 46, 47, 48, 49]. Here, we perform microrheology by tracking nm PVP-coated polystyrene beads added in a solution of DNA that has been treated with either 40U T4 ligase for a week (and thus to full extent of reaction) or with buffer for a week (control) at different initial concentrations. We ran a small aliquot of the samples in a gel and observed that indeed at the fraction of linear chains overcome the rings at large times (Fig.7a-b).
At low concentrations, our microrheology shows that the MSD of the tracer particles is unaffected by DNA ligation (Fig.7c). On the contrary, for , we find that the MSDs of the tracers in the ligated systems are much slower and display a stronger subdiffusive behaviour than the control (Fig.7c). From the MSD, we extract the large-time diffusion coefficient of the tracers and the effective viscosity of the sample via the Stokes-Einstein equation[31]. The plot of the normalised viscosity (Fig.7d) suggests that a dynamical transition takes place around (or ) which matches the structural runaway transition seen before (Fig.6). After the transition, the viscosity increases exponentially with the concentration (see inset of Fig.7d). This suggests a relaxation process dominated by end-retraction[34], possibly due to the threading of very long linear chains through small rings[50, 51, 52, 53, 54] or pseudo-knotted parts of their own extremely long contour[55, 56]. We note that, especially at large , the ligated solution is extremely elastic and the passive tracers do not display a freely diffusive behaviour even after a lag time of ten minutes. We thus argue that the reported may be lower bounds at large , which would render the transition even more dramatic. All this implies that, intriguingly, near the transition , systems prepared at similar concentrations may display extremely different rheology at large condensation times. To further support the existence of a qualitative change in the dynamics, we compute the values of viscosity obtained in MD simulations through the diffusion coefficient of the centre of mass of chains that have been ligated for long time at different initial concentrations (see red circles in Fig.7d). One can appreciate that our simulations also suggest a qualitative difference in dynamics for , albeit the transition appears less dramatic than in experiments; we argue that this may be due to finite size effects present in MD simulations.
IV Conclusion
We have studied a system of linear polymers undergoing irreversible condensation in competition with cyclisation. We have shown that the key adimensional parameter controlling growth kinetics is ; naturally interpreted as the number of rings formed for any one dimerisation. At large concentrations (or ) dimerisation is kinetically favoured and drives the growth of linear chains. While growth disfavours cyclisation, it also reduces the number of available reactive ends and the annealing rate of the chains (see Eq.(8)), disfavouring further growth. Despite this, we discover that the net result of this kinetic competition is a runaway transition for if the cyclisation rate decays strongly enough with polymer length, i.e. with , with the metric exponent (typically 1/2 for random walks and 0.588 for self-avoiding walks) and the dynamics exponent (typically 1 for Rouse and 2 for reptative dynamics). In these conditions, the fraction of monomers transformed into rings is finite, thus leaving the rest of the monomers available to form a permanently growing linear chain which then drive a runaway reaction.
We also discover that the runaway transition has deep consequences on the rheology, and triggers an exponential increase for (or ).Our results suggest that it may be possible to tune the final topological composition of ligated systems by judiciously choosing . For instance, the most likely regime to form large rings and ring-linear blends[51, 52] is near the transition . Mixing polymer families with different reactive ends further enhances the designability as it introduces different for each family. Our results can be used to optimise the conditions for DNA engineering, e.g., transfection vectors[2] ought to be ligated at whereas synthetic chromosomes assemblies[57] at large . Finally, it may be possible to couple dissipative DNA breakage reactions[48, 58, 59] with ATP-consuming ligation to create dense solutions of self-sustained topologically active viscoelastic fluids which would be an interesting active fluid to investigate in the future.
Acknowledgements.
DM acknowledges the support of the Royal Society via a University Research Fellowship. This project has received support from European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 947918 to DM and No 677532 to ML). The authors acknowledge insightful discussions with Daan Noordermeer and Antonio Valdes who also kindly gifted us with the 1288 plasmid. Source codes are available at https://git.ecdf.ed.ac.uk/taplab/dna-ligation.git. For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising from this submission.
Appendix A Topology reconstruction algorithm
We will refer to the Topology Reconstruction code from now on as TR. The code takes as input the instantaneous trajectory and bond list from LAMMPS and checks for newly formed linear and ring chains. The output of the Python code is a file containing the number and length of linear chains that have formed in a given simulation time step. Similar files are produced for the ring chains. These files are then used to calculate the average length and the number of linear/ring chain figures.
The starting point of the TR algorithm is an array of size ; each row represents the IDs of atoms that are bonded within the system:
Since not all particles are linked together, some do not appear in the array . To avoid operations with large sparse arrays, the matrix is mapped to that contains only indexes from to the maximum number of atoms connected .
The next step of the TR algorithm is to create a connectivity matrix based on the list . Each row of represents an atom index and consists of three components . Since in our case a particle can be linked with two more particles the first two components of each row represent the connections of particle (note that and are not necessarily consecutive in 1D but can be any other particle bonded to particle ). The third component, , takes only the values and accounts for the particles that already belong to a polymer chain. The flag column of is initialised to zeros. During the reading process of the connectivity matrix, the algorithm switches the flags to 1 of the particles that are already considered to belong in a chain. Rings are extracted in the same manner and a ring is found if the current atom index is the same as the starting atom index. This reading process outputs arrays that have different lengths and each of them contains the particle (mapped) ids that are connected in a polymer chain. The final step of the TR algorithm is to map the atom indexed back to the original ones: . This algorithm is very generic and can be applied also in cases where the atoms are not initially in polymers as in our case, but rather individual atoms that can connect during the simulation.
Appendix B Description of the Monte Carlo step
In this paragraph, we will describe the single Monte Carlo (MC) step, which is repeated a large number of times during the numerical resolution of Eq.(5) performed using the DSMC algorithm. With reference to Eq.(5), we define (total number of chains). Before the start of the simulation, we give an estimate of the maximum annealing rate and of the maximum cyclisation rate . The exactness of the algorithm does not depend on this initial choice, however, choosing values that are too far from the actual maximum rates can lead to a reduced efficiency[23].
During every MC step, we either attempt to perform a ligation reaction (with probability ) or a cyclisation one (with probability ). The value of is calculated initially and then updated during the simulation in such a way that the average number densities satisfy (5). At the beginning of each MC step, is evaluated as
(30)
We will show below that this choice also guarantees that the simulation samples the correct number of cyclisation and ligation events per unit volume and unit time as required by Eq.(5).
We define a waiting time variable that is set to zero at the beginning of the simulation. After each reaction, a waiting time increment is generated. These increments are also chosen to guarantee the correct number of ligation and fragmentation reactions per unit of time/volume, as detailed below. We can now describe the MC step, during which the following actions are performed:
1.
We evaluate the probability of annealing according to Eq.(30). The explicit form of , Eq.(30), will be discussed in detail below.
2.
We pick a random number from a uniform distribution. If , we attempt a ligation event:
(a)
We pick a pair of elements of the array , denoted at random. Since there are ordered pairs of chains, the probability of picking a specific pair is . Let the length associated with these elements be and .
(b)
We evaluate the ligation rate for the two chains. If , we set and return to (1). Otherwise, we continue.
(c)
We pick another random number , and perform the ligation if . If ligation is unsuccessful, we return to (1). Otherwise, we continue.
(d)
We increment the waiting time by . Here is a parameter, the only condition on which is that it must be between and , as we will discuss in more detail below.
(e)
After incrementing the waiting time, we update by setting and .
3.
If , we attempt a cyclisation event:
(a)
We pick a chain at random with probability . Let .
(b)
We evaluate the cyclisation rate . If , set and return to (1). Otherwise, we continue.
(c)
We extract another random number from a uniform distribution, and perform cyclisation if . If cyclisation is unsuccessful, we return to (1). Otherwise, we continue.
(d)
We increment the waiting time by , with defined above in step (2).
(e)
We record the value of in and set .
We now prove that the definitions of (Eq.(30)), the waiting time increments (for ligation) and (for cyclisation) give several ligation and cyclisation events per unit time which is consistent with the Smoluchowski equation Eq.(5). Over a single MC step, the mean number of ligation events involving the ordered pair of filaments is
(31)
We note that in the algorithm we consider as an ordered pair, and thus in (31) we consider the reaction as distinct from . The mean number of ligation events involving any two chains with lengths can be obtained by multiplying the above quantity by . The factor takes into account the fact that, as mentioned above, for , there are two ways to perform the ligation, whereas for there is only one. The factor is the product of the volume fractions of filaments of lengths and . We thus have
(32)
where we have equated the mean number of ligation events involving any two chains with lengths to the value required by the Smoluchowski equation. Recalling that , we thus find
(33)
Eq.(33) relates the time interval to the probability of ligation. We will now obtain a second equality involving and , which will allow us to prove that the expression Eq.(30) for guarantees the correct number of ligation and cyclisation events per unit time.
The mean number of cyclisation events involving chains is
(34)
To obtain the mean number of cyclisations of a generic mer we need to multiply this quantity by , i.e., the volume fraction of filaments of length . Equating this quantity to the expected number of rings formed in a time interval we obtain
(35)
and hence
(36)
By equating the two expressions for , Eq.(33) and Eq.(36), we find Eq.(30). We have thus proven that the latter is the correct expression of , which gives the correct number of cyclisation and ligation events per unit time and unit volume, as required by the Smoluchowski equation.
Finally, we will prove below that the constants and introduced when calculating the waiting time increments are consistent with Eq.(33) and Eq.(36). To show this, it is sufficient to observe that the total time increment during an MC step is:
One can see that this equality is consistent with Eq.(33) and Eq.(36). We note that the algorithm samples on average the correct kinetics independently of the value of , as long as . Here we take , meaning that the waiting time increment is calculated only after a successful ligation reaction, but not after a successful cyclisation reaction.
Appendix C Numerical integration of modified Smoluchowski
Solving the Smoluchowski equation to fit the data from MD simulations consists of two main parts:
1.
We create an objective function for the lsqcurvefit (called Obj_smoluchowski) that takes as input the array of initial coefficient guess and the time data array . It returns the average length as a function of time, array . In the objective function:
(a)
An array is initialised where and . This represents the set of lengths that can be found in the system (recall that we initialise our MD simulations with 200 chains of 174 beads each). Also, the arrays with the number density of linear and ring chains are initialised as follows, and since initially all the molecules are linear chains. Here, denotes the volume of the simulation box.
(b)
for = {1 to simulation final step time} docall function (see point 2 below)
(c)
update arrays and . Calculate the total average length as
(d)
exit for loop and parse to
2.
The exEuler_smoluchowski function takes as input the initial number densities of linear and ring chains and the reaction rates . Based on the given rates K, it outputs the final number density arrays , after the reactions have taken place. When this function is called, the number density of linear and ring chains of each population are updated according to Eq.(5). The monomer, dimer, and so on populations are increased according to the first two terms of Eq.(5) while the number of them that is converted into rings is subtracted by the and added to the array. In the first two terms of Eq.(5a) the rate is not a scalar quantity by rather a matrix that follows the relation Eq.(8). The extracted coefficient against which the fitting is optimised is the scalar . Similarly, for the sink term of Eq.(5a)-(5b), the equation is used and the fitting coefficient exported is the scalar .
The coefficients are updated iteratively by the lsqcurvefit algorithm to best fit the data. Once the optimum values are obtained the algorithm terminates.
References
RubinsteinandColby [2003]M.RubinsteinandH.R.Colby,Polymer Physics(Oxford University Press,2003).
Hobby: Web surfing, Rafting, Dowsing, Stand-up comedy, Ghost hunting, Swimming, Amateur radio
Introduction: My name is Virgilio Hermann JD, I am a fine, gifted, beautiful, encouraging, kind, talented, zealous person who loves writing and wants to share my knowledge and understanding with you.
We notice you're using an ad blocker
Without advertising income, we can't keep making this site awesome for you.