Model description


In the context of the EvoEvo project, we have designed an integrated model to study Evolution of Evolution. Obviously, such a model cannot include the whole complexity of real organisms. Moreover, one has to keep in mind that our objective is not to study the evolution of such or such organism. Its very aim is to study the evolutionary process and to unravel the EvoEvo strategies that result from a pure Darwininan evolution. That is why we propose to study evolution and EvoEvo in a simplified, abstract, world, designed by an «artificial chemistry» (Dittrich et al., 2001). This artificial chemistry provides a set of objects and a set of rules that govern their interactions. In the model, organisms will be composed of these objects, the rules giving them their dynamic and, ultimately, their fitness.

To build our integrated model, we designed a modular artificial chemistry: the set of objects and the set of rules are split into modules that represent the different organisation levels we want to study as well as the interactions between those levels (e.g. a specific set of rules specifies how the genome is transcribed and translated into proteins). Here we propose to include five levels in our model: the genome, the genetic network, the metabolic network, the fitness and the environment (note that we don’t include any «phenotype»: the phenotype will simply be the result of the metabolic network dynamic in the organism’s environment). These levels are described in the following sections.

Basically, our integrated evolutionary model is an individual based model. Each individual is an asexual virtual cell owning a genome. This genome encodes a regulatory network and a metabolic network. The metabolic network can uptake and convert nutrients from the environment, resulting in the ability (or not) for the cell to divide. Dividing cells form a population that grows on a two dimensional environmental grid providing fresh nutrients, but also nutrients and waste released by cells, actively or after death. This dynamical process results in a modification of the environment and allows for the emergence of complex ecosystems. It also creates the conditions for local competition between cells.

In the model, the molecular structure of the organisms is entirely defined by their genome (possibly with an interaction with the organism’s environment). This genome can undergo mutation at each replication (point mutations and large rearrangements). The interaction between this variation process and the competition process described above results in a Darwinian evolution: individuals become more and more adapted to their environment and able to replicate more an more efficiently. However, by doing so, they also modify the shared environment (e.g. by releasing new metabolites), thus changing their own evolutionary conditions…

Our formalism is inspired from Crombach and Hogeweg (2008) and Beslon et al. (2010b). We describe below each biological organization level. For each of them, we also remember in which of the four iterative software those levels are included (see table below).

Iteration Genome level Genetic regulation level Metabolic network level Variable environment level Specifications deliverable
Genome-network model D2.1
Population model D2.3
Realistic-network model D2.5
Integrated evolutionary model D2.7


The genome level

Iteration
Genome-network model
Population model
Realistic-network model
Integrated evolutionary model

Genome structure

Each organism owns a circular string of functional/non-functional elements. The genome is a coarse-grained genome, inspired on Crombach and Hogeweg (2008), and defined as a list of functional or non-functional elements mathematically defined as n-tuples (the elements of our genome). Each functional tuple is parametrised to define its function and its number of dimensions n (depending on the type). Five types of tuples are defined in our artificial chemistry:

  • E type: Tuples coding for enzymes performing reactions in the metabolic network, via the following Michaelis-Menten equation:

        \[\frac{d[p]}{dt} = \frac{k_{cat}.[E] .[s]}{k_m+[s]}\]

    with s, p \in \mathbb{N}^*, k_{cat} \in \mathbb{R} and k_m \in \mathbb{R}^+ being encoded in the tuple, [s] and [p] the concentrations of the metabolites s and p, and [E] the enzymatic concentration (we assume that the concentration of free enzymes [E] is always equal to the total concentration [E_T]),

  • TF type: Tuples coding for transcription factors. Each transcription factor i owns a binding site identification tag j \in \mathbb{Z} and an affinity A_{ij} for this binding site,
  • BS type: Tuples coding for binding sites specifying which transcription factor may bind to them via their own identification tag \in \mathbb{Z},
  • P type: Tuples coding for promoters determining where the transcription should start. Each promoter i owns a basal expression level \beta_i,
  • NC type: Non functional tuples constituting the non coding part of the genome.

Binding sites directly flanking a promoter regulate its transcriptional activity. The enhancer site directly precedes the promoter and is made of one or more contiguous binding sites. The operator site directly follows the promoter and is also made of one or more contiguous binding sites. Transcription factors that bind on the enhancer site increase the transcriptional activity. On the opposite, transcription factors that bind on the operator site down-regulate the promoter activity (see figure 1.1). As in R-aevol (Beslon et al., 2010b), a promoter has a basal level activity \beta.
TF or E tuples following the operator site are transcribed, thereby allowing for operons. Downstream of the operator site, any tuple other than TF or E makes the transcription stop. To be functional, the promoter can be flanked by binding sites or not, but TF or E tuples must immediately follow the regulation unit (enhancer site + promoter + operator site, see figure 1.2).

Mutational operators

The genome undergoes point mutations and large rearrangements during replication. If a tuple undergoes a point mutation, it operates a jump in the tuple space by adding a n-dimensional random vector. A tuple can be unfunctionalised by a point mutation or during a large rearrangement if it is located on a breakpoint. Non coding tuples can also be restored into one or another functional type, however it is impossible to mutate directly from a functional type to another (see figure 1.4). All the mutation rates are configurable. The genome also undergoes large chromosomic rearrangements: duplications, deletions, inversions, and translocations. The various types of mutation can modify existing tuples, but also create new tuples, delete some existing tuples, modify the length of the non coding regions, modify tuple order…

Figure 1. (1) A functional region starts with a promoter, possibly flanked by an enhancer site and/or an operator site. All contiguous E or TF tuples following the operator site are transcribed. The first tuple of another type interrupts the transcription (here a NC tuple). In this case, the functional region is an operon.
(2.A) The promoter can be flanked by binding sites or not, but TF or E tuples must immediately follow the regulation unit (enhancer + promoter + operator).
(2.B) On left, a non coding tuple interrupts the transcription. On right, the promoter is missing.
(3) The genome is a circular single-strand sequence of tuples. At each replication, the genome undergoes mutations: (A) large duplications, (B) large deletions, (C) translocations, (D) inversions. Red arrows symbolize breakpoints in the sequence. (E) point mutations and breakpoints can unfunctionalise or functionalise a tuple.
(4) Functional types (E, TF, BS and P) can be unfunctionalised with probability p_{\mbox{NC}}. A NC tuple can be restored to one type or another depending on 4 mutations rates: p_{\mbox{BS}} is the probability to be restored in a binding site (resp. p_{\mbox{PROM}}, p_{\mbox{E}} and p_{\mbox{TF}}).


The genetic regulation network level

Iteration
Genome-network model
Population model
Realistic-network model
Integrated evolutionary model

The genetic regulation network (GRN) is computed from the interactions of transcription factors (TF) and binding sites (BS) elements. Its activity is computed in four steps:

  1. The activity A_s(t) of each binding site s is A_s(t) = \sum_j c_j(t).A_{js} with c_j(t) the concentration of the transcription factor j at time t and A_{js} the affinity of this transcription factor for the binding site s,
  2. We then compute the activity of the enhancer site E_i(t) and of the operator site O_i(t) flanking the promoter i:

        \[\left\{ \begin{array}{lcr} E_i(t) = \sum_{j \in enhancer_i} A_s(t)\\\\ O_i(t) = \sum_{j \in operator_i} A_s(t) \end{array} \right.\]

  3. Then, the transcription rate e_i over time of the promoter i is computed, possibly including a stochastic component (Elowitz et al., 2002) represented by a transcriptional noise \eta genetically encoded in the promoter (Roberts et al., 2011). Note that \eta mutates as all the elements of the tuple, allowing for evolution of stochastic gene expression (Newman et al., 2006). Then the transcription rate e_i of each promoter i is given by an Hill-like function:

        \[e_i(t) = \beta_i . \left(\frac{\theta^n}{O_i(t)^n + \theta^n}\right) . \left(1 + \left(\frac{1}{\beta_i}-1\right)\left(\frac{E_i(t)^n}{E_i(t)^n + \theta^n}\right)\right) + \xi_i(t)\]

    with \beta_i the basal expression level of the promoter i, n and \theta being constant coefficients that determine the shape of the Hill function. \xi_i(t) is a random number drawn from the gaussian distribution \mathcal{N}(0,\eta_i). %Since stochasticity is inevitable (the cell cannot escape the physical and chemical laws), a minimal noise \eta_0 exists such that \eta_i \geqslant \eta_0.

  4. The transcription rate e_i is applied to each E or TF tuple being controlled by the promoter i, such that each protein product (enzyme or transcription factor) has its own concentration regulated through a synthesis-degradation rule, depending on e_i:

        \[\frac{\partial c_i}{\partial t} = e_i(t) - \phi i(t)\]

    where \phi is a temporal scaling constant representing the protein degradation rate.


The metabolic network level

Iteration
Genome-network model
Population model
Realistic-network model
Integrated evolutionary model

Enzymes performing reactions in the metabolic network are encoded by tuples of type E. If s \neq p, the reaction takes place in the cytoplasm of the cell. If s = p, the enzyme is a inflowing or outflowing pump depending on the sign of k_{cat}. For each cell, the whole set of reactions defines an ordinary differential equations (ODE) system, which is solved numerically.

Some metabolic products are essential for the cell’s growth, and some other are intermediate products or waste. In the integrated evolutionary model, prime numbers are considered to be essential metabolites: their production contribute to the growth rate by increasing the probability to produce offspring. Over-producing metabolites can also lead to cell’s toxicity. Hence, one can define toxicity thresholds for essential and non essential metabolites. Over-reaching a toxicity threshold impairs cell’s fitness. Finally, during replication, daughter cells share cytoplasmic content at division (proteins and metabolites). It is also possible to define energy constraints in the artificial chemistry, such that cells must perform catabolic reactions to earn energy and produce essential metabolites.


Coupling the genetic and the metabolic networks

Iteration
Genome-network model
Population model
Realistic-network model
Integrated evolutionary model

Bacteria are able to sense their environment by detecting the presence of a particular molecule or signal, and to give an appropriate answer by updating their gene expression profile. In the integrated evolutionary model, co-enzymes can repress or activate transcription factors activity. This is done by adding three elements to the transcription factor tuple (TF): A co-enzyme identification tag \in \mathbb{N}^*, a free activity (A_{free}, boolean) and a bound activity (A_{bound}, boolean).

A metabolite m \in \mathbb{N}^* can repress or activate a TF acting as a co-enzyme:

  • If A_{bound} = true and A_{free} = false, the co-enzyme activates the TF,
  • If A_{free} = true and A_{bound} = false, the co-enzyme inhibits the TF,
  • If A_{free} = false and A_{bound} = false, the TF is never active,
  • If A_{free} = true and A_{bound} = true, the TF is always active.

Finally, the concentration c_i(t) of the TF and the concentration coE_i(t) of the co-enzyme are combined (depending on the values of A_{free} and A_{bound}) to compute the active fraction of c_i(t).


Population and environment levels

Iteration
Genome-network model
Population model
Realistic-network model
Integrated evolutionary model

Individuals «live» on a two dimensional grid, each grid site containing at most one individual. The physical environment is described at the grid level: each grid site contains a list of free metabolites, each with its concentration level. Those free metabolites diffuse with a diffusion rate D and are degraded with a degradation rate D_g.

Individuals compete for the free metabolites and to produce offspring in empty sites. Individuals interact with their local environment by pumping metabolites in and out and releasing their content at death. Metabolites can also diffuse through the cell membrane at rate D_m. In this case, pumps are active mechanisms that the cell can use to maintain an internal concentration different from the external one.

At each simulation time step, organisms are evaluated and either killed, updated or replicated depending on their current state:

  • An active cell can go to death. Death probability follows a Poisson law. At death, cell content is released in the local environment,
  • If the cell do not die and is unable to divide (e.g. because there is no free space in its neighbourhood), its genetic regulation network and metabolic network are updated, and its score is computed from its metabolic concentrations state vector X through the score function F (e.g. the sum of essential metabolite concentrations),
  • \item For each empty grid site, all active neighbours compete to select the replicating individual, depending on relative fitnesses. To avoid biases, empty grid cells are updated in a random order.


This simple framework allows to model different real experimental setups, including serial plates or chemostat (Mozhayskiy and Tagkopoulos, 2013). Similarly, some individuals can be regularly picked up in the environment to seed a new colony, thus mimicking a mutation accumulation experiment.

funding
EvoEvo is an Information and Communication Technologies initiative funded by the European Commission under FP7.
Background image - Young Tree, 1932, Paul Klee