In the context of the EvoEvo project, we have designed an integrated model to study the Evolution of Evolution. Obviously, such a model cannot include the whole complexity of real organisms. Moreover, one has to keep in mind that our objective is not to study the evolution of any particular organism: the aim is to study the evolutionary process itself and to unravel the EvoEvo strategies that result from pure Darwinian evolution. That is why we propose to study evolution and EvoEvo in a simplified, abstract world, defined by an «artificial chemistry» (Dittrich et al., 2001). This artificial chemistry provides a set of objects and a set of rules that govern their interactions. In the model, organisms are composed of these objects, the rules giving them their dynamics and, ultimately, their fitness.
To build our integrated model, we designed a modular artificial chemistry: the set of objects and the set of rules are split into modules that represent the different organisation levels we want to study, as well as the interactions between those levels (e.g. a specific set of rules specifies how the genome is transcribed and translated into proteins). Here we propose to include five levels in our model: the genome, the genetic network, the metabolic network, the fitness and the environment (note that we do not include any «phenotype»: the phenotype is simply the result of the metabolic network dynamics in the organism's environment). These levels are described in the following sections.
Our integrated evolutionary model is an individual-based model. Each individual is an asexual virtual cell owning a genome. This genome encodes a regulatory network and a metabolic network. The metabolic network can take up and convert nutrients from the environment, which determines whether the cell is able to divide. Dividing cells form a population that grows on a two-dimensional environmental grid. The grid provides fresh nutrients, but also contains nutrients and waste released by cells, either actively or after death. This dynamical process modifies the environment and allows for the emergence of complex ecosystems. It also creates the conditions for local competition between cells.
In the model, the molecular structure of the organisms is entirely defined by their genome (possibly with an interaction with the organism's environment). This genome can undergo mutations at each replication (point mutations and large rearrangements). The interaction between this variation process and the competition process described above results in Darwinian evolution: individuals become more and more adapted to their environment and able to replicate more and more efficiently. However, by doing so, they also modify the shared environment (e.g. by releasing new metabolites), thus changing their own evolutionary conditions…
Our formalism is inspired by Crombach and Hogeweg (2008) and Beslon et al. (2010b). We describe below each biological organisation level. For each of them, we also recall which of the four software iterations include that level (see table below).
Iteration | |
Genome-network model | |
Population model | |
Realistic-network model | |
Integrated evolutionary model |
Each organism owns a circular string of functional and non-functional elements. This coarse-grained genome, inspired by Crombach and Hogeweg (2008), is a list of elements mathematically defined as n-tuples. Each functional tuple is parametrised to define its function, and its number of dimensions depends on its type. Five types of tuples are defined in our artificial chemistry: non-coding tuples, promoters, binding sites (BS), transcription factors (TF) and enzymes (E). For an enzyme tuple, the parameters of the reaction are encoded in the tuple, and the reaction rate depends on the concentrations of the two metabolites involved and on the enzymatic concentration (we assume that the concentration of free enzymes is always equal to the total concentration).
Binding sites directly flanking a promoter regulate its transcriptional activity. The enhancer site directly precedes the promoter and is made of one or more contiguous binding sites. The operator site directly follows the promoter and is also made of one or more contiguous binding sites. Transcription factors that bind to the enhancer site increase the transcriptional activity. Conversely, transcription factors that bind to the operator site down-regulate the promoter activity (see figure 1.1). As in R-aevol (Beslon et al., 2010b), a promoter has a basal activity level.
TF or E tuples following the operator site are transcribed, thereby allowing for operons. Downstream of the operator site, any tuple other than TF or E stops transcription. To be functional, a promoter may or may not be flanked by binding sites, but TF or E tuples must immediately follow the regulation unit (enhancer site + promoter + operator site, see figure 1.2).
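As an illustration, the rules above can be sketched as a scan over the circular genome. This is a minimal sketch, not the project's implementation: the type labels `'NC'` and `'P'` (for non-coding tuples and promoters), the `Unit` record and the function `find_units` are hypothetical names introduced here, and each tuple is reduced to its type only.

```python
from typing import List, NamedTuple


class Unit(NamedTuple):
    promoter: int        # index of the promoter tuple
    enhancer: List[int]  # contiguous BS directly preceding the promoter
    operator: List[int]  # contiguous BS directly following the promoter
    genes: List[int]     # transcribed TF/E tuples (an operon if more than one)


def find_units(genome: List[str]) -> List[Unit]:
    """Find functional transcription units in a circular genome.

    `genome` is a list of tuple types: 'NC', 'P', 'BS', 'TF' or 'E'
    (hypothetical labels used for this sketch).
    """
    n = len(genome)
    units = []
    for p, kind in enumerate(genome):
        if kind != "P":
            continue
        # Enhancer site: walk backwards while we see binding sites.
        enhancer = []
        i = (p - 1) % n
        while genome[i] == "BS" and len(enhancer) < n - 1:
            enhancer.append(i)
            i = (i - 1) % n
        enhancer.reverse()
        # Operator site: walk forwards while we see binding sites.
        operator = []
        i = (p + 1) % n
        while genome[i] == "BS" and len(operator) < n - 1:
            operator.append(i)
            i = (i + 1) % n
        # Transcribed region: TF/E tuples until any other type is met.
        genes = []
        steps = len(operator) + 1
        while genome[i] in ("TF", "E") and steps < n:
            genes.append(i)
            i = (i + 1) % n
            steps += 1
        # A promoter is functional only if at least one TF/E tuple
        # immediately follows the regulation unit.
        if genes:
            units.append(Unit(p, enhancer, operator, genes))
    return units
```

For example, in `["NC", "BS", "BS", "P", "BS", "TF", "E", "NC", "P", "NC", "E"]` only the first promoter is functional: the second one is followed by a non-coding tuple, so the last E tuple is never transcribed.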
The genome undergoes point mutations and large rearrangements during replication. If a tuple undergoes a point mutation, it jumps in the tuple space by the addition of an n-dimensional random vector. A tuple can lose its function through a point mutation, or during a large rearrangement if it is located on a breakpoint. Non-coding tuples can also be restored into one or another functional type; however, it is impossible to mutate directly from one functional type to another (see figure 1.4). All the mutation rates are configurable. The large chromosomal rearrangements are duplications, deletions, inversions and translocations. The various types of mutation can modify existing tuples, but also create new tuples, delete existing tuples, modify the length of the non-coding regions, modify the tuple order…
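The point-mutation rules can be sketched as follows. This is a hedged illustration only: the Gaussian form of the random vector, the parameters `sigma` and `p_switch`, the type labels `'NC'` and `'P'`, and the function name `point_mutate` are assumptions made for the example, not values or names taken from the model.

```python
import random

# Hypothetical labels for the functional tuple types.
FUNCTIONAL = ("P", "BS", "TF", "E")


def point_mutate(kind, params, sigma=0.1, p_switch=0.05, rng=random):
    """Return a mutated (kind, params) pair for one tuple.

    The parameter vector jumps in tuple space by the addition of a random
    vector of the same dimension (Gaussian here, by assumption). With
    probability `p_switch` the type changes, but only through the
    non-coding state: functional -> 'NC' or 'NC' -> functional.
    """
    new_params = [x + rng.gauss(0.0, sigma) for x in params]
    if rng.random() < p_switch:
        if kind == "NC":
            kind = rng.choice(FUNCTIONAL)  # restoration of a function
        else:
            kind = "NC"                    # loss of function
    return kind, new_params
```

With this design, turning a TF tuple into an E tuple requires at least two mutations (TF to non-coding, then non-coding to E), which is the constraint stated above.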
The genetic regulation network (GRN) is computed from the interactions between transcription factors (TF) and binding sites (BS). Its activity is computed in four steps. These steps involve the basal expression level of each promoter, constant coefficients that determine the shape of a Hill function, a random number drawn from a Gaussian distribution, and a temporal scaling constant representing the protein degradation rate.
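To make these ingredients concrete, here is one generic way to combine a basal level, Hill functions, Gaussian noise and a degradation constant. The exact equations of the model are not assumed here: the functional form, the names `hill`, `promoter_activity` and `update_protein`, and the default values of `theta`, `n` and `phi` are all assumptions chosen for the illustration.

```python
import random


def hill(x, theta=0.5, n=4):
    """Hill function; theta and n are the constant shape coefficients."""
    return x ** n / (x ** n + theta ** n)


def promoter_activity(beta, activation, repression, noise_sd=0.0, rng=random):
    """Expression level of a promoter with basal level beta (0 < beta <= 1).

    `activation` and `repression` stand for the summed signals of the TFs
    bound to the enhancer and operator sites. Activation scales the
    expression from beta up to 1, repression scales it down to 0.
    """
    e = beta * (1.0 - hill(repression)) * (1.0 + (1.0 / beta - 1.0) * hill(activation))
    e += rng.gauss(0.0, noise_sd)  # additive Gaussian noise (assumption)
    return max(e, 0.0)


def update_protein(c, e, phi=0.1):
    """Relax the protein concentration c towards the expression level e.

    phi plays the role of the temporal scaling (degradation) constant.
    """
    return c + phi * (e - c)
```

Without any bound TF the promoter is expressed at its basal level, for instance `promoter_activity(0.2, 0.0, 0.0)` gives `0.2`.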
Enzymes performing reactions in the metabolic network are encoded by tuples of type E. Depending on the parameters of the tuple, either the reaction takes place in the cytoplasm of the cell, or the enzyme is an inflowing or outflowing pump, the direction of which depends on the sign of one parameter. For each cell, the whole set of reactions defines a system of ordinary differential equations (ODEs), which is solved numerically.
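A minimal sketch of such an ODE system is given below for cytoplasmic reactions only (pumps are left out). It assumes Michaelis-Menten kinetics, which is consistent with the free-enzyme assumption stated earlier but is a choice made for this example, and it uses an explicit Euler step as the numerical scheme. The names `mm_rate`, `euler_step`, `kcat` and `km`, and the representation of metabolites as integer keys, are hypothetical.

```python
def mm_rate(kcat, km, enzyme, substrate):
    """Michaelis-Menten reaction rate (assumed kinetics)."""
    return kcat * enzyme * substrate / (km + substrate)


def euler_step(conc, reactions, dt=0.01):
    """Advance metabolite concentrations by one explicit Euler step.

    conc:      dict mapping metabolite id -> concentration
    reactions: list of (substrate, product, kcat, km, enzyme_concentration)
    """
    delta = {m: 0.0 for m in conc}
    for s, p, kcat, km, enzyme in reactions:
        v = mm_rate(kcat, km, enzyme, conc.get(s, 0.0))
        delta[s] = delta.get(s, 0.0) - v  # substrate is consumed
        delta[p] = delta.get(p, 0.0) + v  # product is formed
    # Clamp at zero to guard against small negative values.
    return {m: max(conc.get(m, 0.0) + dt * d, 0.0) for m, d in delta.items()}
```

Usage: starting from `{2: 1.0, 3: 0.0}` with the single reaction `(2, 3, 1.0, 0.5, 1.0)`, repeated calls to `euler_step` transfer mass from metabolite 2 to metabolite 3 while the total stays constant.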
Some metabolic products are essential for the cell's growth, while others are intermediate products or waste. In the integrated evolutionary model, metabolites identified by prime numbers are considered essential: their production contributes to the growth rate by increasing the probability of producing offspring. Over-producing metabolites can also be toxic to the cell. Hence, one can define toxicity thresholds for essential and non-essential metabolites. Exceeding a toxicity threshold impairs the cell's fitness. Finally, at division, the daughter cells share the cytoplasmic content (proteins and metabolites). It is also possible to define energy constraints in the artificial chemistry, such that cells must perform catabolic reactions to earn energy and produce essential metabolites.
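The prime-number rule and the toxicity thresholds can be sketched as below. The scoring itself is an assumption: the source only states that essential metabolites contribute to growth and that exceeding a threshold impairs fitness, so summing essential concentrations and returning zero for a poisoned cell are simplifications chosen for this example. The names `is_prime` and `growth_score` and both threshold parameters are hypothetical.

```python
def is_prime(n):
    """True if the integer n is a prime number (trial division)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def growth_score(conc, essential_tox=float("inf"), non_essential_tox=float("inf")):
    """Score a cell from its metabolite concentrations.

    conc maps integer metabolite ids to concentrations. Essential
    metabolites are those with a prime id. Exceeding a toxicity threshold
    sets the score to zero (the harshest possible penalty, by assumption).
    """
    score = 0.0
    for m, c in conc.items():
        limit = essential_tox if is_prime(m) else non_essential_tox
        if c > limit:
            return 0.0
        if is_prime(m):
            score += c
    return score
```

For instance, `growth_score({2: 1.0, 4: 5.0, 7: 0.5})` counts only metabolites 2 and 7 and returns `1.5`.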
Bacteria are able to sense their environment by detecting the presence of a particular molecule or signal, and to respond appropriately by updating their gene expression profile. In the integrated evolutionary model, co-enzymes can repress or activate transcription factor activity. This is done by adding three elements to the transcription factor tuple (TF): a co-enzyme identification tag, a free activity (boolean) and a bound activity (boolean).
A metabolite acting as a co-enzyme can thus repress or activate a TF, depending on these two booleans.
Finally, the concentration of the TF and the concentration of the co-enzyme are combined (depending on the values of the free and bound activities) to compute the active fraction of the TF.
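One simple way to combine the two concentrations is sketched below. The binding rule is an assumption made for the example (tight one-to-one binding, so the bound amount is the smaller of the two concentrations); the model's actual formula is not assumed. The function name `active_tf` and its parameter names are hypothetical.

```python
def active_tf(tf, coenzyme, free_activity, bound_activity):
    """Active concentration of a TF given its co-enzyme.

    tf, coenzyme:   concentrations of the TF and of its co-enzyme
    free_activity:  True if the TF is active when no co-enzyme is bound
    bound_activity: True if the TF is active when a co-enzyme is bound
    """
    bound = min(tf, coenzyme)  # assumption: tight 1:1 binding
    free = tf - bound
    active = 0.0
    if free_activity:
        active += free
    if bound_activity:
        active += bound
    return active
```

With `free_activity=True` and `bound_activity=False` the co-enzyme acts as a repressor of the TF, and with the opposite setting it acts as an activator.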
Individuals «live» on a two-dimensional grid, each grid site containing at most one individual. The physical environment is described at the grid level: each grid site contains a list of free metabolites, each with its concentration level. Those free metabolites diffuse at a given diffusion rate and are degraded at a given degradation rate.
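The update of one free metabolite on the grid can be sketched as a discrete diffusion step followed by a degradation step. The four-site neighbourhood, the toroidal (wrap-around) boundaries and the function name `diffuse_and_degrade` are assumptions made for this example; the source does not specify them.

```python
def diffuse_and_degrade(grid, diffusion_rate, degradation_rate):
    """Return the new concentration grid of one metabolite.

    grid is a list of rows of concentrations. Each site sends a fraction
    `diffusion_rate` of its content to each of its four neighbours
    (so 4 * diffusion_rate must not exceed 1), then every site loses a
    fraction `degradation_rate` of its content.
    """
    h, w = len(grid), len(grid[0])
    new = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            c = grid[y][x]
            out = diffusion_rate * c
            new[y][x] += c - 4 * out
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                # Modulo arithmetic implements the assumed torus.
                new[(y + dy) % h][(x + dx) % w] += out
    keep = 1.0 - degradation_rate
    return [[keep * c for c in row] for row in new]
```

Diffusion alone conserves the total amount of metabolite on the grid; only degradation removes matter.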
Individuals compete for the free metabolites and for empty sites in which to produce offspring. Individuals interact with their local environment by pumping metabolites in and out, and by releasing their content at death. Metabolites can also diffuse through the cell membrane at a given rate. In this case, pumps are active mechanisms that the cell can use to maintain an internal concentration different from the external one.
At each simulation time step, organisms are evaluated and either killed, updated or replicated depending on their current state:
For each empty grid site, all active neighbours compete to select the replicating individual, depending on their relative fitnesses. To avoid biases, empty grid sites are updated in a random order.
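The competition for empty sites can be sketched as follows. Several choices are assumptions made for the example rather than details taken from the model: the eight-site neighbourhood with wrap-around boundaries, fitness-proportional selection, reading «active» as «alive with a positive fitness», and the rule that an individual replicates at most once per time step (which is what makes the random update order matter in this sketch). The function name `fill_empty_sites` is hypothetical.

```python
import random


def fill_empty_sites(grid, fitness, rng=random):
    """Choose a parent for each empty site and return the list of births.

    grid:    list of rows; None marks an empty site, otherwise an id (str)
    fitness: dict mapping individual id -> fitness
    Returns a list of ((y, x), parent_id) pairs; the grid is not modified.
    """
    h, w = len(grid), len(grid[0])
    empty = [(y, x) for y in range(h) for x in range(w) if grid[y][x] is None]
    rng.shuffle(empty)  # random update order to avoid positional biases
    used = set()        # individuals that already replicated this step
    births = []
    for y, x in empty:
        neighbours = {
            grid[(y + dy) % h][(x + dx) % w]
            for dy in (-1, 0, 1)
            for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0)
        }
        candidates = sorted(
            i for i in neighbours
            if i is not None and i not in used and fitness[i] > 0
        )
        if not candidates:
            continue
        # Fitness-proportional draw among the active neighbours.
        weights = [fitness[i] for i in candidates]
        parent = rng.choices(candidates, weights=weights)[0]
        used.add(parent)
        births.append(((y, x), parent))
    return births
```

Passing a seeded generator, for example `rng=random.Random(0)`, makes a run reproducible.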