E. coli Fed-batch fermentation model, including recombinant protein production#

In this Python Script the ODE (Ordinary Differential Equation) of an Ecoli model is solved with the Python ODE Solver odeint.

The equations for E. coli growth have been based on the work of Anane et al, which can be found in https://doi.org/10.1016/j.bej.2017.05.013. The protein production dynamics have been based in the work of Chae et al. which can be found in https://doi.org/10.1002/1097-0290(20000805)69:3<275::aid-bit5>3.0.co;2-y. The combine implementation can be found in the MSc Thesis https://scholar.tecnico.ulisboa.pt/records/uuoC1N4NzN890o9vFCM5lQm9zjqNvBsOY3pp in Section 3.1.

Package import#

This portion of the code handles the import of all the relevant python packages.

# import packages
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
import plotly.express as px
import plotly.graph_objects as go
from scipy.integrate import solve_ivp
from plotly.subplots import make_subplots
import pandas as pd
%matplotlib inline
#%matplotlib qt
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[1], line 3
      1 # import packages
      2 from __future__ import division
----> 3 import numpy as np
      4 import matplotlib.pyplot as plt
      5 from scipy.integrate import odeint

ModuleNotFoundError: No module named 'numpy'

Define timesteps#

This cell defines the initial and end-time, and the timestops to perform the simulation

# discretizing the time
dt = 0.01

# defining the initial time
t0 = 0

# defining the end-time
T = 25 #h

# generation of the time-points
t = np.linspace(t0, T, int(T/dt)+1)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[2], line 11
      8 T = 25 #h
     10 # generation of the time-points
---> 11 t = np.linspace(t0, T, int(T/dt)+1)

NameError: name 'np' is not defined

Define initial conditions and process parameters#

In this cell the initial conditions for the monitored variables as well as process parameters regarding the feed strategy. y0 is the vector which contains the initial conditions values.

# initial values at time 0
X0   = 0.025     # Biomass [g/L]
S0   = 20         # Substrate [g/L]
DOT0 = 90        # Dissolved oxygen [%]
A0   = 0          # Acetate [g/L]
V0   = 1400     # Volume [L]
I0 = 0            #Inducer concentration [g/L]
P0 =0             #Protein concentratuin [g/L]

# process parameters

Si        = 300
If        = 50
Fout      = 0
KLa       = 600
muset     = 0.3

# define initial condition and save it in the y
y0 = [X0, S0, DOT0, A0, V0, I0, P0]

First pass of the model#

The feeding strategy employed is trigered by a decrease in Glucose concentration below a certain value. In the approach implemented here the timepoint at which this happens is determined by running the model a first time, without considering feed, i.e., running the model as if we were studying a batch process. We also extract the Biomass concetration at the end of the batch phase, which is necessary for the feed rate calculations

Note: If using e.g., the solve_ivp solver, a different approach to extract these values can be followed by the use of flags. This would be a more elegant implementation

Define the model equations#

This cell defines the model equation in Batch mode.

  1. The first step is to assign the variable to simulate to the result vector y.

  2. The feed parameters are defined, in this case set to 0, since we consider a batch process

  3. The parameters characteristic for the strain are defined as well as physicochemical constants

  4. The algebraic expressions that describe the kinetics of the system are defined

  5. The ODE system is defined

# define the function; Run the model a first time, with no feeding to determibe tf
def eColi_first_pass(y, t):
    X   = y[0]
    S   = y[1]
    DOT = y[2]
    A   = y[3]
    V   = y[4]
    I   = y[5]
    P   = y[6]

    # process parameters

    F1=0
    Fi=0

    # parameters describing the characteristics of the strain
    qAmax   = 1           # max Acetate consumption rate [g/g*h]
    Kaq     = 0.01        # affinity constant Acetate consumption [g/L]
    Ksq     = 0.1         # affinity constnat Substrate consumption [g/L]
    Yam     = 0.2         # yield acetate maintenance [g/g]
    Yaresp  = 0.2         # yield acetate respiratory [g/g]
    Yem     = 0.56        # yield excluding maintance [g/g]
    qSmax   = 1.4         # max glucose uptake rate [g/g*h]
    Ks      = 0.05        # affinity constant glucose consumption[g/L]
    qm      = 0.04        # specific maintenance coefficient[g/g*h]
    Ko      = 1           # Affinity constant, oxygen consumption [g/L]
    Yosresp = 1.217       # yield from S to X, respiratory [g/g]
    pAmax   = 1           # max Acetate production rate [g/g*h]
    Kap     = 10          # affinity constant intracellular acetate production[g/L]
    Yaof    = 1           # aceate yield in overflow[g/g]
    Yofm    = Yem         # [g/g]
    k1=0.32
    k2=0.00044
    k3=0.6
    Ki=0.55
    muset = 0.3         #Growth rate set in feed phase

    # physicochemical constants
    Cs      = 0.391       # ratio of substrate per C in [gC/gS]
    Cx      = 0.488       # ratio of biomass per C in [gC/gX]
    H       = 14000       # conversion factor
    DOTstar = 90

    # algebraic variables
    qS   = qSmax * S / (S + Ks) * DOT / (DOT + Ko)     # substrate uptake
    qSof = pAmax * qS / ( qS + Kap ) / Yaof            # overflow substrate conversion
    pA   = pAmax * qS / (qS + Kap)                     # production of acetate
    qSox = qS - qSof                                   # substrate uptake excluding overflow
    qSan =(qSox - qm) * Yem * Cx / Cs                  # anabolic substrate consumption
    qsA  = qAmax * A / (A + Kaq) * (Ksq / (Ksq + qS))  # acetate consumption
    qA   = pA - qsA                                    # total acetate equilibrium
    qO   = Yosresp * (qSox - qSan) + qsA * Yaresp      # oxygen uptake

    # growth rate equation
    my =(qSox - qm) * Yem + qsA * Yam + qSof * Yofm

    # differential equation system
    dXdt   = -F1 / V * X + my * X                   # biomass growth
    dSdt   = -F1 / V * S - qS * X + F1 / V * Si     # substrate evolution
    dDOTdt = KLa * (DOTstar - DOT) - qO * X * H     # oxygen dynamics
    dAdt   = qA * X - F1 / V * A                    # aetate evolution
    dVdt   = (F1 - Fout)                            # volume change
    dIdt = -F1/V*I + (Fi/V)*(If-I)                  #inducer dynamics
    dPdt = -F1/V*P + ((k1*my*I)/(Ki+I)+k2)* X - k3 * P #protein dynamics

    #the solution of odeint is formulated as a system of an ODE
    dydt = [dXdt, dSdt, dDOTdt, dAdt, dVdt, dIdt, dPdt]

    return dydt

Extract the relevant variables#

From the first pass on the model solving we extract the values that correspond to S < 0.5

solution_first_pass = odeint(eColi_first_pass, y0, t)

# Extract S and find tf when S drops below 0.5
S = solution_first_pass[:, 1]   #Substrate concentraion S
X = solution_first_pass[:, 0]  # biomass concentration X
V = solution_first_pass[:, 4]  # volume V

tf, Xbf, Vbf = None, None, None
for i in range(len(t)):
    if S[i] < 0.5:
        tf = t[i]     # Time when S drops below 0.5
        Xbf = X[i]    # Biomass concentration X at that time
        Vbf = V[i]    # Volume V at that time
        break  # Exit the loop once we find tf

print(f"tf: {tf}, Xbf: {Xbf}, Vbf: {Vbf}")
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[5], line 1
----> 1 solution_first_pass = odeint(eColi_first_pass, y0, t)
      3 # Extract S and find tf when S drops below 0.5
      4 S = solution_first_pass[:, 1]   #Substrate concentraion S

NameError: name 'odeint' is not defined

Second pass of the model#

Here the model will be defined including the Fed-batch dynamics

Define the model equations#

This cell defines the model equation in Fed-Batch mode.

  1. The first step is to assign the variable to simulate to the result vector y.

  2. The parameters characteristic for the strain are defined as well as physicochemical constants

  3. The algebraic expressions that describe the kinetics of the system are defined

  4. The feeding strategy is defined, both for the substrate and inductor. The expression used to calculate the substrate feed (F1) is dependednt on the substrate concentration. The expression used to calculate the inducer feed (Fi) is depedend on the biomass and inducer concentration.

  5. The ODE system is defined

# define the function; Solve the model a second time, now including the feed eqs.
def eColi_second_pass(y, t, tf, Xbf, Vbf):
    X   = y[0]
    S   = y[1]
    DOT = y[2]
    A   = y[3]
    V   = y[4]
    I   = y[5]
    P   = y[6]


    # parameters describing the characteristics of the strain

    qAmax   = 1           # max Acetate consumption rate [g/g*h]
    Kaq     = 0.01        # affinity constant Acetate consumption [g/L]
    Ksq     = 0.1         # affinity constnat Substrate consumption [g/L]
    Yam     = 0.2         # yield acetate maintenance [g/g]
    Yaresp  = 0.2         # yield acetate respiratory [g/g]
    Yem     = 0.56        # yield excluding maintance [g/g]
    qSmax   = 1.4         # max glucose uptake rate [g/g*h]
    Ks      = 0.05        # affinity constant glucose consumption[g/L]
    qm      = 0.04        # specific maintenance coefficient[g/g*h]
    Ko      = 1           # Affinity constant, oxygen consumption [g/L]
    Yosresp = 1.217       # yield from S to X, respiratory [g/g]
    pAmax   = 1           # max Acetate production rate [g/g*h]
    Kap     = 10          # affinity constant intracellular acetate production[g/L]
    Yaof    = 1           # aceate yield in overflow[g/g]
    Yofm    = Yem         # [g/g]
    k1=0.32
    k2=0.00044
    k3=0
    Ki=0.55


    # physicochemical constants
    Cs      = 0.391       # ratio of substrate per C in [gC/gS]
    Cx      = 0.488       # ratio of biomass per C in [gC/gX]
    H       = 14000       # conversion factor
    DOTstar = 90

    # algebraic variables
    qS   = qSmax * S / (S + Ks) * DOT / (DOT + Ko)     # substrate uptake
    qSof = pAmax * qS / ( qS + Kap ) / Yaof            # overflow substrate conversion
    pA   = pAmax * qS / (qS + Kap)                     # production of acetate
    qSox = qS - qSof                                   # substrate uptake excluding overflow
    qSan =(qSox - qm) * Yem * Cx / Cs                  # anabolic substrate consumption
    qsA  = qAmax * A / (A + Kaq) * (Ksq / (Ksq + qS))  # acetate consumption
    qA   = pA - qsA                                    # total acetate equilibrium
    qO   = Yosresp * (qSox - qSan) + qsA * Yaresp      # oxygen uptake

    # growht rate equation
    my =(qSox - qm) * Yem + qsA * Yam + qSof * Yofm


    #Define the feeding eqautions

    #no feed if t<tf
    if  t < tf:
        F1 = 0

    #We start with exponential feed
    else:
        if t <= (tf + 3):
            F1 = (Xbf * Vbf * muset / (Yosresp * Si)) * np.exp(muset * (t - tf))
        else:
          #once S exceeds 0.03 we switch to a constant feeding strategy
            if S >= 0.03:
                F1 = 43.5
                if F1 < 0:
                    F1 = 0
            else:
                F1 = (Xbf * Vbf * muset / (Yosresp * Si)) * np.exp(muset * (t - tf))

    # Handle Fi dynamically based on X > 20; Start induction for rhGH production
    if X < 20:
        Fi = 0
    else:
      #We start with a flow rate of 20 to quicly increase IPTG concentration and start protein production
       if I < 0.238:
        Fi = 20
       # Once the desired concentration of 0.238 (1mM) is reached, the flow rate is calculated such that this concentration remains constant
       else:
          Fi = 0.238 * F1 / (If - 0.238)


    # differential equation system
    dXdt   = -F1 / V * X + my * X                   # biomass growth
    dSdt   = -F1 / V * S - qS * X + F1 / V * Si     # substrate evolution
    dDOTdt = KLa * (DOTstar - DOT) - qO * X * H     # oxygen dynamics
    dAdt   = qA * X - F1 / V * A                    # aetate evolution
    dVdt   = (F1 + Fi - Fout)                       # volume change
    dIdt = -F1/V*I + (Fi/V)*(If-I)                  #inducer dynamics
    dPdt = -F1/V*P + (((k1*my*I)/(Ki+I))+k2)* X -k3 *P  #protein dynamics

    #the solution of odeint is formulated as a system of an ODE
    dydt = [dXdt, dSdt, dDOTdt, dAdt, dVdt, dIdt, dPdt]

    return dydt

Extract the solution of the ODE system#

# calling the numerical solver to approximate the integral of the differential equation system
y = odeint(eColi_second_pass, y0, t, args=(tf, Xbf, Vbf))
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[7], line 2
      1 # calling the numerical solver to approximate the integral of the differential equation system
----> 2 y = odeint(eColi_second_pass, y0, t, args=(tf, Xbf, Vbf))

NameError: name 'odeint' is not defined

Generate the plots#

This cell is used to collect all the simulated variables in a dataframe, and add them to 4 different subplots.

def create_plot(t, y):
        figure = make_subplots(rows=2, cols=2,specs=[[{},{"secondary_y": True}], [{},{}]]) #make figure with 2 subplots
        #assign simulation results to variable for plotting
        X = y[:, 0]  # Extract data as 1D arrays
        S = y[:, 1]
        DOT = y[:, 2]
        A = y[:, 3]
        V = y[:, 4]
        I = y[:, 5]
        P = y[:, 6]


         #collect all variables to plot in 1st subplot in a dataframe
        df = pd.DataFrame({'t': t, 'X': X, 'S': S, 'DOT':DOT, 'Acet': A, 'V':V, 'Ind': I, 'P':P})
         #add the different traces to 1st subplot

        figure.add_trace(go.Scatter(x=df['t'], y=df['X'], name='Biomass'), row=1, col=1)
        figure.add_trace(go.Scatter(x=df['t'], y=df['S'], name='Substrate'), row=1, col=1)

        figure.add_trace(go.Scatter(x=df['t'], y=df['Acet'], name='Acetate'), row=1, col=2)
        figure.add_trace(go.Scatter(x=df['t'], y=df['P'], name='protein'), row=1, col=2)
        figure.add_trace(go.Scatter(x=df['t'], y=df['Ind'], name='inducer'), row=1, col=2, secondary_y=True)
        figure.add_trace(go.Scatter(x=df['t'], y=df['DOT'], name='DOT'), row=2, col=1)

        figure.update_yaxes(title_text="Concentration (g/L)", row=1, col=1)  # Subplot (1, 1)

        figure.update_yaxes(title_text="Protein and acetate (g/L)", row=1, col=2)  # Subplot (1, 2)
        figure.update_yaxes(title_text="Inducer (g/L)", row=1, col=2, secondary_y=True) # Subplot (1, 2), secondary y-axis
        figure.update_yaxes(title_text="Volume (L)", row=2, col=2)  # Subplot (2, 2)
        figure.update_yaxes(title_text="DOT (%)", row=2, col=1)  # Subplot (2, 1)

         #add the title and axes labels
        figure.update_layout(title=('Simulaton results'),
                             xaxis_title='time (h)')

         #add trace to the 2nd subplot
        figure.append_trace(go.Scatter(x=df['t'], y=df['V'], name='Volume'), row=2, col=2)


        return figure

Create the figure#

fig = create_plot(t, y) # Call create_plot with the simulation results

fig.show()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[9], line 1
----> 1 fig = create_plot(t, y) # Call create_plot with the simulation results
      3 fig.show()

NameError: name 't' is not defined

E. coli fed-batch simulation Created on Thu Dec 10 12:14:22 2015

@author: Terrance Wilms, Nicolas Cruz, Kevin Stegemann, Rosa Haßfurther Updated Sept 2024

Protein production dynamics added by Mariana Albino (October 2024)