Linear Regression

In this notebook, we will explore some more operations, such as calculating logarithms with NumPy and performing a linear regression with SciPy.

You have the following isothermal reaction: \(A+\frac{1}{6}B \rightarrow \frac{1}{4}C+\frac{1}{2}D\)

During a laboratory experiment, you take measurements in a constant-volume batch reactor. The initial concentration of A, \(C_{A0}\), is 25 \(\mathrm{mol \cdot m^{-3}}\).
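
Because the volume is constant, concentrations change in proportion to moles, so the stoichiometric coefficients link \(C_A\) to \(C_C\) directly. A short worked derivation follows, assuming that no C is present at the start (\(C_{C0} = 0\)); the text does not state this explicitly, so treat it as an assumption:

```latex
\begin{aligned}
\frac{C_{A0} - C_A}{1} &= \frac{C_C - C_{C0}}{1/4} \\
C_A &= C_{A0} - 4\,(C_C - C_{C0}) \\
C_A &= 25 - 4\,C_C \quad \text{for } C_{A0} = 25\ \mathrm{mol \cdot m^{-3}},\ C_{C0} = 0
\end{aligned}
```

For example, \(C_C = 4\ \mathrm{mol \cdot m^{-3}}\) gives \(C_A = 25 - 16 = 9\ \mathrm{mol \cdot m^{-3}}\).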

import pandas as pd
import numpy as np
import scipy
from scipy import stats
import matplotlib.pyplot as plt

# define a distribution of CC
cc = np.linspace(0, 4, 11)
time = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

# create a pandas dataframe with the data we simulated 
df = pd.DataFrame()
df['Time'] = time
df['Cc'] = cc
# create a function to calculate CA given CC and CA0
def calculate_CA(C_A0, Cc):
    C_A = C_A0-(4*Cc)
    return C_A

# Looking at the stoichiometry (4 mol of A consumed per mol of C formed), CA can be calculated from CC and CA0
df['Ca'] = calculate_CA(25, df['Cc'])

Finding the logarithm of a value in Python

Here we will use the NumPy library to calculate the natural logarithm of the values defined above. The function np.log works element-wise on arrays and pandas columns.
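
As a minimal standalone sketch of how NumPy's logarithm functions behave, the example below uses a small array of hypothetical concentrations (illustrative values, not the notebook's data). It shows that np.log is the natural logarithm, while np.log10 is the base-10 variant.

```python
import numpy as np

# Hypothetical concentrations in mol/m^3 (illustrative values only)
concentrations = np.array([25.0, 20.0, 15.0, 10.0])

ln_c = np.log(concentrations)       # natural logarithm (base e), element-wise
log10_c = np.log10(concentrations)  # base-10 logarithm, for comparison

# np.round also works element-wise, so it is safe to use on whole arrays
ln_c_rounded = np.round(ln_c, 2)

print(ln_c_rounded)
print(log10_c)
```

Taking np.exp of the result recovers the original values, which is a quick way to check that the natural logarithm was used.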

# find the natural log of CA, rounded to two decimals
df['ln(Ca)'] = np.log(df['Ca']).round(2)

Simple linear regression in Python with SciPy library

Here we calculate a linear least-squares regression for two sets of measurements using scipy.stats.linregress. See the SciPy documentation for details.

The function returns:

  • Slope of the regression line

  • Intercept of the regression line

  • The Pearson correlation coefficient. The square of rvalue is equal to the coefficient of determination

  • The p-value for a hypothesis test whose null hypothesis is that the slope is zero, using a Wald test with a t-distribution of the test statistic. See the alternative parameter in the SciPy documentation for other alternative hypotheses

  • Standard error of the estimated slope (gradient), under the assumption of residual normality

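  • Standard error of the estimated intercept, under the assumption of residual normality (available only as the intercept_stderr attribute of the returned object, not when unpacking the result into five values)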
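
The returned values listed above can be read as named attributes of the result object. The sketch below fits a small synthetic data set (made-up points scattered around y = 2x + 1, not the reactor data) and prints each field. It assumes SciPy 1.6 or newer, where the intercept_stderr attribute is available.

```python
import numpy as np
from scipy import stats

# Synthetic data: points near the line y = 2x + 1 with small fixed deviations
x = np.arange(6, dtype=float)
deviations = np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1])
y = 2.0 * x + 1.0 + deviations

result = stats.linregress(x, y)

print(f"slope            = {result.slope:.3f}")
print(f"intercept        = {result.intercept:.3f}")
print(f"r^2              = {result.rvalue ** 2:.4f}")
print(f"p-value          = {result.pvalue:.3g}")
print(f"slope std err    = {result.stderr:.3g}")
print(f"intercept stderr = {result.intercept_stderr:.3g}")

# Tuple unpacking still works, but yields only the first five values
slope, intercept, r_value, p_value, std_err = result
```

Attribute access is used here because it keeps the code readable and is the only way to reach the intercept's standard error.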

df
df['1/Ca'] = round(1/df['Ca'], 2)

# linear least-squares regression (the first data point is skipped via .iloc[1:])
x = df['Time'].iloc[1:]
y = df['1/Ca'].iloc[1:]
m, b, r_value, p_value, std_err = scipy.stats.linregress(x, y)

# plotting the results and annotating the plot
fig, ax = plt.subplots()
ax.scatter(x, y, label='data')
ax.plot(x, m*x + b, label='linear fit')
ax.annotate(f'r^2: {r_value**2:.2f}', xy=(2.5, 0.105))
# the slope is small, so show four decimals instead of two
ax.annotate(f'formula: {m:.4f}x + {b:.2f}', xy=(2.5, 0.10))
ax.set_title('Linear least-squares regression for two sets of measurements')
ax.set_xlabel('Time [h]')
ax.set_ylabel('1/Ca')
ax.legend()  # the labels set above give the legend entries to show
ax.grid()
plt.show()