Linear Regression
In this notebook, we explore a few more operations: calculating logarithms with NumPy and performing a linear regression.
You have the following isothermal reaction: \(A+\frac{1}{6}B \rightarrow \frac{1}{4}C+\frac{1}{2}D\)
During a laboratory experiment, you measure the concentration of C, \(C_C\), over time in a constant-volume batch reactor. The initial concentration of A, \(C_{A0}\), is 25 \(mol \cdot m^{-3}\).
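The relation between \(C_A\) and \(C_C\) used in `calculate_CA` follows from the stoichiometry. This is a short derivation. It assumes constant volume (as stated) and no C present at the start (\(C_{C0} = 0\), which matches the first simulated value). It also uses the extent of reaction per unit volume, \(\xi\), a symbol introduced here for illustration:

```latex
\begin{aligned}
C_A &= C_{A0} - \xi \\
C_C &= C_{C0} + \tfrac{1}{4}\,\xi = \tfrac{1}{4}\,\xi
  \quad\Rightarrow\quad \xi = 4\,C_C \\
C_A &= C_{A0} - 4\,C_C
\end{aligned}
```

For example, with \(C_{A0} = 25\ mol \cdot m^{-3}\) and \(C_C = 4\ mol \cdot m^{-3}\), this gives \(C_A = 25 - 16 = 9\ mol \cdot m^{-3}\).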
import pandas as pd
import numpy as np
import scipy
from scipy import stats
import matplotlib.pyplot as plt
# simulated concentrations of C, C_C [mol m^-3]: 11 evenly spaced values from 0 to 4
cc = np.linspace(0, 4, 11)
# sampling times [h], every 2 h from 0 to 20
time = np.arange(0, 21, 2)
# collect the simulated data in a pandas DataFrame
df = pd.DataFrame({'Time': time, 'Cc': cc})
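# Looking at the stoichiometry, 4 mol of A are consumed for every mol of C formed,
# so (with no C present initially) C_A can be calculated from C_C and C_A0
def calculate_CA(C_A0, Cc):
    """Return C_A [mol m^-3] given the initial concentration C_A0 and the concentration C_C."""
    C_A = C_A0 - 4 * Cc
    return C_A

df['Ca'] = calculate_CA(25, df['Cc'])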
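`calculate_CA` uses only arithmetic operators, so it should work element-wise on a plain NumPy array as well as on a pandas Series. The following minimal check assumes the same simulated \(C_C\) values (0 to 4 in 11 steps) and does not need pandas:

```python
import numpy as np

def calculate_CA(C_A0, Cc):
    """C_A [mol m^-3] from C_A0 and C_C, with 4 mol of A consumed per mol of C formed."""
    return C_A0 - 4 * Cc

cc = np.linspace(0, 4, 11)   # same simulated C_C values [mol m^-3]
ca = calculate_CA(25, cc)    # element-wise: one C_A value per C_C value

print(ca[0], ca[-1])         # first and last C_A values
print(np.diff(ca))           # each 0.4 step in C_C lowers C_A by 1.6
```

C_A therefore stays positive over the whole range (it ends at 9 \(mol \cdot m^{-3}\)). This matters for the logarithm and reciprocal computed next.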
Finding the logarithm of a value in Python
Here we use NumPy's np.log, which returns the natural (base-e) logarithm of each value element-wise, so it can be applied to a whole DataFrame column at once.
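To make the base explicit, this short sketch compares np.log with np.log10 on the first and last simulated C_A values. It also shows that np.exp undoes np.log:

```python
import numpy as np

ca = np.array([25.0, 9.0])   # first and last simulated C_A values [mol m^-3]

ln_ca = np.log(ca)           # natural logarithm (base e), element-wise
log10_ca = np.log10(ca)      # base-10 logarithm, for comparison

print(np.round(ln_ca, 2))
print(np.round(log10_ca, 2))
print(np.exp(ln_ca))         # exponentiating recovers the original concentrations
```

np.log returns -inf for 0 and nan for negative inputs, each with a RuntimeWarning. The concentrations passed to it must therefore stay strictly positive.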
# natural logarithm of C_A, rounded to 2 decimals for display
df['ln(Ca)'] = np.log(df['Ca']).round(2)
Simple linear regression in Python with the SciPy library
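Here we calculate a linear least-squares regression for two sets of measurements with scipy.stats.linregress. See the SciPy documentation of that function for details.
The function returns:
slope: slope of the regression line
intercept: intercept of the regression line
rvalue: the Pearson correlation coefficient; its square is the coefficient of determination, \(R^2\)
pvalue: the p-value for a hypothesis test whose null hypothesis is that the slope is zero, using the Wald test with a t-distribution of the test statistic (the function's alternative parameter selects a one-sided alternative hypothesis instead of the default two-sided one)
stderr: standard error of the estimated slope (gradient), under the assumption of residual normality
intercept_stderr: standard error of the estimated intercept, under the assumption of residual normality
For compatibility with older SciPy versions, the result unpacks into only the first five values (slope to stderr); intercept_stderr is available as a named attribute.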
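The following sketch shows what linregress computes. It uses a few made-up points (hypothetical values, roughly \(y = 2x + 1\)) and compares the result with the closed-form ordinary least-squares estimates, \(\text{slope} = S_{xy}/S_{xx}\) and \(\text{intercept} = \bar{y} - \text{slope}\cdot\bar{x}\). It also shows that \(R^2\) is the squared Pearson correlation:

```python
import numpy as np
from scipy import stats

# hypothetical measurements, roughly y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.1, 4.9, 7.2, 8.8])

res = stats.linregress(x, y)

# closed-form ordinary least-squares estimates
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Pearson correlation from NumPy, for comparison with res.rvalue
r = np.corrcoef(x, y)[0, 1]

print(res.slope, res.intercept, res.rvalue**2)
print(slope, intercept, r**2)

# five-value unpacking still works; intercept_stderr is only a named attribute
m, b, r_value, p_value, std_err = res
print(std_err, res.intercept_stderr)
```

For these points both routes give a slope of 1.97 and an intercept of 1.06 (up to floating-point rounding).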
df
# reciprocal of C_A [m^3 mol^-1]; 4 decimals, because 2 decimals would leave only a few distinct values for the fit
df['1/Ca'] = (1 / df['Ca']).round(4)

# linear least-squares regression, skipping the first point (t = 0)
x = df['Time'].iloc[1:]
y = df['1/Ca'].iloc[1:]
m, b, r_value, p_value, std_err = scipy.stats.linregress(x, y)

# plot the data and the fitted line, then annotate R^2 and the line equation
fig, ax = plt.subplots()
ax.scatter(x, y, label='data')
ax.plot(x, m * x + b, label='linear fit')
ax.annotate(f'r^2: {r_value**2:.2f}', xy=(2.5, 0.105))
ax.annotate(f'formula: {m:.4f}x + {b:.4f}', xy=(2.5, 0.10))
ax.set_title('Linear least-squares regression for two sets of measurements')
ax.set_xlabel('Time [h]')
ax.set_ylabel('1/Ca [m^3 mol^-1]')
ax.legend()
ax.grid()
plt.show()
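Plotting \(1/C_A\) against time is the usual integral-method check for a rate law that is second order in A. If \(-\frac{dC_A}{dt} = k C_A^2\), integration gives \(\frac{1}{C_A} = \frac{1}{C_{A0}} + k t\). The slope of the line then estimates \(k\) and the intercept estimates \(1/C_{A0}\). The notebook does not say which rate law is being tested, so this reading is an assumption. The sketch below generates synthetic data from that law with a hypothetical \(k = 0.002\ m^3 \cdot mol^{-1} \cdot h^{-1}\) and shows that linregress recovers both parameters:

```python
import numpy as np
from scipy import stats

C_A0 = 25.0      # initial concentration of A [mol m^-3], as in the notebook
k_true = 0.002   # hypothetical rate constant [m^3 mol^-1 h^-1]

t = np.arange(0, 21, 2, dtype=float)   # sampling times [h]
# integrated second-order law: 1/C_A = 1/C_A0 + k t
C_A = 1.0 / (1.0 / C_A0 + k_true * t)

fit = stats.linregress(t, 1.0 / C_A)

k_est = fit.slope                # slope estimates k
C_A0_est = 1.0 / fit.intercept   # intercept estimates 1/C_A0
print(f'k = {k_est:.4f}, C_A0 = {C_A0_est:.1f}, r^2 = {fit.rvalue**2:.4f}')
```

With real measurements, a visibly curved \(1/C_A\) plot (systematic residuals, lower \(R^2\)) would suggest the second-order assumption does not hold.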