Data Visualization


Visualizing data is helpful in many situations, such as exploring the data and its distribution, checking for outliers, and evaluating results.

Here we visualize some experimental data with different plot types and provide code for the most commonly used ones.

The matplotlib library is a great starting point for learning how to visualize data in Python; the examples gallery in the matplotlib documentation shows many more plot types.

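# display plots directly below the code cell (Jupyter magic command)
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

# load the measurements; the file uses ';' as column separator
packed_column = pd.read_csv('packed_column.csv', encoding='utf-8', sep=';')

# show the first five rows
packed_column.head()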
   flow type  water flow / air flow  0 kg/h  100 kg/h  200 kg/h  300 kg/h
0      small                     15       1         2         3         3
1      small                     30       2         2         5         3
2      small                     50       2         3         6         5
3      small                     80       3         4         9         9
4      small                    100       3         5        13        13
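If packed_column.csv is not at hand, you can still try the plots below by rebuilding the five rows printed by head() by hand. This is a minimal sketch under that assumption. The real file probably contains more rows (the line plot below sets the x axis range up to row 8), so your figures will look different.

```python
import pandas as pd

# Hypothetical stand-in for packed_column.csv: only the five rows shown by
# packed_column.head(), not the full data set
packed_column = pd.DataFrame({
    'flow type': ['small'] * 5,
    'water flow / air flow': [15, 30, 50, 80, 100],
    '0 kg/h': [1, 2, 2, 3, 3],
    '100 kg/h': [2, 2, 3, 4, 5],
    '200 kg/h': [3, 5, 6, 9, 13],
    '300 kg/h': [3, 3, 5, 9, 13],
})

print(packed_column.head())
```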

Let’s start with a standard line plot.

# create a figure and set the figsize - adjust the size until you are happy with the proportions
plt.figure(figsize=(10, 5))

# add four line plots, one for each water flow
# (with only y values given, the x axis is the row index of the DataFrame)
plt.plot(packed_column['0 kg/h'], label='0 kg/h', marker='o')
plt.plot(packed_column['100 kg/h'], label='100 kg/h', marker='o')
plt.plot(packed_column['200 kg/h'], label='200 kg/h', marker='o')
plt.plot(packed_column['300 kg/h'], label='300 kg/h', marker='o')

# add title, axis labels, legend and a grid
plt.title('Relative pressure drop in a packed column with various water flows')
plt.xlabel('Row index')
plt.ylabel('Relative pressure drop')
plt.legend()
plt.grid()

# specify x axis range
plt.xlim([1, 8])

# show figure
plt.show()
[Figure: Relative pressure drop in a packed column with various water flows]
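The code above uses the pyplot interface, where each plt call acts on the current figure. matplotlib also has an explicit interface. plt.subplots() returns a Figure and an Axes object, and the Axes methods (set_title, set_xlabel, ...) take over from the plt functions.

Here is a minimal sketch of the same line plot in that style. It loops over the column names instead of repeating the plot call. It assumes a hand-built stand-in with only the five rows shown by head(), so it does not reproduce the full figure.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data: only the five rows shown by packed_column.head()
packed_column = pd.DataFrame({
    'water flow / air flow': [15, 30, 50, 80, 100],
    '0 kg/h': [1, 2, 2, 3, 3],
    '100 kg/h': [2, 2, 3, 4, 5],
    '200 kg/h': [3, 5, 6, 9, 13],
    '300 kg/h': [3, 3, 5, 9, 13],
})
water_flows = ['0 kg/h', '100 kg/h', '200 kg/h', '300 kg/h']

# create the figure and a single Axes object explicitly
fig, ax = plt.subplots(figsize=(10, 5))

# one line per water flow; x values are the row index, as in the pyplot version
for column in water_flows:
    ax.plot(packed_column[column], label=column, marker='o')

# Axes methods replace plt.title, plt.xlabel, plt.ylabel, plt.legend and plt.grid
ax.set_title('Relative pressure drop in a packed column with various water flows')
ax.set_xlabel('Row index')
ax.set_ylabel('Relative pressure drop')
ax.legend()
ax.grid()

plt.show()
```

The explicit style pays off once a figure has several panels. For example, plt.subplots(1, 2) returns an array of two Axes that can be filled independently.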

Now let’s make a scatter plot.

We can specify many parameters, such as the color and the alpha value, which sets the opacity from 0 (fully transparent) to 1 (fully opaque).

time = [1, 3, 5, 8, 10, 13]
temperature = [12, 14, 16, 30, 45, 50]

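# define type of plot - here we make a scatter plot
# draw the same data three times, shifted upwards by 0, 2 and 4,
# with decreasing alpha to show the effect of transparency
plt.scatter(time, temperature, alpha=1, color='green')
plt.scatter(time, [i+2 for i in temperature], alpha=0.5, color='green')
plt.scatter(time, [i+4 for i in temperature], alpha=0.1, color='green')
plt.title('Scatter plot of temperature over time')
plt.xlabel('Time')
plt.ylabel('Temperature')
plt.show()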
[Figure: Scatter plot of temperature over time]
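Instead of one fixed color, scatter can also color each point by a value. Pass the values to the c parameter, choose a colormap with cmap, and add a colorbar as a key.

Here is a minimal sketch on the same time and temperature lists. The colormap 'viridis', the marker size s=60 and the axis labels are illustrative choices, not part of the original example.

```python
import matplotlib.pyplot as plt

time = [1, 3, 5, 8, 10, 13]
temperature = [12, 14, 16, 30, 45, 50]

fig, ax = plt.subplots()

# c maps each point's temperature to a color from the colormap; s sets the marker size
points = ax.scatter(time, temperature, c=temperature, cmap='viridis', s=60, alpha=0.8)

# the colorbar shows which color corresponds to which temperature
fig.colorbar(points, ax=ax, label='Temperature')

ax.set_title('Scatter plot of temperature over time')
ax.set_xlabel('Time')
ax.set_ylabel('Temperature')
plt.show()
```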

Note that pandas also has plotting methods for quick plots (by default they use matplotlib under the hood). For more elaborate and polished plots, matplotlib or similar libraries are recommended.
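To show how far the quick pandas route goes, here is a minimal sketch of the four-line plot from earlier done with DataFrame.plot. It draws one line per selected column, adds a legend, and returns the matplotlib Axes for further tweaks. It assumes a hand-built stand-in with only the five rows shown by head(), not the full CSV.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data: only the five rows shown by packed_column.head()
packed_column = pd.DataFrame({
    '0 kg/h': [1, 2, 2, 3, 3],
    '100 kg/h': [2, 2, 3, 4, 5],
    '200 kg/h': [3, 5, 6, 9, 13],
    '300 kg/h': [3, 3, 5, 9, 13],
})
water_flows = ['0 kg/h', '100 kg/h', '200 kg/h', '300 kg/h']

# one call draws all four columns against the row index and adds a legend
ax = packed_column[water_flows].plot(
    figsize=(10, 5), marker='o', grid=True,
    title='Relative pressure drop in a packed column with various water flows')

# the returned object is a matplotlib Axes, so matplotlib methods still apply
ax.set_xlabel('Row index')
ax.set_ylabel('Relative pressure drop')
plt.show()
```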

# histogram of the values in the 100 kg/h column
packed_column['100 kg/h'].hist()
[Figure: histogram of the 100 kg/h column]
# line plot of the 100 kg/h column against the row index
packed_column['100 kg/h'].plot()
[Figure: line plot of the 100 kg/h column]
# box plot of the four water-flow columns, leaving out the last row (iloc[:-1, :])
packed_column[['0 kg/h', '100 kg/h', '200 kg/h', '300 kg/h']].iloc[:-1, :].boxplot()
[Figure: box plots of the 0, 100, 200 and 300 kg/h columns]
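The box plot above selects .iloc[:-1, :], which leaves out the last row; the reason for excluding that row is not given here. Below is a minimal sketch of what the selection keeps, on a hand-built stand-in with only the five rows shown by head() (an assumption, not the full file). It also calls describe(), whose 25%, 50% and 75% rows are the quartiles a box plot draws as the box edges and the median line.

```python
import pandas as pd

# Hypothetical stand-in data: only the five rows shown by packed_column.head()
packed_column = pd.DataFrame({
    '0 kg/h': [1, 2, 2, 3, 3],
    '100 kg/h': [2, 2, 3, 4, 5],
    '200 kg/h': [3, 5, 6, 9, 13],
    '300 kg/h': [3, 3, 5, 9, 13],
})
water_flows = ['0 kg/h', '100 kg/h', '200 kg/h', '300 kg/h']

# iloc selects by position: ':-1' keeps every row except the last, ':' keeps all columns
without_last_row = packed_column[water_flows].iloc[:-1, :]

print(len(packed_column), len(without_last_row))  # prints: 5 4

# summary statistics of the rows that go into the box plot
print(without_last_row.describe())
```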