{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "26f64782-0175-4b83-b337-2e3eb95f010c",
   "metadata": {},
   "source": [
    "# Pandas DataFrames"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2a84422f-8838-49d1-b24a-096171ff8760",
   "metadata": {},
   "source": [
    "`pandas` is a very popular Python library, which allows data to be loaded as a dataframe. \n",
    "The library allows many operations to be easily performed on the data, such as calculating the mean or the sum of a column, transpose the dataframe, aggregate values and so on. Moreover, it is widely used for statistical data analysis.\n",
    "\n",
    "It is very well documented, more information and tutorials about the package can be found [here](https://pandas.pydata.org/docs/index.html).\n",
    "\n",
    "For simplicity, the convention is to import `pandas` as `pd`, but in principle, it can be called anything, although it is always a good idea to stick to the accepted conventions to ensure code readability. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e5f6325-24b3-474e-954e-a446786e0dca",
   "metadata": {},
   "source": [
    "`pandas` allow data to be loaded from many different formats and has many functions implemented for this purpose.\n",
    "Here, we will load a .csv file, but you can load Excel, txt, JSON data as well as many other formats."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bb313b60-7ad4-466f-aeb2-324afdb23c3c",
   "metadata": {},
   "source": [
    "## 1. Loading and inspecting data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "37e911d8-d9ba-4b69-82b0-6e67ff021e9a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>flow type</th>\n",
       "      <th>water flow / air flow</th>\n",
       "      <th>0 kg/h</th>\n",
       "      <th>100 kg/h</th>\n",
       "      <th>200 kg/h</th>\n",
       "      <th>300 kg/h</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>small</td>\n",
       "      <td>15</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>small</td>\n",
       "      <td>30</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>small</td>\n",
       "      <td>50</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>6</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>small</td>\n",
       "      <td>80</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>9</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>small</td>\n",
       "      <td>100</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>13</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>big</td>\n",
       "      <td>40</td>\n",
       "      <td>5</td>\n",
       "      <td>9</td>\n",
       "      <td>14</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>big</td>\n",
       "      <td>50</td>\n",
       "      <td>7</td>\n",
       "      <td>14</td>\n",
       "      <td>22</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>big</td>\n",
       "      <td>60</td>\n",
       "      <td>10</td>\n",
       "      <td>19</td>\n",
       "      <td>32</td>\n",
       "      <td>161</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>big</td>\n",
       "      <td>70</td>\n",
       "      <td>13</td>\n",
       "      <td>27</td>\n",
       "      <td>46</td>\n",
       "      <td>224</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>big</td>\n",
       "      <td>80</td>\n",
       "      <td>15</td>\n",
       "      <td>34</td>\n",
       "      <td>57</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  flow type  water flow / air flow  0 kg/h  100 kg/h  200 kg/h  300 kg/h\n",
       "0     small                     15       1         2         3         3\n",
       "1     small                     30       2         2         5         3\n",
       "2     small                     50       2         3         6         5\n",
       "3     small                     80       3         4         9         9\n",
       "4     small                    100       3         5        13        13\n",
       "5       big                     40       5         9        14        25\n",
       "6       big                     50       7        14        22        39\n",
       "7       big                     60      10        19        32       161\n",
       "8       big                     70      13        27        46       224\n",
       "9       big                     80      15        34        57         0"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# if you have problems loading the data, provide the full path\n",
    "# add the path to where you saved your packed_column.csv here\n",
    "# the path should look like this: C:/Users/my_user/Desktop/week02/theory/\n",
    "#path = \n",
    "#packed_column = pd.read_csv(f'{path}packed_column.csv', encoding='utf-8', sep=';')\n",
    "\n",
    "# use read_csv to lead data from .csv file\n",
    "# here you can see that we provide the encoding and a separator\n",
    "packed_column = pd.read_csv('packed_column.csv', encoding='utf-8', sep=';')\n",
    "packed_column"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "98e4c935-990b-4653-9492-98d2ec515b69",
   "metadata": {},
   "source": [
    "In the cell above we loaded our DataFrame from a csv file, but what is actually a *DataFrame*?\n",
    "It is basically just a matrix (or table, if you prefer thinking about Excel tables) with rows and columns."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d4a9f427-778d-4ba6-b684-d5af1466ebab",
   "metadata": {},
   "source": [
    "Let's now see some `pandas` operations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "6cb324a4-2f51-440c-a8e5-c5512d550e8c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>flow type</th>\n",
       "      <th>water flow / air flow</th>\n",
       "      <th>0 kg/h</th>\n",
       "      <th>100 kg/h</th>\n",
       "      <th>200 kg/h</th>\n",
       "      <th>300 kg/h</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>small</td>\n",
       "      <td>15</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>small</td>\n",
       "      <td>30</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>small</td>\n",
       "      <td>50</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>6</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>small</td>\n",
       "      <td>80</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>9</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>small</td>\n",
       "      <td>100</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>13</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  flow type  water flow / air flow  0 kg/h  100 kg/h  200 kg/h  300 kg/h\n",
       "0     small                     15       1         2         3         3\n",
       "1     small                     30       2         2         5         3\n",
       "2     small                     50       2         3         6         5\n",
       "3     small                     80       3         4         9         9\n",
       "4     small                    100       3         5        13        13"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# inspect the top 5 elements in the dataframe\n",
    "packed_column.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "0867f753-ef63-4e43-829d-0f016b47541c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>flow type</th>\n",
       "      <th>water flow / air flow</th>\n",
       "      <th>0 kg/h</th>\n",
       "      <th>100 kg/h</th>\n",
       "      <th>200 kg/h</th>\n",
       "      <th>300 kg/h</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>small</td>\n",
       "      <td>15</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>small</td>\n",
       "      <td>30</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>small</td>\n",
       "      <td>50</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>6</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>small</td>\n",
       "      <td>80</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>9</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>small</td>\n",
       "      <td>100</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>13</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>big</td>\n",
       "      <td>40</td>\n",
       "      <td>5</td>\n",
       "      <td>9</td>\n",
       "      <td>14</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>big</td>\n",
       "      <td>50</td>\n",
       "      <td>7</td>\n",
       "      <td>14</td>\n",
       "      <td>22</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>big</td>\n",
       "      <td>60</td>\n",
       "      <td>10</td>\n",
       "      <td>19</td>\n",
       "      <td>32</td>\n",
       "      <td>161</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>big</td>\n",
       "      <td>70</td>\n",
       "      <td>13</td>\n",
       "      <td>27</td>\n",
       "      <td>46</td>\n",
       "      <td>224</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>big</td>\n",
       "      <td>80</td>\n",
       "      <td>15</td>\n",
       "      <td>34</td>\n",
       "      <td>57</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  flow type  water flow / air flow  0 kg/h  100 kg/h  200 kg/h  300 kg/h\n",
       "0     small                     15       1         2         3         3\n",
       "1     small                     30       2         2         5         3\n",
       "2     small                     50       2         3         6         5\n",
       "3     small                     80       3         4         9         9\n",
       "4     small                    100       3         5        13        13\n",
       "5       big                     40       5         9        14        25\n",
       "6       big                     50       7        14        22        39\n",
       "7       big                     60      10        19        32       161\n",
       "8       big                     70      13        27        46       224\n",
       "9       big                     80      15        34        57         0"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# inspect the top 10 elements in the dataframe\n",
    "packed_column.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "8340f18f-f232-4795-8f0a-52d3c8bf39ab",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>flow type</th>\n",
       "      <th>water flow / air flow</th>\n",
       "      <th>0 kg/h</th>\n",
       "      <th>100 kg/h</th>\n",
       "      <th>200 kg/h</th>\n",
       "      <th>300 kg/h</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>big</td>\n",
       "      <td>40</td>\n",
       "      <td>5</td>\n",
       "      <td>9</td>\n",
       "      <td>14</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>big</td>\n",
       "      <td>50</td>\n",
       "      <td>7</td>\n",
       "      <td>14</td>\n",
       "      <td>22</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>big</td>\n",
       "      <td>60</td>\n",
       "      <td>10</td>\n",
       "      <td>19</td>\n",
       "      <td>32</td>\n",
       "      <td>161</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>big</td>\n",
       "      <td>70</td>\n",
       "      <td>13</td>\n",
       "      <td>27</td>\n",
       "      <td>46</td>\n",
       "      <td>224</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>big</td>\n",
       "      <td>80</td>\n",
       "      <td>15</td>\n",
       "      <td>34</td>\n",
       "      <td>57</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  flow type  water flow / air flow  0 kg/h  100 kg/h  200 kg/h  300 kg/h\n",
       "5       big                     40       5         9        14        25\n",
       "6       big                     50       7        14        22        39\n",
       "7       big                     60      10        19        32       161\n",
       "8       big                     70      13        27        46       224\n",
       "9       big                     80      15        34        57         0"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# inspect the bottom 5 elements in the dataframe\n",
    "packed_column.tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "e5edc6ab-aaff-4a28-be4e-de18bbec35a3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['flow type', 'water flow / air flow', '0 kg/h', '100 kg/h', '200 kg/h',\n",
       "       '300 kg/h'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# print the columns\n",
    "packed_column.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "0584a863-6ff9-41a3-970c-6d609741ec89",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "water flow / air flow    57.5\n",
       "0 kg/h                    6.1\n",
       "100 kg/h                 11.9\n",
       "200 kg/h                 20.7\n",
       "300 kg/h                 48.2\n",
       "dtype: float64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# calculate the mean of each column\n",
    "packed_column[['water flow / air flow', '0 kg/h', '100 kg/h', '200 kg/h', '300 kg/h']].mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "1cf4d185-1567-49aa-a12a-b22f56db2638",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>water flow / air flow</th>\n",
       "      <th>0 kg/h</th>\n",
       "      <th>100 kg/h</th>\n",
       "      <th>200 kg/h</th>\n",
       "      <th>300 kg/h</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>15</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>30</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>50</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>6</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>80</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>9</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>100</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>13</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>40</td>\n",
       "      <td>5</td>\n",
       "      <td>9</td>\n",
       "      <td>14</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>50</td>\n",
       "      <td>7</td>\n",
       "      <td>14</td>\n",
       "      <td>22</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>60</td>\n",
       "      <td>10</td>\n",
       "      <td>19</td>\n",
       "      <td>32</td>\n",
       "      <td>161</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>70</td>\n",
       "      <td>13</td>\n",
       "      <td>27</td>\n",
       "      <td>46</td>\n",
       "      <td>224</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>80</td>\n",
       "      <td>15</td>\n",
       "      <td>34</td>\n",
       "      <td>57</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   water flow / air flow  0 kg/h  100 kg/h  200 kg/h  300 kg/h\n",
       "0                     15       1         2         3         3\n",
       "1                     30       2         2         5         3\n",
       "2                     50       2         3         6         5\n",
       "3                     80       3         4         9         9\n",
       "4                    100       3         5        13        13\n",
       "5                     40       5         9        14        25\n",
       "6                     50       7        14        22        39\n",
       "7                     60      10        19        32       161\n",
       "8                     70      13        27        46       224\n",
       "9                     80      15        34        57         0"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "packed_column[['water flow / air flow', '0 kg/h', '100 kg/h', '200 kg/h', '300 kg/h']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "b2792d82-644d-4163-bf5a-4b5bef7aa6e4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "flow type                smallsmallsmallsmallsmallbigbigbigbigbig\n",
       "water flow / air flow                                         575\n",
       "0 kg/h                                                         61\n",
       "100 kg/h                                                      119\n",
       "200 kg/h                                                      207\n",
       "300 kg/h                                                      482\n",
       "dtype: object"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# calculate the sum of each column\n",
    "packed_column.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d6fe0f72-85e1-4183-b3c3-13702083f05e",
   "metadata": {},
   "source": [
    "**Note: `describe` is a very useful function to remember!**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "30107fb4-2df5-4e82-8e69-d1117342149c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>water flow / air flow</th>\n",
       "      <th>0 kg/h</th>\n",
       "      <th>100 kg/h</th>\n",
       "      <th>200 kg/h</th>\n",
       "      <th>300 kg/h</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>10.000000</td>\n",
       "      <td>10.000000</td>\n",
       "      <td>10.000000</td>\n",
       "      <td>10.000000</td>\n",
       "      <td>10.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>57.500000</td>\n",
       "      <td>6.100000</td>\n",
       "      <td>11.900000</td>\n",
       "      <td>20.700000</td>\n",
       "      <td>48.200000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>25.739075</td>\n",
       "      <td>4.976612</td>\n",
       "      <td>11.376877</td>\n",
       "      <td>18.571484</td>\n",
       "      <td>78.389058</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>15.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>42.500000</td>\n",
       "      <td>2.250000</td>\n",
       "      <td>3.250000</td>\n",
       "      <td>6.750000</td>\n",
       "      <td>3.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>55.000000</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>7.000000</td>\n",
       "      <td>13.500000</td>\n",
       "      <td>11.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>77.500000</td>\n",
       "      <td>9.250000</td>\n",
       "      <td>17.750000</td>\n",
       "      <td>29.500000</td>\n",
       "      <td>35.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>100.000000</td>\n",
       "      <td>15.000000</td>\n",
       "      <td>34.000000</td>\n",
       "      <td>57.000000</td>\n",
       "      <td>224.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       water flow / air flow     0 kg/h   100 kg/h   200 kg/h    300 kg/h\n",
       "count              10.000000  10.000000  10.000000  10.000000   10.000000\n",
       "mean               57.500000   6.100000  11.900000  20.700000   48.200000\n",
       "std                25.739075   4.976612  11.376877  18.571484   78.389058\n",
       "min                15.000000   1.000000   2.000000   3.000000    0.000000\n",
       "25%                42.500000   2.250000   3.250000   6.750000    3.500000\n",
       "50%                55.000000   4.000000   7.000000  13.500000   11.000000\n",
       "75%                77.500000   9.250000  17.750000  29.500000   35.500000\n",
       "max               100.000000  15.000000  34.000000  57.000000  224.000000"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 'describe' returns many useful statistics about the data\n",
    "packed_column.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "829e2fc5-9a58-48a5-8b2e-aa9c0abd461a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>flow type</th>\n",
       "      <td>small</td>\n",
       "      <td>small</td>\n",
       "      <td>small</td>\n",
       "      <td>small</td>\n",
       "      <td>small</td>\n",
       "      <td>big</td>\n",
       "      <td>big</td>\n",
       "      <td>big</td>\n",
       "      <td>big</td>\n",
       "      <td>big</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>water flow / air flow</th>\n",
       "      <td>15</td>\n",
       "      <td>30</td>\n",
       "      <td>50</td>\n",
       "      <td>80</td>\n",
       "      <td>100</td>\n",
       "      <td>40</td>\n",
       "      <td>50</td>\n",
       "      <td>60</td>\n",
       "      <td>70</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0 kg/h</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>7</td>\n",
       "      <td>10</td>\n",
       "      <td>13</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100 kg/h</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>9</td>\n",
       "      <td>14</td>\n",
       "      <td>19</td>\n",
       "      <td>27</td>\n",
       "      <td>34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>200 kg/h</th>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>13</td>\n",
       "      <td>14</td>\n",
       "      <td>22</td>\n",
       "      <td>32</td>\n",
       "      <td>46</td>\n",
       "      <td>57</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>300 kg/h</th>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>9</td>\n",
       "      <td>13</td>\n",
       "      <td>25</td>\n",
       "      <td>39</td>\n",
       "      <td>161</td>\n",
       "      <td>224</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           0      1      2      3      4    5    6    7    8  \\\n",
       "flow type              small  small  small  small  small  big  big  big  big   \n",
       "water flow / air flow     15     30     50     80    100   40   50   60   70   \n",
       "0 kg/h                     1      2      2      3      3    5    7   10   13   \n",
       "100 kg/h                   2      2      3      4      5    9   14   19   27   \n",
       "200 kg/h                   3      5      6      9     13   14   22   32   46   \n",
       "300 kg/h                   3      3      5      9     13   25   39  161  224   \n",
       "\n",
       "                         9  \n",
       "flow type              big  \n",
       "water flow / air flow   80  \n",
       "0 kg/h                  15  \n",
       "100 kg/h                34  \n",
       "200 kg/h                57  \n",
       "300 kg/h                 0  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# transpose a dataframe\n",
    "packed_column.T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "0db8c8a8-e09e-4532-a962-ce08225a9e9b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     24\n",
       "1     42\n",
       "2     66\n",
       "3    105\n",
       "4    134\n",
       "5     93\n",
       "6    132\n",
       "7    282\n",
       "8    380\n",
       "9    186\n",
       "dtype: object"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# summing the rows (removing the categorical row) of the transposed dataframe\n",
    "packed_column.T[1:].sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14181bf6-3adb-44f4-8a9c-ae24ebc0e70f",
   "metadata": {},
   "source": [
    "## 2. Indexing\n",
    "\n",
    "It is also possible to *index* a DataFrame to update a value or select a subset of rows and columns we want to work with.\n",
    "\n",
    "We will use the `iloc` function for this purpose."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "c12c712a-e09d-4311-b399-0eda84db1fc5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>flow type</th>\n",
       "      <th>water flow / air flow</th>\n",
       "      <th>0 kg/h</th>\n",
       "      <th>100 kg/h</th>\n",
       "      <th>200 kg/h</th>\n",
       "      <th>300 kg/h</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>small</td>\n",
       "      <td>15</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3.0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>small</td>\n",
       "      <td>30</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>5.0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>small</td>\n",
       "      <td>50</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>6.0</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>small</td>\n",
       "      <td>80</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>small</td>\n",
       "      <td>100</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>13.0</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>big</td>\n",
       "      <td>40</td>\n",
       "      <td>5</td>\n",
       "      <td>9</td>\n",
       "      <td>14.0</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>big</td>\n",
       "      <td>50</td>\n",
       "      <td>7</td>\n",
       "      <td>14</td>\n",
       "      <td>22.0</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>big</td>\n",
       "      <td>60</td>\n",
       "      <td>10</td>\n",
       "      <td>19</td>\n",
       "      <td>32.0</td>\n",
       "      <td>161</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>big</td>\n",
       "      <td>70</td>\n",
       "      <td>13</td>\n",
       "      <td>27</td>\n",
       "      <td>46.0</td>\n",
       "      <td>224</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>big</td>\n",
       "      <td>80</td>\n",
       "      <td>15</td>\n",
       "      <td>34</td>\n",
       "      <td>57.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  flow type  water flow / air flow  0 kg/h  100 kg/h  200 kg/h  300 kg/h\n",
       "0     small                     15       1         2       3.0         3\n",
       "1     small                     30       2         2       5.0         3\n",
       "2     small                     50       2         3       6.0         5\n",
       "3     small                     80       3         4       NaN         9\n",
       "4     small                    100       3         5      13.0        13\n",
       "5       big                     40       5         9      14.0        25\n",
       "6       big                     50       7        14      22.0        39\n",
       "7       big                     60      10        19      32.0       161\n",
       "8       big                     70      13        27      46.0       224\n",
       "9       big                     80      15        34      57.0         0"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# replace the value of row 3 column 4 with None\n",
    "# remember that in Python we start counting from 0!\n",
    "packed_column.iloc[3, 4] = None\n",
    "packed_column"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "19502a8c-de0b-4b0c-bed0-600894ecbb22",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "flow type                small\n",
       "water flow / air flow       15\n",
       "0 kg/h                       1\n",
       "100 kg/h                     2\n",
       "200 kg/h                   3.0\n",
       "300 kg/h                     3\n",
       "Name: 0, dtype: object"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# select the first row\n",
    "packed_column.iloc[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "527c8e1e-042c-4caa-b3f7-42bd059ef474",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     15\n",
       "1     30\n",
       "2     50\n",
       "3     80\n",
       "4    100\n",
       "5     40\n",
       "6     50\n",
       "7     60\n",
       "8     70\n",
       "9     80\n",
       "Name: water flow / air flow, dtype: int64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# select a specific column\n",
    "packed_column['water flow / air flow']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c36a7bcf-5496-470a-81fb-7e6340007a58",
   "metadata": {},
   "source": [
    "## 3. Replacing missing values\n",
    "\n",
    "A very important data pre-processing task is imputing missing values.\n",
    "Depending on the data available and the task to perform, different strategies might be more suitable.\n",
    "\n",
    "The most common strategies are:\n",
    "- replacing the missing values with a specific value (often 0, but it can be any value)\n",
    "- replacing the missing values with the mean of the columns\n",
    "- dropping (discarding) the rows with missing values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "320f0c53-8e54-4e33-ab2e-28c092380506",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>flow type</th>\n",
       "      <th>water flow / air flow</th>\n",
       "      <th>0 kg/h</th>\n",
       "      <th>100 kg/h</th>\n",
       "      <th>200 kg/h</th>\n",
       "      <th>300 kg/h</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>small</td>\n",
       "      <td>15</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3.0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>small</td>\n",
       "      <td>30</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>5.0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>small</td>\n",
       "      <td>50</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>6.0</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>small</td>\n",
       "      <td>80</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>0.0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>small</td>\n",
       "      <td>100</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>13.0</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>big</td>\n",
       "      <td>40</td>\n",
       "      <td>5</td>\n",
       "      <td>9</td>\n",
       "      <td>14.0</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>big</td>\n",
       "      <td>50</td>\n",
       "      <td>7</td>\n",
       "      <td>14</td>\n",
       "      <td>22.0</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>big</td>\n",
       "      <td>60</td>\n",
       "      <td>10</td>\n",
       "      <td>19</td>\n",
       "      <td>32.0</td>\n",
       "      <td>161</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>big</td>\n",
       "      <td>70</td>\n",
       "      <td>13</td>\n",
       "      <td>27</td>\n",
       "      <td>46.0</td>\n",
       "      <td>224</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>big</td>\n",
       "      <td>80</td>\n",
       "      <td>15</td>\n",
       "      <td>34</td>\n",
       "      <td>57.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  flow type  water flow / air flow  0 kg/h  100 kg/h  200 kg/h  300 kg/h\n",
       "0     small                     15       1         2       3.0         3\n",
       "1     small                     30       2         2       5.0         3\n",
       "2     small                     50       2         3       6.0         5\n",
       "3     small                     80       3         4       0.0         9\n",
       "4     small                    100       3         5      13.0        13\n",
       "5       big                     40       5         9      14.0        25\n",
       "6       big                     50       7        14      22.0        39\n",
       "7       big                     60      10        19      32.0       161\n",
       "8       big                     70      13        27      46.0       224\n",
       "9       big                     80      15        34      57.0         0"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# fill missing values with 0\n",
    "packed_column.fillna(0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "c3225c8f-11ce-4e42-aada-0995516a43f8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>flow type</th>\n",
       "      <th>water flow / air flow</th>\n",
       "      <th>0 kg/h</th>\n",
       "      <th>100 kg/h</th>\n",
       "      <th>200 kg/h</th>\n",
       "      <th>300 kg/h</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>small</td>\n",
       "      <td>15</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3.0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>small</td>\n",
       "      <td>30</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>5.0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>small</td>\n",
       "      <td>50</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>6.0</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>small</td>\n",
       "      <td>100</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>13.0</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>big</td>\n",
       "      <td>40</td>\n",
       "      <td>5</td>\n",
       "      <td>9</td>\n",
       "      <td>14.0</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>big</td>\n",
       "      <td>50</td>\n",
       "      <td>7</td>\n",
       "      <td>14</td>\n",
       "      <td>22.0</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>big</td>\n",
       "      <td>60</td>\n",
       "      <td>10</td>\n",
       "      <td>19</td>\n",
       "      <td>32.0</td>\n",
       "      <td>161</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>big</td>\n",
       "      <td>70</td>\n",
       "      <td>13</td>\n",
       "      <td>27</td>\n",
       "      <td>46.0</td>\n",
       "      <td>224</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>big</td>\n",
       "      <td>80</td>\n",
       "      <td>15</td>\n",
       "      <td>34</td>\n",
       "      <td>57.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  flow type  water flow / air flow  0 kg/h  100 kg/h  200 kg/h  300 kg/h\n",
       "0     small                     15       1         2       3.0         3\n",
       "1     small                     30       2         2       5.0         3\n",
       "2     small                     50       2         3       6.0         5\n",
       "4     small                    100       3         5      13.0        13\n",
       "5       big                     40       5         9      14.0        25\n",
       "6       big                     50       7        14      22.0        39\n",
       "7       big                     60      10        19      32.0       161\n",
       "8       big                     70      13        27      46.0       224\n",
       "9       big                     80      15        34      57.0         0"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# dropping missing values\n",
    "packed_column.dropna()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "a7b4a53b-2858-460c-9240-3940fc1770fe",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>flow type</th>\n",
       "      <th>water flow / air flow</th>\n",
       "      <th>0 kg/h</th>\n",
       "      <th>100 kg/h</th>\n",
       "      <th>200 kg/h</th>\n",
       "      <th>300 kg/h</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>small</td>\n",
       "      <td>15</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3.0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>small</td>\n",
       "      <td>30</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>5.0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>small</td>\n",
       "      <td>50</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>6.0</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>small</td>\n",
       "      <td>80</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>22.0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>small</td>\n",
       "      <td>100</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>13.0</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>big</td>\n",
       "      <td>40</td>\n",
       "      <td>5</td>\n",
       "      <td>9</td>\n",
       "      <td>14.0</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>big</td>\n",
       "      <td>50</td>\n",
       "      <td>7</td>\n",
       "      <td>14</td>\n",
       "      <td>22.0</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>big</td>\n",
       "      <td>60</td>\n",
       "      <td>10</td>\n",
       "      <td>19</td>\n",
       "      <td>32.0</td>\n",
       "      <td>161</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>big</td>\n",
       "      <td>70</td>\n",
       "      <td>13</td>\n",
       "      <td>27</td>\n",
       "      <td>46.0</td>\n",
       "      <td>224</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>big</td>\n",
       "      <td>80</td>\n",
       "      <td>15</td>\n",
       "      <td>34</td>\n",
       "      <td>57.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  flow type  water flow / air flow  0 kg/h  100 kg/h  200 kg/h  300 kg/h\n",
       "0     small                     15       1         2       3.0         3\n",
       "1     small                     30       2         2       5.0         3\n",
       "2     small                     50       2         3       6.0         5\n",
       "3     small                     80       3         4      22.0         9\n",
       "4     small                    100       3         5      13.0        13\n",
       "5       big                     40       5         9      14.0        25\n",
       "6       big                     50       7        14      22.0        39\n",
       "7       big                     60      10        19      32.0       161\n",
       "8       big                     70      13        27      46.0       224\n",
       "9       big                     80      15        34      57.0         0"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# fill missing values with the mean of a specific column\n",
    "packed_column.fillna(packed_column['200 kg/h'].mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32b9b2a8-f3f8-4896-92dc-e167bb6f77a0",
   "metadata": {},
   "source": [
    "## 4. Creating DataFrames from lists and dictionaries\n",
    "\n",
    "DataFrames can be created from many different objects: they can be loaded from saved files, as we saw before, but they can also be created from lists and dictionaries, for example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "1ac47737-5cab-4c46-8a14-56ca05d9c660",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>column 1</th>\n",
       "      <th>column 2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   column 1  column 2\n",
       "0         0         2\n",
       "1         1         4\n",
       "2         2         6\n",
       "3         3         8\n",
       "4         4        10"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# create a df from lists\n",
    "df = pd.DataFrame()\n",
    "df['column 1'] = list(range(10))\n",
    "df['column 2'] = list(range(2, 21, 2))\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "54c70227-709f-4cdd-b479-2e5763581630",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>57</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>56</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>47</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   0   1  2\n",
       "0  3  57  9\n",
       "1  1  56  3\n",
       "2  3  47  0"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# create a df from a list of lists\n",
    "observations = [[3, 57, 9], [1, 56, 3], [3, 47, 0]]\n",
    "observations_df = pd.DataFrame(observations)\n",
    "observations_df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc4549e7-4fcc-46ee-8012-888832d0990c",
   "metadata": {},
   "source": [
    "Note that you usually want your rows and column names to be descriptive (so not just using the index of the row/column), this will also help you index and perform other operations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "2a33c91e-fe73-42da-b43e-83e2ddf38c5a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>temperature</th>\n",
       "      <td>30.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>pH</th>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>pressure</th>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                0\n",
       "temperature  30.0\n",
       "pH            7.0\n",
       "pressure      1.0"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# create a df from a dictionary\n",
    "dict_conditions = {\n",
    "    \"temperature\": 30,\n",
    "    \"pH\": 7.0,\n",
    "    \"pressure\": 1.0\n",
    "}\n",
    "\n",
    "df = pd.DataFrame.from_dict(dict_conditions, orient='index')\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "b5a57f9d-ec85-47bc-ae65-dfa38e1904f5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>index</th>\n",
       "      <th>0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>temperature</td>\n",
       "      <td>30.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>pH</td>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>pressure</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         index     0\n",
       "0  temperature  30.0\n",
       "1           pH   7.0\n",
       "2     pressure   1.0"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# you can reset the index to have your keys as a column instead of the index\n",
    "df = df.reset_index()\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41daf6e1-d53b-49f1-821c-9c36765f579b",
   "metadata": {},
   "source": [
    "## 5. Groupby"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c6c7dd0-2327-4452-b29b-d32b95393ff1",
   "metadata": {},
   "source": [
    "A very useful that you can do in pandas is to aggregate by a certain value across the dataframe and add rows that present the same value according to a specified logic.\n",
    "Read the official documentation [here](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "eb3bea3f-6132-4c16-8fc9-cf422d1a7fe1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>water flow / air flow</th>\n",
       "      <th>0 kg/h</th>\n",
       "      <th>100 kg/h</th>\n",
       "      <th>200 kg/h</th>\n",
       "      <th>300 kg/h</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>flow type</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>big</th>\n",
       "      <td>60.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>20.6</td>\n",
       "      <td>34.20</td>\n",
       "      <td>89.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>small</th>\n",
       "      <td>55.0</td>\n",
       "      <td>2.2</td>\n",
       "      <td>3.2</td>\n",
       "      <td>6.75</td>\n",
       "      <td>6.6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           water flow / air flow  0 kg/h  100 kg/h  200 kg/h  300 kg/h\n",
       "flow type                                                             \n",
       "big                         60.0    10.0      20.6     34.20      89.8\n",
       "small                       55.0     2.2       3.2      6.75       6.6"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# group dataframe by the flow type and calculate the average water flow\n",
    "packed_column.groupby(['flow type']).mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "42524b5f-0807-49ea-b8d1-8941578abffc",
   "metadata": {},
   "source": [
    "We can of course save the new aggregated dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "bdc54384-c1f4-4b8b-be7f-868ad022c69d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>water flow / air flow</th>\n",
       "      <th>0 kg/h</th>\n",
       "      <th>100 kg/h</th>\n",
       "      <th>200 kg/h</th>\n",
       "      <th>300 kg/h</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>flow type</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>big</th>\n",
       "      <td>60.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>20.6</td>\n",
       "      <td>34.20</td>\n",
       "      <td>89.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>small</th>\n",
       "      <td>55.0</td>\n",
       "      <td>2.2</td>\n",
       "      <td>3.2</td>\n",
       "      <td>6.75</td>\n",
       "      <td>6.6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           water flow / air flow  0 kg/h  100 kg/h  200 kg/h  300 kg/h\n",
       "flow type                                                             \n",
       "big                         60.0    10.0      20.6     34.20      89.8\n",
       "small                       55.0     2.2       3.2      6.75       6.6"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "aggr_packed_column = packed_column.groupby(['flow type']).mean()\n",
    "aggr_packed_column"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30f60eb1-8f40-4512-8f54-e61336e7d56b",
   "metadata": {},
   "source": [
    "What if we want different metrics in different columns?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "3a78cffb-9609-4157-a430-194e0ec89d07",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>water flow / air flow</th>\n",
       "      <th>0 kg/h</th>\n",
       "      <th>100 kg/h</th>\n",
       "      <th>200 kg/h</th>\n",
       "      <th>300 kg/h</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>flow type</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>big</th>\n",
       "      <td>300</td>\n",
       "      <td>[5, 7, 10, 13, 15]</td>\n",
       "      <td>20.6</td>\n",
       "      <td>57.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>small</th>\n",
       "      <td>275</td>\n",
       "      <td>[1, 2, 2, 3, 3]</td>\n",
       "      <td>3.2</td>\n",
       "      <td>13.0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           water flow / air flow              0 kg/h  100 kg/h  200 kg/h  \\\n",
       "flow type                                                                  \n",
       "big                          300  [5, 7, 10, 13, 15]      20.6      57.0   \n",
       "small                        275     [1, 2, 2, 3, 3]       3.2      13.0   \n",
       "\n",
       "           300 kg/h  \n",
       "flow type            \n",
       "big               0  \n",
       "small             3  "
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "aggr_packed_column = packed_column.groupby(['flow type']).agg({'water flow / air flow': 'sum',\n",
    "                                                              '0 kg/h': list,\n",
    "                                                               '100 kg/h': 'mean',\n",
    "                                                               '200 kg/h': 'max',\n",
    "                                                               '300 kg/h': 'min'})\n",
    "aggr_packed_column"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bcc8e797-d9df-4d3b-bb2c-26713112d4ce",
   "metadata": {},
   "source": [
    "## 6. Handle duplicates"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9422601b-94da-4b3f-9862-3cbe047c0ab4",
   "metadata": {},
   "source": [
    "A common step in data pre-processing is handling duplicates.\n",
    "Depending on the objective, there are multiple ways to handle duplicate rows: they can be aggregated, removed and so on.\n",
    "\n",
    "Here we will look into how to check whether there are any duplicated rows or values in a column and how to drop these values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "bd1ab91c-eeca-4d71-ad6d-7582a8ea3fc4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1    False\n",
       "2    False\n",
       "3    False\n",
       "4    False\n",
       "5    False\n",
       "6    False\n",
       "7    False\n",
       "8    False\n",
       "9    False\n",
       "dtype: bool"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# check if there are duplicates in the dataframe\n",
    "packed_column.duplicated()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "ada9a4d0-e4ce-4de0-a7e5-d87eef2c40b4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1     True\n",
       "2     True\n",
       "3     True\n",
       "4     True\n",
       "5    False\n",
       "6     True\n",
       "7     True\n",
       "8     True\n",
       "9     True\n",
       "dtype: bool"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# check if there are duplicates in a specific column\n",
    "packed_column.duplicated(subset=['flow type'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "a1a0d8f3-5938-4d57-898c-0c8a1455d290",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>flow type</th>\n",
       "      <th>water flow / air flow</th>\n",
       "      <th>0 kg/h</th>\n",
       "      <th>100 kg/h</th>\n",
       "      <th>200 kg/h</th>\n",
       "      <th>300 kg/h</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>small</td>\n",
       "      <td>15</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3.0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>small</td>\n",
       "      <td>30</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>5.0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>small</td>\n",
       "      <td>50</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>6.0</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>small</td>\n",
       "      <td>80</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>small</td>\n",
       "      <td>100</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>13.0</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>big</td>\n",
       "      <td>40</td>\n",
       "      <td>5</td>\n",
       "      <td>9</td>\n",
       "      <td>14.0</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>big</td>\n",
       "      <td>50</td>\n",
       "      <td>7</td>\n",
       "      <td>14</td>\n",
       "      <td>22.0</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>big</td>\n",
       "      <td>60</td>\n",
       "      <td>10</td>\n",
       "      <td>19</td>\n",
       "      <td>32.0</td>\n",
       "      <td>161</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>big</td>\n",
       "      <td>70</td>\n",
       "      <td>13</td>\n",
       "      <td>27</td>\n",
       "      <td>46.0</td>\n",
       "      <td>224</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>big</td>\n",
       "      <td>80</td>\n",
       "      <td>15</td>\n",
       "      <td>34</td>\n",
       "      <td>57.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  flow type  water flow / air flow  0 kg/h  100 kg/h  200 kg/h  300 kg/h\n",
       "0     small                     15       1         2       3.0         3\n",
       "1     small                     30       2         2       5.0         3\n",
       "2     small                     50       2         3       6.0         5\n",
       "3     small                     80       3         4       NaN         9\n",
       "4     small                    100       3         5      13.0        13\n",
       "5       big                     40       5         9      14.0        25\n",
       "6       big                     50       7        14      22.0        39\n",
       "7       big                     60      10        19      32.0       161\n",
       "8       big                     70      13        27      46.0       224\n",
       "9       big                     80      15        34      57.0         0"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# drop any duplicates - here the whole row needs to be the same to be dropped\n",
    "packed_column.drop_duplicates()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "cc824fb7-3cef-45ef-bfd9-b3cb6a13d76a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>flow type</th>\n",
       "      <th>water flow / air flow</th>\n",
       "      <th>0 kg/h</th>\n",
       "      <th>100 kg/h</th>\n",
       "      <th>200 kg/h</th>\n",
       "      <th>300 kg/h</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>small</td>\n",
       "      <td>100</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>13.0</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>big</td>\n",
       "      <td>80</td>\n",
       "      <td>15</td>\n",
       "      <td>34</td>\n",
       "      <td>57.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  flow type  water flow / air flow  0 kg/h  100 kg/h  200 kg/h  300 kg/h\n",
       "4     small                    100       3         5      13.0        13\n",
       "9       big                     80      15        34      57.0         0"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# drop any duplicates in a specified column and keep the last occurrence\n",
    "packed_column.drop_duplicates(subset=['flow type'], keep='last')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}