Pandas Analytical Functions – min() , max() , and pivot table()

Introduction

In this article, we will cover pandas analytical functions of pandas min(), max(), and pivot table(). These functions are very useful in the exploratory data analysis phase of any machine learning or data science projects. We will understand these functions with the help of syntaxes and examples for better clarity.

Importing Pandas Library

We will first load the pandas library

In [1]:
import pandas as pd
import numpy as np

Pandas Min : Min()

The min function of pandas helps us in finding the minimum values on specified axis.

Syntax

pandas.DataFrame.min(axis=None, skipna=None, level=None, numeric_only=None, kwargs)

  • axis : {index (0), columns (1)} – This is the axis where the function is applied.
  • skipna : bool, default True – This is used for deciding whether to exclude NA/Null values or not.
  • level : int or level name, default None – This parameter is generally used when we are deadling with multindex dataframe.
  • numeric_only : bool, default None – This is used for deciding whether to include only float, int, boolean columns.
  • kwags – Additional keyword arguments passed to the function.

Example 1: Simple example of pandas min() function

Here the pandas min() function is used for finding the minimum value of specified axis.

In [2]:
idx = pd.MultiIndex.from_arrays([
        ['Sedan', 'SUV', 'SUV', 'Sedan'],
        ['BMW', 'Audi', 'Mini Cooper', 'Mercedes']],
         names=['designs', 'companies'])
In [3]:
s = pd.Series([16, 4, 12, 8], name='car_types', index=idx)
In [4]:
s
Out[4]:
designs  companies  
Sedan    BMW            16
SUV      Audi            4
         Mini Cooper    12
Sedan    Mercedes        8
Name: car_types, dtype: int64
In [5]:
s.min()
Out[5]:
4

Example 2: Using level parameter of pandas min() function

In this example of pandas min() function, we will be using level parameter. Here the minimum value of each level is shown in the output.

In [6]:
s.min(level='designs')
Out[6]:
designs
Sedan    8
SUV      4
Name: car_types, dtype: int64
In [7]:
s.min(level=0)
Out[7]:
designs
Sedan    8
SUV      4
Name: car_types, dtype: int64
[adrotate banner=”3″]

Pandas Max : Max()

The max function of pandas helps us in finding the maximum values on specified axis.

Syntax

pandas.DataFrame.min(axis=None, skipna=None, level=None, numeric_only=None, kwargs)

  • axis : {index (0), columns (1)} – This is the axis where the function is applied.
  • skipna : bool, default True – This is used for deciding whether to exclude NA/Null values or not.
  • level : int or level name, default None – This parameter is generally used when we are deadling with multindex dataframe.
  • numeric_only : bool, default None – This is used for deciding whether to include only float, int, boolean columns.
  • kwags – Additional keyword arguments passed to the function.

Example 1: Simple example of pandas max() function

Here the pandas min() function is used for finding the maximum value of the specified axis.

In [8]:
idx = pd.MultiIndex.from_arrays([
        ['Sedan', 'SUV', 'SUV', 'Sedan'],
        ['BMW', 'Audi', 'Mini Cooper', 'Mercedes']],
         names=['designs', 'companies'])
In [9]:
s = pd.Series([16, 4, 12, 8], name='car_types', index=idx)
In [10]:
s
Out[10]:
designs  companies  
Sedan    BMW            16
SUV      Audi            4
         Mini Cooper    12
Sedan    Mercedes        8
Name: car_types, dtype: int64
In [11]:
s.max()
Out[11]:
16

Example 2: Using level parameter of pandas max() function

In this example of pandas max() function, we will be using level parameter. Here the maximum value of each level is shown in the output.

In [12]:
s.max(level='designs')
Out[12]:
designs
Sedan    16
SUV      12
Name: car_types, dtype: int64
In [13]:
s.max(level=0)
Out[13]:
designs
Sedan    16
SUV      12
Name: car_types, dtype: int64

Pandas Pivot Table : Pivot_Table()

The pandas pivot table function helps in creating a spreadsheet-style pivot table as a DataFrame.

Syntax

pandas.DataFrame.pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name, observed)

  • data : DataFrame – This is the data which is required to be arranged in pivot table
  • values : column to aggregate – Here the values which aggregated in the column is provided.
  • index,columns : column, Grouper, array, or list of the previous – Here the index and columns are used to specify the column values.
  • aggfunc : function, list of functions, dict, default numpy.mean – It consists the functions or list of functions which are used.
  • fill_value : scalar – This is the value which can replace the missing values
  • margins : bool, default True – For adding all row/columns.
  • dropna : bool, default True – Do not include values whose entries are NaN
  • margin_name : str, default True – Here, name of the row / column that will contain the totals when margins is True is contained.
  • observed : bool, default False – This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

The function returns an excel style pivot table.

Example 1: Using pandas pivot table to compute aggregated sum

Here the pandas pivot table is used to compute the aggregated sum.

In [14]:
df = pd.DataFrame({"A": ["Tom", "Tom", "Tom", "Tom", "Tom",
                          "Tim", "Tim", "Tim", "Tim"],
                     "B": ["one", "one", "one", "two", "two",
                          "one", "one", "two", "two"],
                     "C": ["red", "blue", "blue", "red",
                         "red", "blue", "red", "red",
                          "blue"],
                     "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                     "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
In [15]:
df
Out[15]:
A B C D E
0 Tom one red 1 2
1 Tom one blue 2 4
2 Tom one blue 2 5
3 Tom two red 3 5
4 Tom two red 3 6
5 Tim one blue 4 6
6 Tim one red 5 8
7 Tim two red 6 9
8 Tim two blue 7 9
In [16]:
table = pd.pivot_table(df, values='D', index=['A', 'B'],
                     columns=['C'], aggfunc=np.sum)
In [17]:
table
Out[17]:
C blue red
A B
Tim one 4.0 5.0
two 7.0 6.0
Tom one 4.0 1.0
two NaN 6.0

Example 2: Using fill_value parameter to fill the NaN values

Here in this example fill_parameter is used to fill the missing values. As we can see the NaN values are filled with “0”.

In [18]:
table = pd.pivot_table(df, values='D', index=['A', 'B'],
                     columns=['C'], aggfunc=np.sum, fill_value=0)
In [19]:
table
Out[19]:

C blue red
A B
Tim one 4 5
two 7 6
Tom one 4 1
two 0 6

Example 3: Calculating mean across multiple columns

In this last example, mean is computed for multiple columns.

In [20]:
table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
                     aggfunc={'D': np.mean,
                              'E': np.mean})
In [21]:
table
Out[21]:
D E
A C
Tim blue 5.500000 7.500000
red 5.500000 8.500000
Tom blue 2.000000 4.500000
red 2.333333 4.333333

Conclusion

Reaching to the end of this article, in this tutorial we learned about pandas functions like min(), max(), and pivot_table(). We learned about the syntax and examples of these functions and this helped us in understanding their usage.

Reference – https://pandas.pydata.org/docs/

  • Palash Sharma

    I am Palash Sharma, an undergraduate student who loves to explore and garner in-depth knowledge in the fields like Artificial Intelligence and Machine Learning. I am captivated by the wonders these fields have produced with their novel implementations. With this, I have a desire to share my knowledge with others in all my capacity.

Follow Us

Leave a Reply

Your email address will not be published. Required fields are marked *