Introduction
In this article, we will cover pandas analytical functions of pandas min(), max(), and pivot table(). These functions are very useful in the exploratory data analysis phase of any machine learning or data science projects. We will understand these functions with the help of syntaxes and examples for better clarity.
Importing Pandas Library
We will first load the pandas library
import pandas as pd
import numpy as np
Pandas Min : Min()
The min function of pandas helps us in finding the minimum values on specified axis.
Syntax
pandas.DataFrame.min(axis=None, skipna=None, level=None, numeric_only=None, kwargs)
- axis : {index (0), columns (1)} – This is the axis where the function is applied.
- skipna : bool, default True – This is used for deciding whether to exclude NA/Null values or not.
- level : int or level name, default None – This parameter is generally used when we are deadling with multindex dataframe.
- numeric_only : bool, default None – This is used for deciding whether to include only float, int, boolean columns.
- kwags – Additional keyword arguments passed to the function.
Example 1: Simple example of pandas min() function
Here the pandas min() function is used for finding the minimum value of specified axis.
idx = pd.MultiIndex.from_arrays([
['Sedan', 'SUV', 'SUV', 'Sedan'],
['BMW', 'Audi', 'Mini Cooper', 'Mercedes']],
names=['designs', 'companies'])
s = pd.Series([16, 4, 12, 8], name='car_types', index=idx)
s
designs companies Sedan BMW 16 SUV Audi 4 Mini Cooper 12 Sedan Mercedes 8 Name: car_types, dtype: int64
s.min()
4
Example 2: Using level parameter of pandas min() function
In this example of pandas min() function, we will be using level parameter. Here the minimum value of each level is shown in the output.
s.min(level='designs')
designs Sedan 8 SUV 4 Name: car_types, dtype: int64
s.min(level=0)
designs Sedan 8 SUV 4 Name: car_types, dtype: int64
Pandas Max : Max()
The max function of pandas helps us in finding the maximum values on specified axis.
Syntax
pandas.DataFrame.min(axis=None, skipna=None, level=None, numeric_only=None, kwargs)
- axis : {index (0), columns (1)} – This is the axis where the function is applied.
- skipna : bool, default True – This is used for deciding whether to exclude NA/Null values or not.
- level : int or level name, default None – This parameter is generally used when we are deadling with multindex dataframe.
- numeric_only : bool, default None – This is used for deciding whether to include only float, int, boolean columns.
- kwags – Additional keyword arguments passed to the function.
Example 1: Simple example of pandas max() function
Here the pandas min() function is used for finding the maximum value of the specified axis.
idx = pd.MultiIndex.from_arrays([
['Sedan', 'SUV', 'SUV', 'Sedan'],
['BMW', 'Audi', 'Mini Cooper', 'Mercedes']],
names=['designs', 'companies'])
s = pd.Series([16, 4, 12, 8], name='car_types', index=idx)
s
designs companies Sedan BMW 16 SUV Audi 4 Mini Cooper 12 Sedan Mercedes 8 Name: car_types, dtype: int64
s.max()
16
Example 2: Using level parameter of pandas max() function
In this example of pandas max() function, we will be using level parameter. Here the maximum value of each level is shown in the output.
s.max(level='designs')
designs Sedan 16 SUV 12 Name: car_types, dtype: int64
s.max(level=0)
designs Sedan 16 SUV 12 Name: car_types, dtype: int64
Pandas Pivot Table : Pivot_Table()
The pandas pivot table function helps in creating a spreadsheet-style pivot table as a DataFrame.
Syntax
pandas.DataFrame.pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name, observed)
- data : DataFrame – This is the data which is required to be arranged in pivot table
- values : column to aggregate – Here the values which aggregated in the column is provided.
- index,columns : column, Grouper, array, or list of the previous – Here the index and columns are used to specify the column values.
- aggfunc : function, list of functions, dict, default numpy.mean – It consists the functions or list of functions which are used.
- fill_value : scalar – This is the value which can replace the missing values
- margins : bool, default True – For adding all row/columns.
- dropna : bool, default True – Do not include values whose entries are NaN
- margin_name : str, default True – Here, name of the row / column that will contain the totals when margins is True is contained.
- observed : bool, default False – This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
The function returns an excel style pivot table.
Example 1: Using pandas pivot table to compute aggregated sum
Here the pandas pivot table is used to compute the aggregated sum.
df = pd.DataFrame({"A": ["Tom", "Tom", "Tom", "Tom", "Tom",
"Tim", "Tim", "Tim", "Tim"],
"B": ["one", "one", "one", "two", "two",
"one", "one", "two", "two"],
"C": ["red", "blue", "blue", "red",
"red", "blue", "red", "red",
"blue"],
"D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
"E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
df
A | B | C | D | E | |
---|---|---|---|---|---|
0 | Tom | one | red | 1 | 2 |
1 | Tom | one | blue | 2 | 4 |
2 | Tom | one | blue | 2 | 5 |
3 | Tom | two | red | 3 | 5 |
4 | Tom | two | red | 3 | 6 |
5 | Tim | one | blue | 4 | 6 |
6 | Tim | one | red | 5 | 8 |
7 | Tim | two | red | 6 | 9 |
8 | Tim | two | blue | 7 | 9 |
table = pd.pivot_table(df, values='D', index=['A', 'B'],
columns=['C'], aggfunc=np.sum)
table
C | blue | red | |
---|---|---|---|
A | B | ||
Tim | one | 4.0 | 5.0 |
two | 7.0 | 6.0 | |
Tom | one | 4.0 | 1.0 |
two | NaN | 6.0 |
Example 2: Using fill_value parameter to fill the NaN values
Here in this example fill_parameter is used to fill the missing values. As we can see the NaN values are filled with “0”.
table = pd.pivot_table(df, values='D', index=['A', 'B'],
columns=['C'], aggfunc=np.sum, fill_value=0)
table
C | blue | red | |
---|---|---|---|
A | B | ||
Tim | one | 4 | 5 |
two | 7 | 6 | |
Tom | one | 4 | 1 |
two | 0 | 6 |
Example 3: Calculating mean across multiple columns
In this last example, mean is computed for multiple columns.
table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
aggfunc={'D': np.mean,
'E': np.mean})
table
D | E | ||
---|---|---|---|
A | C | ||
Tim | blue | 5.500000 | 7.500000 |
red | 5.500000 | 8.500000 | |
Tom | blue | 2.000000 | 4.500000 |
red | 2.333333 | 4.333333 |
Conclusion
Reaching to the end of this article, in this tutorial we learned about pandas functions like min(), max(), and pivot_table(). We learned about the syntax and examples of these functions and this helped us in understanding their usage.
- Also Read – Tutorial – Pandas Drop, Pandas Dropna, Pandas Drop Duplicate
- Also Read – Pandas Visualization Tutorial – Bar Plot, Histogram, Scatter Plot, Pie Chart
- Also Read – Tutorial – Pandas Concat, Pandas Append, Pandas Merge, Pandas Join
- Also Read – Pandas DataFrame Tutorial – Selecting Rows by Value, Iterrows and DataReader
Reference – https://pandas.pydata.org/docs/
-
I am Palash Sharma, an undergraduate student who loves to explore and garner in-depth knowledge in the fields like Artificial Intelligence and Machine Learning. I am captivated by the wonders these fields have produced with their novel implementations. With this, I have a desire to share my knowledge with others in all my capacity.
View all posts