Introduction
In this article, we cover the pandas functions for basic statistical analysis, one of the most important topics in data science. These statistics summarize the central tendency of our data. The functions covered are pandas mean(), median(), and mode(). Let's look at each of them in turn.
Importing Pandas Library
First, we import the pandas library, along with NumPy, which is used later for NaN values.
import pandas as pd
import numpy as np
Pandas Mean : mean()
The pandas mean() function returns the arithmetic mean of the values along the specified axis.
Syntax
pandas.DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
- axis : {index (0), columns (1)} – This is the axis along which the function is applied.
- skipna : bool, default True – This decides whether NA/null values are excluded from the calculation.
- level : int or level name, default None – This parameter is used with a MultiIndex dataframe to compute the result per level. It was deprecated in pandas 1.3 and removed in pandas 2.0, where groupby(level=...) is used instead.
- numeric_only : bool, default None – This decides whether to include only float, int, and boolean columns.
- **kwargs – Additional keyword arguments passed to the function.
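The numeric_only parameter is easiest to see on a dataframe that mixes text and numbers. The following is a minimal sketch, assuming a hypothetical dataframe named mixed that is not part of the examples in this article.

```python
import pandas as pd

# Hypothetical dataframe (an assumption for illustration):
# one text column and two numeric columns.
mixed = pd.DataFrame({"name": ["a", "b", "c"],
                      "x": [1, 2, 3],
                      "y": [10.0, 20.0, 60.0]})

# numeric_only=True restricts the calculation to numeric columns,
# so the text column "name" does not appear in the result.
col_means = mixed.mean(numeric_only=True)
print(col_means)
```

Passing numeric_only=True explicitly keeps the call unambiguous when a dataframe contains non-numeric columns.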
Example 1: Simple example of Pandas Mean() function
In this example, we calculate the mean over both the columns and the rows of a dataframe.
df = pd.DataFrame({"P":[35, 9, 1, 78, 19],
"Q":[51, 45, 54, 30, 12],
"R":[24, 6, 75, 13, 83],
"S":[14, 41, 7, 25, 67]})
df
|   | P | Q | R | S |
|---|---|---|---|---|
| 0 | 35 | 51 | 24 | 14 |
| 1 | 9 | 45 | 6 | 41 |
| 2 | 1 | 54 | 75 | 7 |
| 3 | 78 | 30 | 13 | 25 |
| 4 | 19 | 12 | 83 | 67 |
By default (axis=0), the mean is calculated for each column.
df.mean()
P    28.4
Q    38.4
R    40.2
S    30.8
dtype: float64
In the instances below, the axis parameter is passed to the mean function explicitly: axis=0 gives one mean per column, while axis=1 gives one mean per row.
df.mean(axis=0)
P    28.4
Q    38.4
R    40.2
S    30.8
dtype: float64
df.mean(axis=1)
0    31.00
1    25.25
2    34.25
3    36.50
4    45.25
dtype: float64
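To make the meaning of the axis parameter concrete, the sketch below recomputes both results by hand as sum divided by count and compares them with mean(). The names manual_col_means and manual_row_means are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"P": [35, 9, 1, 78, 19],
                   "Q": [51, 45, 54, 30, 12],
                   "R": [24, 6, 75, 13, 83],
                   "S": [14, 41, 7, 25, 67]})

# axis=0: one mean per column (column sum / number of rows).
manual_col_means = df.sum(axis=0) / df.shape[0]

# axis=1: one mean per row (row sum / number of columns).
manual_row_means = df.sum(axis=1) / df.shape[1]

# Both comparisons should print True.
print(np.allclose(manual_col_means, df.mean(axis=0)))
print(np.allclose(manual_row_means, df.mean(axis=1)))
```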
Example 2: Using skipna parameter of Pandas Mean() function
When a dataframe contains null/NaN values, the skipna parameter controls whether they are skipped. With skipna=True (the default), the mean is calculated from the remaining values only.
df = pd.DataFrame({"P":[35, 9, 1, 78, None],
"Q":[51, None, 54, 30, 12],
"R":[24, 6, None, 13, 83],
"S":[14, 41, 7, 25, None]})
df
|   | P | Q | R | S |
|---|---|---|---|---|
| 0 | 35.0 | 51.0 | 24.0 | 14.0 |
| 1 | 9.0 | NaN | 6.0 | 41.0 |
| 2 | 1.0 | 54.0 | NaN | 7.0 |
| 3 | 78.0 | 30.0 | 13.0 | 25.0 |
| 4 | NaN | 12.0 | 83.0 | NaN |
df.mean(axis = 1, skipna = True)
0    31.000000
1    18.666667
2    20.666667
3    36.500000
4    47.500000
dtype: float64
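For contrast, here is a short sketch of the same dataframe with skipna=False. In that case any row containing a NaN is expected to produce NaN instead of a mean. The name row_means_strict is used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"P": [35, 9, 1, 78, None],
                   "Q": [51, None, 54, 30, 12],
                   "R": [24, 6, None, 13, 83],
                   "S": [14, 41, 7, 25, None]})

# skipna=False: a single NaN in a row makes that row's mean NaN.
# Only rows 0 and 3 are complete, so only they get a numeric mean.
row_means_strict = df.mean(axis=1, skipna=False)
print(row_means_strict)
```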
Pandas Median : median()
The pandas median() function returns the median, that is, the middle value of the sorted data, along the specified axis.
Syntax
pandas.DataFrame.median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
- axis : {index (0), columns (1)} – This is the axis along which the function is applied.
- skipna : bool, default True – This decides whether NA/null values are excluded from the calculation.
- level : int or level name, default None – This parameter is used with a MultiIndex dataframe to compute the result per level. It was deprecated in pandas 1.3 and removed in pandas 2.0, where groupby(level=...) is used instead.
- numeric_only : bool, default None – This decides whether to include only float, int, and boolean columns.
- **kwargs – Additional keyword arguments passed to the function.
Example 1: Finding median using pandas median() function
We reuse the dataframe with NaN values from the previous example to find the median of each column.
df
|   | P | Q | R | S |
|---|---|---|---|---|
| 0 | 35.0 | 51.0 | 24.0 | 14.0 |
| 1 | 9.0 | NaN | 6.0 | 41.0 |
| 2 | 1.0 | 54.0 | NaN | 7.0 |
| 3 | 78.0 | 30.0 | 13.0 | 25.0 |
| 4 | NaN | 12.0 | 83.0 | NaN |
df.median(axis=0)
P    22.0
Q    40.5
R    18.5
S    19.5
dtype: float64
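The value for column P can be checked by hand. The sketch below drops the NaN, sorts the remaining values, and averages the two middle ones because the count is even. The names p, values, and manual_median are introduced only for this illustration.

```python
import pandas as pd

# Column P from the dataframe above, including its missing value.
p = pd.Series([35, 9, 1, 78, None], name="P")

# Drop the NaN and sort: four values remain.
values = sorted(p.dropna())
n = len(values)
middle = n // 2

if n % 2 == 1:
    # Odd count: the median is the single middle value.
    manual_median = values[middle]
else:
    # Even count: the median is the average of the two middle values.
    manual_median = (values[middle - 1] + values[middle]) / 2

print(manual_median, p.median())
```

Both numbers should agree with the 22.0 reported for column P.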
Example 2: Finding the median of each row and using the skipna parameter
In this example, axis=1 calculates the median across the columns, giving one value per row, and the skipna parameter excludes the null values.
df.median(axis = 1, skipna = True)
0    29.5
1     9.0
2     7.0
3    27.5
4    47.5
dtype: float64
Pandas Mode : mode()
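The pandas mode() function returns the mode, that is, the most frequent value, along the specified axis. Because several values can be tied for most frequent, the result is a dataframe that may contain more than one row.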
Syntax
pandas.DataFrame.mode(axis=0, numeric_only=False, dropna=True)
- axis : {index (0), columns (1)}, default 0 – This is the axis along which the mode is searched: 0 gives the mode of each column, 1 the mode of each row.
- numeric_only : bool, default False – If True, the function is applied to numeric columns only.
- dropna : bool, default True – This decides whether NaN/NaT values are ignored when counting.
Example 1: Finding mode using pandas mode() function
With the help of pandas mode function, we will find the mode of the dataframe.
df = pd.DataFrame([('Sedan', 80, 250),
('Hatchback', 90, 200),
('SUV', 80, 250),
('Sedan', 75, 150)],
index=('BMW', 'Mercedes', 'Jaguar', 'Bentley'),
columns=('car_name', 'speed', 'weight'))
df.mode()
|   | car_name | speed | weight |
|---|---|---|---|
| 0 | Sedan | 80 | 250 |
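One way to see where this result comes from is to count the values of a single column. The sketch below uses value_counts() on car_name. The names counts and most_common are introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame([('Sedan', 80, 250),
                   ('Hatchback', 90, 200),
                   ('SUV', 80, 250),
                   ('Sedan', 75, 150)],
                  index=('BMW', 'Mercedes', 'Jaguar', 'Bentley'),
                  columns=('car_name', 'speed', 'weight'))

# Count how often each value occurs in the column.
counts = df['car_name'].value_counts()

# The value with the highest count is the mode of that column.
most_common = counts.idxmax()

print(counts)
print(most_common, df['car_name'].mode()[0])
```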
Example 2: Using dropna parameter of pandas mode()
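This example uses the dropna parameter of the pandas mode() function. With dropna=False, NaN is counted like any other value, so it becomes the mode of the weight column, where it appears twice. With dropna=True, NaN is ignored; the remaining weights 150 and 200 each appear once, so both are returned and the other columns are padded with NaN.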
df_drop = pd.DataFrame([('Sedan', 80, np.nan),
('Hatchback', 90, 200),
('SUV', 80, np.nan),
('Sedan', 75, 150)],
index=('BMW', 'Mercedes', 'Jaguar', 'Bentley'),
columns=('car_name', 'speed', 'weight'))
df_drop.mode(dropna=False)
|   | car_name | speed | weight |
|---|---|---|---|
| 0 | Sedan | 80 | NaN |
df_drop.mode(dropna=True)
|   | car_name | speed | weight |
|---|---|---|---|
| 0 | Sedan | 80.0 | 150.0 |
| 1 | NaN | NaN | 200.0 |
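The difference between the two results can be traced on the weight column alone. The sketch below is a minimal illustration; the names weight, mode_with_nan, and mode_without_nan are assumptions introduced here, not part of the original example.

```python
import numpy as np
import pandas as pd

# The weight column from df_drop as a standalone Series.
weight = pd.Series([np.nan, 200, np.nan, 150], name="weight")

# Counting with NaN included shows NaN occurring twice.
print(weight.value_counts(dropna=False))

# dropna=False: NaN is the most frequent value, so it is the mode.
mode_with_nan = weight.mode(dropna=False)

# dropna=True: NaN is ignored, and 150 and 200 tie with one
# occurrence each, so both are returned in sorted order.
mode_without_nan = weight.mode(dropna=True)

print(mode_with_nan.tolist())
print(mode_without_nan.tolist())
```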
Conclusion
We have reached the end of this article, in which we covered the pandas statistical functions mean(), median(), and mode(). These functions summarize the central tendency of our data. We looked at the syntax of each function and worked through examples that show how the axis, skipna, and dropna parameters change the results.
Reference – https://pandas.pydata.org/docs/
I am Palash Sharma, an undergraduate student who loves to explore and garner in-depth knowledge in the fields like Artificial Intelligence and Machine Learning. I am captivated by the wonders these fields have produced with their novel implementations. With this, I have a desire to share my knowledge with others in all my capacity.