Pandas Statistical Functions Part 1 – mean(), median(), and mode()

Introduction

In this article, we cover pandas functions for statistical analysis, one of the most important topics in data science. Summary statistics help us understand the details of our data. The pandas functions covered here are mean(), median(), and mode(). So let's start the article and learn about these functions.

Importing Pandas Library

First we will import the pandas library, along with NumPy, which a later example uses for np.nan.

In [1]:
import pandas as pd
import numpy as np

Pandas Mean : mean()

The mean function of pandas helps us in finding the mean of the values on the specified axis.

Syntax

pandas.DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

  • axis : {index (0), columns (1)} – This is the axis where the function is applied.
  • skipna : bool, default True – This is used for deciding whether to exclude NA/Null values or not.
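  • level : int or level name, default None – This parameter is used with a MultiIndex dataframe to compute the result for a particular level. It was deprecated in pandas 1.3 and removed in pandas 2.0, where groupby(level=...) is used instead.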
  • numeric_only : bool, default None – This is used for deciding whether to include only float, int, boolean columns.
  • **kwargs – Additional keyword arguments passed to the function.
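As a small illustration of the numeric_only parameter described above, the sketch below uses a hypothetical dataframe (the column names name, x and y are made up for this example) that mixes a text column with numeric ones. Passing numeric_only=True should restrict the mean to the numeric columns.

```python
import pandas as pd

# Hypothetical frame mixing a text column with numeric columns
mixed = pd.DataFrame({"name": ["a", "b", "c"],
                      "x": [1, 2, 3],
                      "y": [10.0, 20.0, 60.0]})

# Only the numeric columns take part; the "name" column is left out
numeric_means = mixed.mean(numeric_only=True)
print(numeric_means)
```

The result is a Series indexed by the numeric column names only.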

Example 1: Simple example of Pandas Mean() function

In this example, we calculate the mean along each axis: once per column and once per row.

In [2]:
df = pd.DataFrame({"P":[35, 9, 1, 78, 19], 
                   "Q":[51, 45, 54, 30, 12],  
                   "R":[24, 6, 75, 13, 83], 
                   "S":[14, 41, 7, 25, 67]})   
In [3]:
df
Out[3]:
    P   Q   R   S
0  35  51  24  14
1   9  45   6  41
2   1  54  75   7
3  78  30  13  25
4  19  12  83  67

By default (axis=0), the mean is computed down the rows, giving one value per column.

In [4]:
df.mean()
Out[4]:
P    28.4
Q    38.4
R    40.2
S    30.8
dtype: float64

In the examples below, the axis parameter is passed explicitly: axis=0 gives one mean per column, and axis=1 gives one mean per row.

In [5]:
df.mean(axis=0)
Out[5]:
P    28.4
Q    38.4
R    40.2
S    30.8
dtype: float64
In [6]:
df.mean(axis=1)
Out[6]:
0    31.00
1    25.25
2    34.25
3    36.50
4    45.25
dtype: float64
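To make the meaning of the two axes concrete, the following sketch rebuilds the same values by hand as a sum divided by a count. The names df_demo, col_means and row_means are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({"P": [35, 9, 1, 78, 19],
                        "Q": [51, 45, 54, 30, 12],
                        "R": [24, 6, 75, 13, 83],
                        "S": [14, 41, 7, 25, 67]})

# axis=0: one mean per column = column sum / number of rows
col_means = df_demo.sum(axis=0) / df_demo.shape[0]

# axis=1: one mean per row = row sum / number of columns
row_means = df_demo.sum(axis=1) / df_demo.shape[1]

# Compare against mean(); allclose tolerates floating-point rounding
print(np.allclose(col_means, df_demo.mean(axis=0)))
print(np.allclose(row_means, df_demo.mean(axis=1)))
```

This shortcut only matches mean() when the dataframe has no missing values, because the divisor here is the full row or column count.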

Example 2: Using skipna parameter of Pandas Mean() function

When a dataframe contains null/NaN values, the skipna parameter controls whether they are excluded. With skipna=True (the default), the missing values are skipped and the mean is computed from the remaining values.

In [7]:
df = pd.DataFrame({"P":[35, 9, 1, 78, None], 
                   "Q":[51, None, 54, 30, 12],  
                   "R":[24, 6, None, 13, 83], 
                   "S":[14, 41, 7, 25, None]})   
In [8]:
df
Out[8]:
      P     Q     R     S
0  35.0  51.0  24.0  14.0
1   9.0   NaN   6.0  41.0
2   1.0  54.0   NaN   7.0
3  78.0  30.0  13.0  25.0
4   NaN  12.0  83.0   NaN
In [9]:
df.mean(axis = 1, skipna = True) 
Out[9]:
0    31.000000
1    18.666667
2    20.666667
3    36.500000
4    47.500000
dtype: float64
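For contrast, here is a minimal sketch of the same row-wise mean with skipna=False. The variable names df_nan, with_skip and without_skip are chosen for this illustration. Any row that contains a NaN should then produce NaN instead of a number.

```python
import pandas as pd

df_nan = pd.DataFrame({"P": [35, 9, 1, 78, None],
                       "Q": [51, None, 54, 30, 12],
                       "R": [24, 6, None, 13, 83],
                       "S": [14, 41, 7, 25, None]})

# Missing values are ignored; each row is averaged over what remains
with_skip = df_nan.mean(axis=1, skipna=True)

# Missing values propagate; rows 1, 2 and 4 contain NaN
without_skip = df_nan.mean(axis=1, skipna=False)

print(without_skip)
```

Only rows 0 and 3, which have no missing values, keep a numeric mean when skipna=False.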

Pandas Median : median()

The median function of pandas helps us in finding the median of the values on the specified axis.

Syntax

pandas.DataFrame.median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

  • axis : {index (0), columns (1)} – This is the axis where the function is applied.
  • skipna : bool, default True – This is used for deciding whether to exclude NA/Null values or not.
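  • level : int or level name, default None – This parameter is used with a MultiIndex dataframe to compute the result for a particular level. It was deprecated in pandas 1.3 and removed in pandas 2.0, where groupby(level=...) is used instead.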
  • numeric_only : bool, default None – This is used for deciding whether to include only float, int, boolean columns.
  • **kwargs – Additional keyword arguments passed to the function.
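The level parameter applies to dataframes with a MultiIndex. As a hedged sketch of a per-level median that also works on recent pandas versions, the example below uses groupby(level=...) on a hypothetical dataframe. The index names group and item, and the score column, are made up for this illustration.

```python
import pandas as pd

# Hypothetical two-level index: an outer "group" and an inner "item"
idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1), ("b", 2), ("b", 3)],
    names=["group", "item"])
scores = pd.DataFrame({"score": [10, 30, 5, 7, 100]}, index=idx)

# One median per value of the outer index level
per_group = scores.groupby(level="group").median()
print(per_group)
```

Group a has two values, so its median is the average of both; group b has three, so its median is the middle one.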

Example 1: Finding median using pandas median() function

We reuse the dataframe from the previous example to find the median of each column.

In [10]:
df
Out[10]:
      P     Q     R     S
0  35.0  51.0  24.0  14.0
1   9.0   NaN   6.0  41.0
2   1.0  54.0   NaN   7.0
3  78.0  30.0  13.0  25.0
4   NaN  12.0  83.0   NaN
In [11]:
df.median(axis=0)
Out[11]:
P    22.0
Q    40.5
R    18.5
S    19.5
dtype: float64
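Each column above has four non-missing values, so the median is the average of the two middle values after sorting. The sketch below reproduces the result for column P by hand; the names p, values and manual_median are used only for this illustration.

```python
import pandas as pd

p = pd.Series([35, 9, 1, 78, None], name="P")

# Drop the missing value and sort what is left: 1, 9, 35, 78
values = sorted(p.dropna())
n = len(values)

if n % 2 == 1:
    # Odd count: the single middle value
    manual_median = values[n // 2]
else:
    # Even count: average of the two middle values
    manual_median = (values[n // 2 - 1] + values[n // 2]) / 2

print(manual_median, p.median())
```

Here the two middle values are 9 and 35, which gives the 22.0 reported for column P.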

Example 2: Finding the median of each row (axis=1) using the skipna parameter

In this example, the median is calculated across the columns of each row (axis=1), and skipna=True excludes the NaN values.

In [12]:
df.median(axis = 1, skipna = True) 
Out[12]:
0    29.5
1     9.0
2     7.0
3    27.5
4    47.5
dtype: float64

Pandas Mode : mode()

The mode function of pandas helps us in finding the mode of the values on the specified axis.

Syntax

pandas.DataFrame.mode(axis=0, numeric_only=False, dropna=True)

  • axis : {index (0), columns (1)} – This is the axis where the function is applied.
  • numeric_only : bool, default False – If True, the function is applied only to numeric columns.
  • dropna : bool, default True – This is used for deciding whether NaN/NaT values are ignored when counting the most frequent value.

Example 1: Finding mode using pandas mode() function

With the help of pandas mode function, we will find the mode of the dataframe.

In [13]:
df = pd.DataFrame([('Sedan', 80, 250),
                    ('Hatchback', 90, 200),
                    ('SUV', 80, 250),
                    ('Sedan', 75, 150)],
                   index=('BMW', 'Mercedes', 'Jaguar', 'Bentley'),
                   columns=('car_name', 'speed', 'weight'))
In [14]:
df.mode()
Out[14]:
  car_name  speed  weight
0    Sedan     80     250

Example 2: Using dropna parameter of pandas mode()

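This example uses the dropna parameter of the pandas mode() function. With dropna=False, NaN is counted like any other value and can be returned as the mode. With dropna=True (the default), NaN is ignored.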

In [15]:
df_drop = pd.DataFrame([('Sedan', 80, np.nan),
                    ('Hatchback', 90, 200),
                    ('SUV', 80, np.nan),
                    ('Sedan', 75, 150)],
                   index=('BMW', 'Mercedes', 'Jaguar', 'Bentley'),
                   columns=('car_name', 'speed', 'weight'))
In [16]:
df_drop.mode(dropna=False)
Out[16]:
  car_name  speed  weight
0    Sedan     80     NaN
In [17]:
df_drop.mode(dropna=True)
Out[17]:
  car_name  speed  weight
0    Sedan   80.0   150.0
1      NaN    NaN   200.0
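The second row in the last output appears because a column can have more than one mode. Once NaN is dropped, the weight column has two values that each occur once, so both are returned, and the other columns are padded with NaN. The sketch below shows this on the weight values alone; the Series name weights is chosen for this illustration.

```python
import pandas as pd

weights = pd.Series([150.0, 200.0, None, None], name="weight")

# NaN ignored: 150.0 and 200.0 tie with one occurrence each
modes_without_nan = weights.mode(dropna=True)

# NaN counted: it occurs twice, so it is the only mode
modes_with_nan = weights.mode(dropna=False)

print(modes_without_nan.tolist())
print(modes_with_nan.tolist())
```

Tied modes are returned in sorted order, one per row.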

Conclusion

We have reached the end of this article, in which we covered three pandas statistical functions: mean(), median(), and mode(). These functions summarize the central tendency of our data. We looked at the syntax and examples of each function to show how they are used.

Reference – https://pandas.pydata.org/docs/

  • Palash Sharma

    I am Palash Sharma, an undergraduate student who loves to explore and garner in-depth knowledge in the fields like Artificial Intelligence and Machine Learning. I am captivated by the wonders these fields have produced with their novel implementations. With this, I have a desire to share my knowledge with others in all my capacity.
