Pandas Mathematical Functions – add(), sub(), mul(), div(), sum(), and agg()


Introduction

Data science and machine learning tasks often call for basic arithmetic on whole tables of data. Pandas, which is widely used for data handling and manipulation, provides functions for exactly this. In this tutorial we will look at the pandas mathematical functions add(), sub(), mul(), div(), sum() and agg(), going through the syntax of each and working through examples.

Loading Pandas Library

We will start by loading the pandas library, along with NumPy, which is used later for np.nan.

In [1]:
import pandas as pd
import numpy as np

Pandas Addition : add()

The pandas addition function performs the addition of dataframes. The addition is performed element-wise.

Syntax

pandas.DataFrame.add(other, axis='columns', level=None, fill_value=None)

  • other : scalar, sequence, Series, or DataFrame – The value to add: a single number, a list-like object, or another pandas structure.
  • axis : {0 or 'index', 1 or 'columns'} – Whether a Series or list-like other is matched against the index (0) or the columns (1).
  • level : int or label – Broadcast across a level, matching Index values on the passed MultiIndex level.
  • fill_value : float or None, default None – A value substituted for missing (NaN) entries before the computation. If an entry is missing in both inputs, the result is still missing.

The function returns a new dataframe containing the result of the addition.
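
To make the fill_value behaviour concrete, here is a minimal sketch. The dataframes left and right are made up for illustration and are not part of this tutorial's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frames, each with one missing value in a different place
left = pd.DataFrame({'speed': [80, np.nan], 'weight': [250, 200]},
                    index=['Audi', 'Jaguar'])
right = pd.DataFrame({'speed': [10, 20], 'weight': [np.nan, 5]},
                     index=['Audi', 'Jaguar'])

# Without fill_value, a NaN on either side gives NaN in the result
plain = left.add(right)

# With fill_value=0, a value missing on one side only is treated as 0
filled = left.add(right, fill_value=0)

print(plain)
print(filled)
```

In the second result, Jaguar's speed is computed as 0 + 20 and Audi's weight as 250 + 0, whereas both positions are NaN in the first.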


Example 1: Addition using pandas add()

In this example, we add a scalar value to every element of the dataframe.

In [2]:
df = pd.DataFrame({'speed': [80, 90, 110],
                        'weight': [250, 200, 150]},
                       index=['Audi', 'Jaguar', 'BMW'])
In [3]:
df
Out[3]:
speed weight
Audi 80 250
Jaguar 90 200
BMW 110 150
In [4]:
df.add(27)
Out[4]:
speed weight
Audi 107 277
Jaguar 117 227
BMW 137 177

Example 2: Addition of two dataframes

Here we have created another dataframe, which is then passed to the pandas add() function. We can see in the output that the two dataframes have been added.

In [5]:
df1 =  pd.DataFrame({'speed': [80, 90, 110],
                     'weight': [250, 200, 150]},
                    index=['Audi', 'Jaguar', 'BMW'])
In [6]:
df1
Out[6]:
speed weight
Audi 80 250
Jaguar 90 200
BMW 110 150
In [7]:
df.add(df1)
Out[7]:
speed weight
Audi 160 500
Jaguar 180 400
BMW 220 300

Example 3: Scalar addition using the + operator

In this example, the + operator is used instead of add() to add a scalar value to every element of the existing dataframe.

In [8]:
df
Out[8]:
speed weight
Audi 80 250
Jaguar 90 200
BMW 110 150
In [9]:
df + 25
Out[9]:
speed weight
Audi 105 275
Jaguar 115 225
BMW 135 175

Pandas Subtract : sub()

The pandas sub() function subtracts a value from a dataframe, element-wise.

Syntax

pandas.DataFrame.sub(other, axis='columns', level=None, fill_value=None)

  • other : scalar, sequence, Series, or DataFrame – The value to subtract: a single number, a list-like object, or another pandas structure.
  • axis : {0 or 'index', 1 or 'columns'} – Whether a Series or list-like other is matched against the index (0) or the columns (1).
  • level : int or label – Broadcast across a level, matching Index values on the passed MultiIndex level.
  • fill_value : float or None, default None – A value substituted for missing (NaN) entries before the computation. If an entry is missing in both inputs, the result is still missing.

The function returns a new dataframe containing the result of the subtraction.

Example 1: Subtraction using pandas sub()

In this example, a list is passed to the pandas sub() function. With axis='columns', each list element is matched to a column, so 15 is subtracted from speed and 30 from weight.

In [10]:
df
Out[10]:
speed weight
Audi 80 250
Jaguar 90 200
BMW 110 150
In [11]:
df.sub([15, 30], axis='columns')
Out[11]:
speed weight
Audi 65 220
Jaguar 75 170
BMW 95 120

Example 2: Using series data along with pandas subtraction function

In the second example, a Series is passed to the pandas sub() function. Because the Series index matches the dataframe's row labels and axis='index' is set, each Series value is subtracted from the corresponding row.

In [12]:
df
Out[12]:
speed weight
Audi 80 250
Jaguar 90 200
BMW 110 150
In [13]:
df.sub(pd.Series([7, 9, 11], index=['Audi', 'Jaguar', 'BMW']),
        axis='index')
Out[13]:
speed weight
Audi 73 243
Jaguar 81 191
BMW 99 139
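
The two examples above differ only in which axis the other argument is matched against. The following sketch puts both side by side using labelled Series; the names per_column and per_row are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'speed': [80, 90, 110],
                   'weight': [250, 200, 150]},
                  index=['Audi', 'Jaguar', 'BMW'])

# A Series labelled by column name is matched against the columns
per_column = pd.Series([15, 30], index=['speed', 'weight'])
by_columns = df.sub(per_column, axis='columns')

# A Series labelled by row name is matched against the index
per_row = pd.Series([7, 9, 11], index=['Audi', 'Jaguar', 'BMW'])
by_index = df.sub(per_row, axis='index')

print(by_columns)
print(by_index)
```

Using a labelled Series rather than a plain list makes the matching explicit, since values are paired by label instead of by position.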

Example 3: Subtraction using the - operator

Here the - operator is used instead of the pandas sub() function. The list is matched against the columns, so 6 is subtracted from speed and 9 from weight.

In [14]:
df
Out[14]:
speed weight
Audi 80 250
Jaguar 90 200
BMW 110 150
In [15]:
df - [6, 9]
Out[15]:
speed weight
Audi 74 241
Jaguar 84 191
BMW 104 141

Pandas Multiply : mul()

The pandas mul() function performs element-wise multiplication on dataframes.

Syntax

pandas.DataFrame.mul(other, axis='columns', level=None, fill_value=None)

  • other : scalar, sequence, Series, or DataFrame – The value to multiply by: a single number, a list-like object, or another pandas structure.
  • axis : {0 or 'index', 1 or 'columns'} – Whether a Series or list-like other is matched against the index (0) or the columns (1).
  • level : int or label – Broadcast across a level, matching Index values on the passed MultiIndex level.
  • fill_value : float or None, default None – A value substituted for missing (NaN) entries before the computation. If an entry is missing in both inputs, the result is still missing.

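Example 1: Simple example of pandas mul() function

Here df is multiplied by df1, which has only a speed column. Since weight has no counterpart in df1, the result for that column is NaN.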

In [16]:
df1 = pd.DataFrame({'speed': [50, 75, 100]},
                    index=['Audi', 'Jaguar', 'BMW'])
In [17]:
df1
Out[17]:
speed
Audi 50
Jaguar 75
BMW 100
In [18]:
res = df.mul(df1)
In [19]:
df
Out[19]:
speed weight
Audi 80 250
Jaguar 90 200
BMW 110 150
In [20]:
res
Out[20]:
speed weight
Audi 4000 NaN
Jaguar 6750 NaN
BMW 11000 NaN

Example 2: Understanding use of fill_value parameter

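Here the weight column, which is absent from df1, is filled with the help of the fill_value parameter. With fill_value=0 the missing values are treated as 0 before multiplying, so the column becomes 0.0 instead of NaN.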

In [21]:
df
Out[21]:
speed weight
Audi 80 250
Jaguar 90 200
BMW 110 150
In [22]:
df.mul(df1, fill_value=0)
Out[22]:
speed weight
Audi 4000 0.0
Jaguar 6750 0.0
BMW 11000 0.0

Example 3: Perform multiplication operation using “*”

We can also perform multiplication using the asterisk operator. The output generated this way is the same as the output of mul() called without fill_value.

In [23]:
df
Out[23]:
speed weight
Audi 80 250
Jaguar 90 200
BMW 110 150
In [24]:
df1
Out[24]:
speed
Audi 50
Jaguar 75
BMW 100
In [25]:
df * df1
Out[25]:
speed weight
Audi 4000 NaN
Jaguar 6750 NaN
BMW 11000 NaN
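
As a quick check of that equivalence, the sketch below compares the two results with DataFrame.equals, which treats NaN values in the same position as equal. The variable names via_operator and via_method are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'speed': [80, 90, 110],
                   'weight': [250, 200, 150]},
                  index=['Audi', 'Jaguar', 'BMW'])
df1 = pd.DataFrame({'speed': [50, 75, 100]},
                   index=['Audi', 'Jaguar', 'BMW'])

via_operator = df * df1
via_method = df.mul(df1)

# True when both frames have the same shape, labels and values
print(via_operator.equals(via_method))

# Options such as fill_value are only available through the method form
print(df.mul(df1, fill_value=1))
```

The operator is the shorter spelling, while the method form is the one to reach for when axis, level or fill_value is needed.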

Pandas Division : div()

The pandas div() function performs element-wise division on dataframes.

Syntax

pandas.DataFrame.div(other, axis='columns', level=None, fill_value=None)

  • other : scalar, sequence, Series, or DataFrame – The value to divide by: a single number, a list-like object, or another pandas structure.
  • axis : {0 or 'index', 1 or 'columns'} – Whether a Series or list-like other is matched against the index (0) or the columns (1).
  • level : int or label – Broadcast across a level, matching Index values on the passed MultiIndex level.
  • fill_value : float or None, default None – A value substituted for missing (NaN) entries before the computation. If an entry is missing in both inputs, the result is still missing.

Example 1: Using pandas div() function

To learn more about the div() function in pandas, we will look at this example, where every element of the dataframe is divided by the scalar 10. Note that the result is of float type.

In [26]:
df
Out[26]:
speed weight
Audi 80 250
Jaguar 90 200
BMW 110 150
In [27]:
df.div(10)
Out[27]:
speed weight
Audi 8.0 25.0
Jaguar 9.0 20.0
BMW 11.0 15.0

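Example 2: Using div() function on MultiIndex dataframe

In this example, a MultiIndex dataframe is created and passed to the div() function. With level=1, the index of df is matched against the second level of the MultiIndex (the company names), and fill_value=2 stands in for values that are missing in only one of the two dataframes.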

In [28]:
df_multindex = pd.DataFrame({'no_gears': [4, 5, 3, 3, 8, 6],
                             'Speed': [360, 180, 360, 360, 540, 720]},
                              index=[['Sedan', 'Sedan', 'Sedan', 'Hatchback', 'Hatchback', 'Hatchback'],
                                     ['BMW', 'Audi', 'Bentley',
                                    'Mercedes', 'Jaguar', 'Mini Cooper']])
In [29]:
df_multindex
Out[29]:
                       no_gears  Speed
Sedan     BMW                 4    360
          Audi                5    180
          Bentley             3    360
Hatchback Mercedes            3    360
          Jaguar              8    540
          Mini Cooper         6    720
In [30]:
df
Out[30]:
speed weight
Audi 80 250
Jaguar 90 200
BMW 110 150
In [31]:
df.div(df_multindex, level=1, fill_value=2)
Out[31]:
                          Speed  no_gears  speed  weight
Sedan     BMW          0.005556  0.500000   55.0    75.0
          Audi         0.011111  0.400000   40.0   125.0
          Bentley      0.005556  0.666667    NaN     NaN
Hatchback Mercedes     0.005556  0.666667    NaN     NaN
          Jaguar       0.003704  0.250000   45.0   100.0
          Mini Cooper  0.002778  0.333333    NaN     NaN
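
Because the two dataframes above have different column names, the effect of level is somewhat hidden behind fill_value. The following stripped-down sketch, assuming two hypothetical frames flat and nested that share a single speed column, isolates the level matching.

```python
import pandas as pd

# Hypothetical frame with a plain index of company names
flat = pd.DataFrame({'speed': [80, 90, 110]},
                    index=['Audi', 'Jaguar', 'BMW'])

# Hypothetical frame whose second index level holds the same names
nested = pd.DataFrame(
    {'speed': [55, 40, 45]},
    index=pd.MultiIndex.from_tuples(
        [('Sedan', 'BMW'), ('Sedan', 'Audi'), ('Hatchback', 'Jaguar')]))

# level=1 matches the flat index against the second (inner) level
ratio = flat.div(nested, level=1)
print(ratio)
```

Each row of nested is paired with the row of flat that carries the same company name, so BMW is computed as 110 / 55, Audi as 80 / 40 and Jaguar as 90 / 45.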

Pandas Sum : sum()

The sum() function returns the sum of the values along the desired axis.

Syntax

pandas.DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)

  • axis : {index (0), columns (1)} – The axis along which the sum is computed.
  • skipna : bool, default True – Whether NA/null values are skipped during the computation.
  • level : int or level name, default None – For a MultiIndex axis, the level to sum over. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0.
  • numeric_only : bool, default None – Whether to include only float, int, and boolean columns. If None, pandas attempts to use everything.
  • min_count : int, default 0 – The required number of valid values to perform the operation. If fewer are present, the result is NaN.
  • **kwargs : Additional keyword arguments.
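
The interaction between axis, skipna and min_count is easier to see on a small case. This is a minimal sketch using a hypothetical dataframe named scores, one of whose columns contains only missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column 'b' contains only missing values
scores = pd.DataFrame({'a': [1.0, 2.0, np.nan],
                       'b': [np.nan, np.nan, np.nan]})

col_sums = scores.sum()                # one sum per column, NaN skipped
row_sums = scores.sum(axis=1)          # one sum per row
strict = scores.sum(skipna=False)      # any NaN makes the sum NaN
at_least_one = scores.sum(min_count=1) # NaN unless one valid value exists

print(col_sums)
print(row_sums)
print(strict)
print(at_least_one)
```

With the defaults, the all-NaN column sums to 0.0. Passing min_count=1 is one way to get NaN there instead, which can help distinguish "no data" from "data that adds up to zero".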

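Example 1: Using sum function with a MultiIndex Series

Here a MultiIndex is built first and used as the index of a Series. Calling sum() without arguments adds up all the values.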

In [32]:
df_sum = pd.MultiIndex.from_arrays([
     ['Sedan', 'Hatchback', 'Sedan', 'Hatchback'],
    ['BMW', 'Mini Cooper', 'Audi', 'Aston Martin']],
  names=['designs', 'companies'])
In [33]:
cars = pd.Series([3, 6, 9, 18], name='types_of_Cars', index=df_sum)
In [34]:
cars.sum()
Out[34]:
36

Example 2: Using sum function with level parameter

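Since the Series has a MultiIndex, sum() can be computed per level using the level parameter. In this example, we can see how levels are used in the sum() function of pandas. Note that the level parameter was deprecated in pandas 1.3 and removed in pandas 2.0, so these calls only run on older versions.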

In [35]:
cars.sum(level='designs')
Out[35]:
designs
Sedan        12
Hatchback    24
Name: types_of_Cars, dtype: int64
In [36]:
cars.sum(level=0)
Out[36]:
designs
Sedan        12
Hatchback    24
Name: types_of_Cars, dtype: int64
In [37]:
cars.sum(level=1)
Out[37]:
companies
BMW              3
Mini Cooper      6
Audi             9
Aston Martin    18
Name: types_of_Cars, dtype: int64
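
On pandas versions where sum() no longer accepts level (it was removed in pandas 2.0), the same per-level totals can be obtained with groupby. A minimal sketch, rebuilding the cars Series from above; the names index and by_design are chosen here for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays([
    ['Sedan', 'Hatchback', 'Sedan', 'Hatchback'],
    ['BMW', 'Mini Cooper', 'Audi', 'Aston Martin']],
    names=['designs', 'companies'])
cars = pd.Series([3, 6, 9, 18], name='types_of_Cars', index=index)

# Group the rows by the 'designs' level, then sum within each group
by_design = cars.groupby(level='designs').sum()
print(by_design)
```

By default groupby sorts the group labels, so the rows may appear in a different order than in the output above. Passing sort=False to groupby keeps the order of first appearance.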

Pandas Aggregate: agg()

The pandas agg() function aggregates data using one or more operations over the desired axis.

Syntax

pandas.DataFrame.agg(func, axis=0, *args, **kwargs)

  • func : function, str, list or dict – The function or functions used for aggregating the data.
  • axis : {0 or 'index', 1 or 'columns'}, default 0 – The axis over which the operation is applied.
  • *args : Positional arguments to pass to func.
  • **kwargs : Keyword arguments to pass to func.
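
Besides names such as 'sum' or 'min', func can also be a function of your own. The sketch below assumes a hypothetical helper called value_range, which is not part of pandas, and applies it to each column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[15, 22, 37],
                   [49, np.nan, 64],
                   [np.nan, 89, 99],
                   [53, np.nan, 71]],
                  columns=['P', 'Q', 'R'])

def value_range(column):
    """Difference between the largest and smallest non-missing value."""
    return column.max() - column.min()

# Each column is passed to value_range as a Series
ranges = df.agg(value_range)
print(ranges)
```

Here each column is reduced to a single number, so the result is a Series with one entry per column.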

Example 1: Using pandas aggregate functions over rows

Here a dataframe is created first and then different operations are applied using aggregate function of pandas.

In [38]:
df = pd.DataFrame([[15, 22, 37],
                    [49, np.nan, 64],
                    [np.nan, 89, 99],
                    [53, np.nan,71]],
                   columns=['P', 'Q', 'R'])
In [39]:
df
Out[39]:
P Q R
0 15.0 22.0 37
1 49.0 NaN 64
2 NaN 89.0 99
3 53.0 NaN 71

Here the sum and minimum value of each column are calculated using the pandas agg() function.

In [40]:
df.agg(['sum', 'min'])
Out[40]:
P Q R
sum 117.0 111.0 271
min 15.0 22.0 37

Example 2: Using different agg() functions on each column

In this example, different functions are applied to different columns. Where a function was not requested for a column, the result is NaN.

In [41]:
df
Out[41]:
P Q R
0 15.0 22.0 37
1 49.0 NaN 64
2 NaN 89.0 99
3 53.0 NaN 71
In [42]:
df.agg({'P' : ['sum', 'min'], 'Q' : ['min', 'max']})
Out[42]:
P Q
max NaN 89.0
min 15.0 22.0
sum 117.0 NaN

Example 3: Aggregating over columns

Here the aggregate function is applied along the columns axis, which produces one value per row: the mean of that row's non-missing values. We can specify both the operation and the axis on which it has to be performed.

In [43]:
df
Out[43]:
P Q R
0 15.0 22.0 37
1 49.0 NaN 64
2 NaN 89.0 99
3 53.0 NaN 71
In [44]:
df.agg("mean", axis="columns")
Out[44]:
0    24.666667
1    56.500000
2    94.000000
3    62.000000
dtype: float64
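
A list of functions can be combined with axis="columns" as well. The following is a small sketch on the same data; the variable name per_row is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[15, 22, 37],
                   [49, np.nan, 64],
                   [np.nan, 89, 99],
                   [53, np.nan, 71]],
                  columns=['P', 'Q', 'R'])

# One row per original row, one column per requested function
per_row = df.agg(['min', 'max'], axis='columns')
print(per_row)
```

The result is a dataframe rather than a Series, with the function names as column labels and missing values skipped within each row.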

Conclusion

In this article, we learned about various pandas mathematical functions: add(), sub(), mul(), div(), sum() and agg(). These basic mathematical operations can be performed easily with the help of the pandas library. Since mathematical tasks come up regularly in data science work, these pandas operations will prove to be very handy.

Reference – https://pandas.pydata.org/docs/
