Introduction
Data science and machine learning tasks often call for very basic mathematical operations. Pandas handles much of the data loading and manipulation in such work, so it also provides functions for these operations. In this tutorial we will learn about the pandas mathematical functions add(), sub(), mul(), div(), sum() and agg() by looking at their syntax and examples.
Loading Pandas Library
We will commence this article by loading the pandas library.
import pandas as pd
import numpy as np
Pandas Addition : add()
The pandas addition function performs the addition of dataframes. The addition is performed element-wise.
Syntax
pandas.DataFrame.add(other, axis='columns', level=None, fill_value=None)
- other : scalar, sequence, Series, or DataFrame – The value to add. It can be a single value, a list-like object, or another pandas data structure.
- axis : {0 or 'index', 1 or 'columns'} – When other is a list or Series, this decides whether it is matched against the row labels (index) or the column labels (columns).
- level : int or label – Used to broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_value : float or None, default None – The value substituted for a missing (NaN) entry in one of the two inputs before the operation is computed. If the entry is missing in both inputs, the result is still missing.
The function will output the result of the addition function.
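To make the role of fill_value more concrete, here is a minimal sketch. The frames `left` and `right` and their values are made up for illustration and are not part of the examples in this article. They share only the label 'Jaguar', so the other rows are missing on one side.

```python
import numpy as np
import pandas as pd

# Hypothetical frames that overlap only on the 'Jaguar' row.
left = pd.DataFrame({'speed': [80, 90]}, index=['Audi', 'Jaguar'])
right = pd.DataFrame({'speed': [10, 20]}, index=['Jaguar', 'BMW'])

# Without fill_value, a label present in only one frame produces NaN.
plain = left.add(right)

# With fill_value=0, the missing side is treated as 0 before adding.
filled = left.add(right, fill_value=0)

print(plain)
print(filled)
```

In this sketch only the 'Jaguar' row has a value in `plain`, while `filled` keeps all three rows.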
Example 1: Addition using pandas add()
In this example, we add the scalar value 27 to every element of the dataframe.
df = pd.DataFrame({'speed': [80, 90, 110],
'weight': [250, 200, 150]},
index=['Audi', 'Jaguar', 'BMW'])
df
| | speed | weight |
---|---|---|
Audi | 80 | 250 |
Jaguar | 90 | 200 |
BMW | 110 | 150 |
df.add(27)
| | speed | weight |
---|---|---|
Audi | 107 | 277 |
Jaguar | 117 | 227 |
BMW | 137 | 177 |
Example 2: Addition of two dataframes
Here we create another dataframe, df1, with the same values and labels as df, and pass it to the pandas add() function. The output shows that the two dataframes are added element by element, with values matched by row and column label.
df1 = pd.DataFrame({'speed': [80, 90, 110],
'weight': [250, 200, 150]},
index=['Audi', 'Jaguar', 'BMW'])
df1
| | speed | weight |
---|---|---|
Audi | 80 | 250 |
Jaguar | 90 | 200 |
BMW | 110 | 150 |
df.add(df1)
| | speed | weight |
---|---|---|
Audi | 160 | 500 |
Jaguar | 180 | 400 |
BMW | 220 | 300 |
Example 3: Scalar addition using the + operator
In this example, a scalar value is added to the existing dataframe with the + operator instead of the add() function.
df
| | speed | weight |
---|---|---|
Audi | 80 | 250 |
Jaguar | 90 | 200 |
BMW | 110 | 150 |
df + 25
| | speed | weight |
---|---|---|
Audi | 105 | 275 |
Jaguar | 115 | 225 |
BMW | 135 | 175 |
Pandas Subtract : sub()
The subtract function of pandas performs element-wise subtraction on dataframes.
Syntax
pandas.DataFrame.sub(other, axis='columns', level=None, fill_value=None)
- other : scalar, sequence, Series, or DataFrame – The value to subtract. It can be a single value, a list-like object, or another pandas data structure.
- axis : {0 or 'index', 1 or 'columns'} – When other is a list or Series, this decides whether it is matched against the row labels (index) or the column labels (columns).
- level : int or label – Used to broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_value : float or None, default None – The value substituted for a missing (NaN) entry in one of the two inputs before the operation is computed. If the entry is missing in both inputs, the result is still missing.
The function will output the result of subtract function.
Example 1: Subtraction using pandas sub()
In this example, a list is passed to the pandas subtract function. With axis='columns', each list element is matched to a column, so 15 is subtracted from every speed value and 30 from every weight value.
df
| | speed | weight |
---|---|---|
Audi | 80 | 250 |
Jaguar | 90 | 200 |
BMW | 110 | 150 |
df.sub([15, 30], axis='columns')
| | speed | weight |
---|---|---|
Audi | 65 | 220 |
Jaguar | 75 | 170 |
BMW | 95 | 120 |
Example 2: Using series data along with pandas subtraction function
In the second example, a Series is passed to the pandas sub() function. With axis='index', the Series labels are matched against the row labels, so each value in the Series is deducted from every column of the matching row.
df
| | speed | weight |
---|---|---|
Audi | 80 | 250 |
Jaguar | 90 | 200 |
BMW | 110 | 150 |
df.sub(pd.Series([7, 9, 11], index=['Audi', 'Jaguar', 'BMW']),
axis='index')
| | speed | weight |
---|---|---|
Audi | 73 | 243 |
Jaguar | 81 | 191 |
BMW | 99 | 139 |
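The effect of the axis parameter is easiest to see by running the same subtraction both ways. The sketch below reuses the dataframe from this article. The names `offsets`, `by_rows` and `by_columns` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'speed': [80, 90, 110],
                   'weight': [250, 200, 150]},
                  index=['Audi', 'Jaguar', 'BMW'])
offsets = pd.Series([7, 9, 11], index=['Audi', 'Jaguar', 'BMW'])

# axis='index': the Series labels are matched against the row labels.
by_rows = df.sub(offsets, axis='index')

# Default axis='columns': the Series labels are matched against the
# column names. None of them is 'speed' or 'weight', so nothing lines up
# and every cell of the result is NaN.
by_columns = df.sub(offsets)

print(by_rows)
print(by_columns)
```

If a subtraction with a Series unexpectedly returns only NaN values, a mismatch between the Series labels and the chosen axis is a likely cause.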
Example 3: Subtraction using the - operator
Here the subtraction is performed with the - operator instead of the pandas sub() function. The list is matched to the columns, so 6 is subtracted from every speed value and 9 from every weight value.
df
| | speed | weight |
---|---|---|
Audi | 80 | 250 |
Jaguar | 90 | 200 |
BMW | 110 | 150 |
df - [6, 9]
| | speed | weight |
---|---|---|
Audi | 74 | 241 |
Jaguar | 84 | 191 |
BMW | 104 | 141 |
Pandas Multiply : mul()
The multiplication function of pandas is used to perform multiplication operations on dataframes.
Syntax
pandas.DataFrame.mul(other, axis='columns', level=None, fill_value=None)
- other : scalar, sequence, Series, or DataFrame – The value to multiply by. It can be a single value, a list-like object, or another pandas data structure.
- axis : {0 or 'index', 1 or 'columns'} – When other is a list or Series, this decides whether it is matched against the row labels (index) or the column labels (columns).
- level : int or label – Used to broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_value : float or None, default None – The value substituted for a missing (NaN) entry in one of the two inputs before the operation is computed. If the entry is missing in both inputs, the result is still missing.
Example 1: Simple example of the pandas mul() function
df1 = pd.DataFrame({'speed': [50, 75, 100]},
index=['Audi', 'Jaguar', 'BMW'])
df1
| | speed |
---|---|
Audi | 50 |
Jaguar | 75 |
BMW | 100 |
res = df.mul(df1)
df
| | speed | weight |
---|---|---|
Audi | 80 | 250 |
Jaguar | 90 | 200 |
BMW | 110 | 150 |
res
| | speed | weight |
---|---|---|
Audi | 4000 | NaN |
Jaguar | 6750 | NaN |
BMW | 11000 | NaN |
Example 2: Understanding use of fill_value parameter
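In the previous result, the weight column is NaN because df1 has no weight column. Here fill_value=0 substitutes 0 for those missing df1 values before multiplying, so the weight column becomes 0.0.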
df
| | speed | weight |
---|---|---|
Audi | 80 | 250 |
Jaguar | 90 | 200 |
BMW | 110 | 150 |
df.mul(df1, fill_value=0)
| | speed | weight |
---|---|---|
Audi | 4000 | 0.0 |
Jaguar | 6750 | 0.0 |
BMW | 11000 | 0.0 |
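The choice of fill_value changes what happens to the column that is missing from df1. As a small sketch of an alternative, fill_value=1 (the neutral element of multiplication) leaves the weight column unchanged instead of zeroing it. Whether 0, 1 or NaN is the right choice depends on what a missing value means in your data.

```python
import pandas as pd

df = pd.DataFrame({'speed': [80, 90, 110],
                   'weight': [250, 200, 150]},
                  index=['Audi', 'Jaguar', 'BMW'])
df1 = pd.DataFrame({'speed': [50, 75, 100]},
                   index=['Audi', 'Jaguar', 'BMW'])

# fill_value=1 treats the missing 'weight' column of df1 as all ones,
# so the original weights pass through the multiplication unchanged.
res = df.mul(df1, fill_value=1)

print(res)
```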
Example 3: Perform multiplication operation using “*”
We can also perform multiplication with the asterisk operator. The output of this method is the same as the output of mul() in Example 1.
df
| | speed | weight |
---|---|---|
Audi | 80 | 250 |
Jaguar | 90 | 200 |
BMW | 110 | 150 |
df1
| | speed |
---|---|
Audi | 50 |
Jaguar | 75 |
BMW | 100 |
df * df1
| | speed | weight |
---|---|---|
Audi | 4000 | NaN |
Jaguar | 6750 | NaN |
BMW | 11000 | NaN |
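One way to check that the operator and the method agree is DataFrame.equals(), which treats NaN values in the same position as equal. The following is a minimal sketch using the same two dataframes. The variable name `same` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'speed': [80, 90, 110],
                   'weight': [250, 200, 150]},
                  index=['Audi', 'Jaguar', 'BMW'])
df1 = pd.DataFrame({'speed': [50, 75, 100]},
                   index=['Audi', 'Jaguar', 'BMW'])

# Compare the operator form with the method form, NaN cells included.
same = (df * df1).equals(df.mul(df1))

print(same)
```

The operator form is shorter, while the method form is the one to use when you need axis, level or fill_value.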
Pandas Division : div()
The division function of pandas is used to perform division operation on dataframes.
Syntax
pandas.DataFrame.div(other, axis='columns', level=None, fill_value=None)
- other : scalar, sequence, Series, or DataFrame – The value to divide by. It can be a single value, a list-like object, or another pandas data structure.
- axis : {0 or 'index', 1 or 'columns'} – When other is a list or Series, this decides whether it is matched against the row labels (index) or the column labels (columns).
- level : int or label – Used to broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_value : float or None, default None – The value substituted for a missing (NaN) entry in one of the two inputs before the operation is computed. If the entry is missing in both inputs, the result is still missing.
Example 1: Using pandas div() function
In this example, the div() function divides every element of the dataframe by the scalar 10. Note that the result is a dataframe of floats even though the inputs are integers.
df
| | speed | weight |
---|---|---|
Audi | 80 | 250 |
Jaguar | 90 | 200 |
BMW | 110 | 150 |
df.div(10)
| | speed | weight |
---|---|---|
Audi | 8.0 | 25.0 |
Jaguar | 9.0 | 20.0 |
BMW | 11.0 | 15.0 |
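A related detail worth knowing is how div() treats a zero divisor. The sketch below uses a made-up single-column frame named `values`, which is not one of the dataframes in this article. Rather than raising an error, pandas follows floating-point rules here.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a positive, a zero and a negative value.
values = pd.DataFrame({'x': [10, 0, -10]})

# Dividing by zero does not raise: positive values become inf,
# negative values become -inf, and 0 / 0 becomes NaN.
result = values.div(0)

print(result)
```

If infinities are not acceptable downstream, one option is to replace them afterwards, for example with result.replace([np.inf, -np.inf], np.nan).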
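Example 2: Using div() function on a MultiIndex dataframe
In this example, a MultiIndex dataframe is created and then passed to the division function. The level parameter matches the row labels of df against the second level of the MultiIndex, and fill_value=2 stands in for entries that are missing from one of the two dataframes.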
df_multindex = pd.DataFrame({'no_gears': [4, 5, 3, 3, 8, 6],
'Speed': [360, 180, 360, 360, 540, 720]},
index=[['Sedan', 'Sedan', 'Sedan', 'Hatchback', 'Hatchback', 'Hatchback'],
['BMW', 'Audi', 'Bentley',
'Mercedes', 'Jaguar', 'Mini Cooper']])
df_multindex
| | | no_gears | Speed |
---|---|---|---|
Sedan | BMW | 4 | 360 |
| | Audi | 5 | 180 |
| | Bentley | 3 | 360 |
Hatchback | Mercedes | 3 | 360 |
| | Jaguar | 8 | 540 |
| | Mini Cooper | 6 | 720 |
df
| | speed | weight |
---|---|---|
Audi | 80 | 250 |
Jaguar | 90 | 200 |
BMW | 110 | 150 |
df.div(df_multindex, level=1, fill_value=2)
| | | Speed | no_gears | speed | weight |
---|---|---|---|---|---|
Sedan | BMW | 0.005556 | 0.500000 | 55.0 | 75.0 |
| | Audi | 0.011111 | 0.400000 | 40.0 | 125.0 |
| | Bentley | 0.005556 | 0.666667 | NaN | NaN |
Hatchback | Mercedes | 0.005556 | 0.666667 | NaN | NaN |
| | Jaguar | 0.003704 | 0.250000 | 45.0 | 100.0 |
| | Mini Cooper | 0.002778 | 0.333333 | NaN | NaN |
Pandas Sum : sum()
The sum function helps in finding the sum of the values for desired axis.
Syntax
pandas.DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
- axis : {index (0), columns (1)} – The axis along which the sum is computed.
- skipna : bool, default True – Whether NA/null values are skipped during the computation.
- level : int or level name, default None – Used to sum within a particular level of a MultiIndex. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0.
- numeric_only : bool, default None – Whether to include only float, int and boolean columns. If None, pandas will attempt to use everything.
- min_count : int, default 0 – The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result is NA.
- **kwargs : Additional keyword arguments.
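The skipna, min_count and axis parameters can be illustrated with a short sketch. The frame `scores` and its values are invented for this purpose. Column 'b' is entirely NaN so that the difference between the options is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'a' has one missing value, 'b' has only missing values.
scores = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                       'b': [np.nan, np.nan, np.nan]})

# Default: NaN values are skipped, so an all-NaN column sums to 0.
col_totals = scores.sum()

# skipna=False: a single NaN makes the whole column total NaN.
strict = scores.sum(skipna=False)

# min_count=1: at least one valid value is required, otherwise NaN.
needs_one = scores.sum(min_count=1)

# axis='columns': sum across the columns, giving one total per row.
row_totals = scores.sum(axis='columns')

print(col_totals, strict, needs_one, row_totals, sep='\n\n')
```

min_count=1 is a common choice when a total of 0 for an entirely missing column would be misleading.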
Example 1: Using sum function with a MultiIndex Series
df_sum = pd.MultiIndex.from_arrays([
['Sedan', 'Hatchback', 'Sedan', 'Hatchback'],
['BMW', 'Mini Cooper', 'Audi', 'Aston Martin']],
names=['designs', 'companies'])
cars = pd.Series([3, 6, 9, 18], name='types_of_Cars', index=df_sum)
cars.sum()
36
Example 2: Using sum function with level parameter
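Since cars has a MultiIndex, we can compute the sum within each level by using the level parameter. Note that the level parameter of sum() was deprecated in pandas 1.3 and removed in pandas 2.0, so the calls in this example run only on older versions. On recent versions, the equivalent is cars.groupby(level='designs', sort=False).sum().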
cars.sum(level='designs')
designs
Sedan        12
Hatchback    24
Name: types_of_Cars, dtype: int64
cars.sum(level=0)
designs
Sedan        12
Hatchback    24
Name: types_of_Cars, dtype: int64
cars.sum(level=1)
companies
BMW              3
Mini Cooper      6
Audi             9
Aston Martin    18
Name: types_of_Cars, dtype: int64
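For readers on pandas 2.0 or later, where sum() no longer accepts level, the same per-level totals can be obtained with groupby(). The sketch below rebuilds the cars Series from this example. The names `idx` and `per_design` are only illustrative, and sort=False is passed to keep the groups in their order of first appearance.

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays([
    ['Sedan', 'Hatchback', 'Sedan', 'Hatchback'],
    ['BMW', 'Mini Cooper', 'Audi', 'Aston Martin']],
    names=['designs', 'companies'])
cars = pd.Series([3, 6, 9, 18], name='types_of_Cars', index=idx)

# Group the rows by the 'designs' level and sum within each group.
per_design = cars.groupby(level='designs', sort=False).sum()

print(per_design)
```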
Pandas Aggregate: agg()
The pandas aggregate function is used to aggregate using one or more operations over desired axis.
Syntax
pandas.DataFrame.agg(func, axis=0, *args, **kwargs)
- func : function, str, list or dict – The function or functions used for aggregating the data.
- axis : {0 or 'index', 1 or 'columns'}, default 0 – The axis over which the operation is applied.
- *args : Positional arguments to pass to func.
- **kwargs : Keyword arguments to pass to func.
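Because func may also be a plain Python function, agg() is not limited to the built-in names such as 'sum' or 'min'. As a sketch, the hypothetical helper `value_range` below (not a pandas function) computes the spread of each column of the dataframe used in the following examples.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[15, 22, 37],
                   [49, np.nan, 64],
                   [np.nan, 89, 99],
                   [53, np.nan, 71]],
                  columns=['P', 'Q', 'R'])

def value_range(column):
    """Hypothetical helper: largest minus smallest value, ignoring NaN."""
    return column.max() - column.min()

# With the default axis=0, the function receives one column at a time.
spread = df.agg(value_range)

print(spread)
```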
Example 1: Using pandas aggregate functions over rows
Here a dataframe is created first and then different operations are applied using aggregate function of pandas.
df = pd.DataFrame([[15, 22, 37],
[49, np.nan, 64],
[np.nan, 89, 99],
[53, np.nan,71]],
columns=['P', 'Q', 'R'])
df
| | P | Q | R |
---|---|---|---|
0 | 15.0 | 22.0 | 37 |
1 | 49.0 | NaN | 64 |
2 | NaN | 89.0 | 99 |
3 | 53.0 | NaN | 71 |
Here the sum and minimum value of each column are calculated using the pandas agg() function.
df.agg(['sum', 'min'])
| | P | Q | R |
---|---|---|---|
sum | 117.0 | 111.0 | 271 |
min | 15.0 | 22.0 | 37 |
Example 2: Using different agg() functions on each column
In this example, different functions are applied to different columns. A cell is NaN where a function was not requested for that column, for example max for P.
df
| | P | Q | R |
---|---|---|---|
0 | 15.0 | 22.0 | 37 |
1 | 49.0 | NaN | 64 |
2 | NaN | 89.0 | 99 |
3 | 53.0 | NaN | 71 |
df.agg({'P' : ['sum', 'min'], 'Q' : ['min', 'max']})
| | P | Q |
---|---|---|
max | NaN | 89.0 |
min | 15.0 | 22.0 |
sum | 117.0 | NaN |
Example 3: Aggregating over columns
Here the aggregate function is applied across the columns, so one value is produced per row. We specify both the operation and the axis along which it is performed.
df
| | P | Q | R |
---|---|---|---|
0 | 15.0 | 22.0 | 37 |
1 | 49.0 | NaN | 64 |
2 | NaN | 89.0 | 99 |
3 | 53.0 | NaN | 71 |
df.agg("mean", axis="columns")
0    24.666667
1    56.500000
2    94.000000
3    62.000000
dtype: float64
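Passing a function name as a string makes agg() call the dataframe method of the same name, so the row means above should match a direct call to mean(). The short sketch below checks this. The names `row_means` and `direct` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[15, 22, 37],
                   [49, np.nan, 64],
                   [np.nan, 89, 99],
                   [53, np.nan, 71]],
                  columns=['P', 'Q', 'R'])

# Row-wise mean through agg() and through the mean() method directly.
row_means = df.agg("mean", axis="columns")
direct = df.mean(axis="columns")

# NaN values are skipped, so row 1 averages only 49 and 64.
print(np.allclose(row_means, direct))
```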
Conclusion
In this article, we learned about the pandas mathematical functions add(), sub(), mul(), div(), sum() and agg(). These basic mathematical operations can be performed easily with the help of the pandas library. Since data science work regularly involves such calculations, these pandas operations will prove to be very handy.
Reference – https://pandas.pydata.org/docs/
I am Palash Sharma, an undergraduate student who loves to explore and garner in-depth knowledge in the fields like Artificial Intelligence and Machine Learning. I am captivated by the wonders these fields have produced with their novel implementations. With this, I have a desire to share my knowledge with others in all my capacity.