Introduction
In this article, we continue the pandas tutorial and cover the rolling(), corr(), and apply() functions. For each function we look at the syntax and then work through examples.
Importing Pandas Library
We start by importing the pandas library, along with NumPy, which the apply() examples use.
import pandas as pd
import numpy as np
Pandas Rolling : rolling()
The pandas rolling() function provides rolling window calculations: a statistic such as a sum or a mean is computed over a window of rows that slides along the data.
Syntax
DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)
window : int or offset – The size of the moving window. If an integer, this is the fixed number of observations used for each calculation.
If an offset, this is the time period of each window, so each window can hold a different number of observations depending on how many fall within that period. This is only valid for datetime-like indexes.
min_periods : int, default None – The minimum number of observations a window must contain to produce a value; otherwise the result is NaN. For an integer window the default is the window size, and for an offset window it is 1.
center : bool, default False – If True, the result is labeled at the center of the window instead of at its right edge.
win_type : str, default None – The window type, which determines how the observations in a window are weighted. If None, all observations are weighted equally. Passing a window type requires SciPy.
on : str, optional – For a DataFrame, a datetime-like column or MultiIndex level on which to calculate the rolling window, rather than the DataFrame's index.
axis : int or str, default 0 – The axis to roll along: 0 (or 'index') rolls across rows, 1 (or 'columns') rolls across columns.
closed : str, default None – Which endpoints of the window interval are closed: 'right', 'left', 'both' or 'neither'.
The function returns a Window object (when win_type is given) or a Rolling object. An aggregation such as sum() or mean() is then called on that object.
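To make the window and center parameters more concrete, here is a minimal sketch. The dates and the names ts, fixed, by_time and centered are made up for illustration; the values are the same small series used in the examples of this article.

```python
import pandas as pd

# Hypothetical dates with a gap between 3 Jan and 6 Jan.
idx = pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03',
                      '2023-01-06', '2023-01-07'])
ts = pd.DataFrame({'A': [7, 3, 5, 9, 2]}, index=idx)

# Integer window: always the current row plus the previous row.
fixed = ts.rolling(2).sum()

# Offset window: every row whose timestamp falls within the last 2 days.
by_time = ts.rolling('2D').sum()

# center=True labels each 3-row window at its middle row.
centered = ts.rolling(3, center=True).sum()

print(fixed['A'].tolist())     # [nan, 10.0, 8.0, 14.0, 11.0]
print(by_time['A'].tolist())   # [7.0, 10.0, 8.0, 9.0, 11.0]
print(centered['A'].tolist())  # [nan, 15.0, 17.0, 16.0, nan]
```

Note how the offset window on 6 Jan contains only one observation because of the gap, while the integer window still reaches back to 3 Jan.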
Example 1: Using win_type parameter in Pandas Rolling()
In this first example of the rolling function, we pass different values of the win_type parameter and take the rolling sum. With a win_type, each observation in the window is multiplied by a weight before the sum is taken.
df = pd.DataFrame({'A': [7, 3, 5, 9, 2]})
df
| | A |
---|---|
0 | 7 |
1 | 3 |
2 | 5 |
3 | 9 |
4 | 2 |
In the example below, the rolling sum is calculated with win_type set to 'triang'. A triangular window of size 2 gives each observation a weight of 0.5, so the row at index 1 is 0.5 × 7 + 0.5 × 3 = 5.0.
df.rolling(2, win_type='triang').sum()
| | A |
---|---|
0 | NaN |
1 | 5.0 |
2 | 4.0 |
3 | 7.0 |
4 | 5.5 |
Similarly, we can pass 'gaussian' to win_type. A Gaussian window also needs a standard deviation, which is passed to the aggregation function as std.
df.rolling(2, win_type='gaussian').sum(std=3)
| | A |
---|---|
0 | NaN |
1 | 9.862071 |
2 | 7.889657 |
3 | 13.806900 |
4 | 10.848278 |
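The weighted sums above can be reproduced by hand, which helps show what win_type does. The sketch below is not how pandas computes them internally; it assumes the triangular window of size 2 has weights 0.5 and 0.5, and that the Gaussian weights follow exp(-0.5 * (n / std)**2) with n measured from the center of the window. The helper weighted_rolling_sum is a hypothetical name introduced for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [7, 3, 5, 9, 2]})

def weighted_rolling_sum(series, weights):
    """Rolling sum in which each window is multiplied element-wise by `weights`."""
    weights = np.asarray(weights, dtype=float)
    return series.rolling(len(weights)).apply(
        lambda window: np.dot(window, weights), raw=True
    )

# Assumed weights of a 2-point triangular window.
triang_weights = [0.5, 0.5]

# Assumed Gaussian weights for a 2-point window with std=3.
std = 3
n = np.arange(2) - (2 - 1) / 2           # distances from the center: [-0.5, 0.5]
gauss_weights = np.exp(-0.5 * (n / std) ** 2)

triang_sum = weighted_rolling_sum(df['A'], triang_weights)
gauss_sum = weighted_rolling_sum(df['A'], gauss_weights)

print(triang_sum.tolist())          # [nan, 5.0, 4.0, 7.0, 5.5]
print(gauss_sum.round(6).tolist())
```

Both results agree with the two tables above, up to rounding in the Gaussian case.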
Example 2: Using min_periods parameter in Pandas Rolling()
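The min_periods parameter specifies the minimum number of observations a window needs in order to produce a value. With min_periods=1, a single observation is enough, so the first row holds 7.0 instead of NaN.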
df.rolling(2, min_periods=1).sum()
| | A |
---|---|
0 | 7.0 |
1 | 10.0 |
2 | 8.0 |
3 | 14.0 |
4 | 11.0 |
When min_periods is set to 2, a window needs two observations to produce a value, so the row at index 0, which has only one, is NaN.
df.rolling(2, min_periods=2).sum()
| | A |
---|---|
0 | NaN |
1 | 10.0 |
2 | 8.0 |
3 | 14.0 |
4 | 11.0 |
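The effect of min_periods is easier to see with a larger window. The following sketch uses a window of 3 on the same data; the window size and the results dictionary are choices made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [7, 3, 5, 9, 2]})

# Rolling sum over 3 rows for each min_periods setting.
results = {}
for min_periods in (1, 2, 3):
    results[min_periods] = df['A'].rolling(3, min_periods=min_periods).sum()
    print(min_periods, results[min_periods].tolist())

# 1 [7.0, 10.0, 15.0, 17.0, 16.0]
# 2 [nan, 10.0, 15.0, 17.0, 16.0]
# 3 [nan, nan, 15.0, 17.0, 16.0]
```

Each increase in min_periods turns one more of the leading rows into NaN, because those windows do not yet contain enough observations.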
Pandas Correlation : corr()
The pandas corr() function computes the pairwise correlation between the columns of a DataFrame.
Syntax
DataFrame.corr(method='pearson', min_periods=1)
method : {'pearson', 'kendall', 'spearman'} – The correlation method to use: Pearson (the standard correlation coefficient), Kendall Tau, or Spearman rank correlation.
min_periods : int, optional – The minimum number of observations required per pair of columns to have a valid result.
The function returns a DataFrame holding the correlation matrix.
Example 1: Simple example of corr() function
We will create a DataFrame from a CSV file. Here the method parameter of corr() is set to 'pearson', which gives the correlation value for every pair of columns that contain numerical values.
df = pd.read_csv('employees.csv')
df.head()
| | First Name | Gender | Start Date | Last Login Time | Salary | Bonus % | Senior Management | Team |
---|---|---|---|---|---|---|---|---|
0 | Douglas | Male | 8/6/1993 | 12:42 PM | 97308 | 6.945 | True | Marketing |
1 | Thomas | Male | 3/31/1996 | 6:53 AM | 61933 | 4.170 | True | NaN |
2 | Maria | Female | 4/23/1993 | 11:17 AM | 130590 | 11.858 | False | Finance |
3 | Jerry | Male | 3/4/2005 | 1:00 PM | 138705 | 9.340 | True | Finance |
4 | Larry | Male | 1/24/1998 | 4:47 PM | 101004 | 1.389 | True | Client Services |
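The Pearson method uses the standard correlation coefficient to calculate the correlation value. On pandas 2.0 and later, numeric_only=True is needed so that the text columns are skipped; older versions ignored them by default.
df.corr(method='pearson', numeric_only=True)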
| | Salary | Bonus % |
---|---|---|
Salary | 1.000000 | -0.036381 |
Bonus % | -0.036381 | 1.000000 |
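Since employees.csv may not be at hand, the same idea can be tried on a small in-memory DataFrame. The rows below are made up for illustration and are not taken from the real file. The sketch selects the numeric columns explicitly with select_dtypes() before calling corr(), which is one way to avoid depending on how a given pandas version treats text columns.

```python
import pandas as pd

# Made-up stand-in for a few rows of employees.csv; not the real data.
sample = pd.DataFrame({
    'First Name': ['Ann', 'Bob', 'Cid', 'Dee'],
    'Salary': [50000, 60000, 70000, 80000],
    'Bonus %': [5.0, 3.0, 4.0, 1.0],
})

# Keep only the numeric columns, then compute the correlation matrix.
numeric = sample.select_dtypes(include='number')
corr_matrix = numeric.corr(method='pearson')
print(corr_matrix)
```

The result is a 2 × 2 matrix with 1.0 on the diagonal, since every column is perfectly correlated with itself, and the same value mirrored above and below the diagonal.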
Example 2: Finding correlation value using Kendall method
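In this example, the Kendall method is used. It calculates the correlation value with the Kendall Tau correlation coefficient, which is based on ranks. Passing numeric_only=True restricts the calculation to numeric columns, which pandas 2.0 and later require when the DataFrame also holds text columns.
df.corr(method='kendall', numeric_only=True)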
| | Salary | Bonus % |
---|---|---|
Salary | 1.0000 | -0.0234 |
Bonus % | -0.0234 | 1.0000 |
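To see why the choice of method matters, here is a small sketch on made-up data where y grows with x but not in a straight line. It compares Pearson with Spearman, used here as the rank-based method; the column names x and y are hypothetical.

```python
import pandas as pd

data = pd.DataFrame({'x': [1, 2, 3, 4, 5]})
data['y'] = data['x'] ** 3   # always increasing, but not linear

pearson = data.corr(method='pearson').loc['x', 'y']
spearman = data.corr(method='spearman').loc['x', 'y']

print(round(pearson, 4))   # 0.9431
print(round(spearman, 4))  # 1.0
```

Pearson measures how well the relationship fits a straight line, so it stays below 1 here. A rank-based method only checks whether the ordering agrees, so it reports a perfect correlation.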
Pandas Apply : apply()
The pandas apply() function applies a function along an axis of a DataFrame.
Syntax
DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)
func : function – The function to apply to each column or row.
axis : {0 or 'index', 1 or 'columns'}, default 0 – The axis along which the function is applied: 0 applies it to each column, 1 applies it to each row.
raw : bool, default False – Determines whether each row or column is passed as a Series (False) or as an ndarray (True).
result_type : {'expand', 'reduce', 'broadcast', None}, default None – The type of result expected. This only has an effect when axis=1.
args : tuple – Positional arguments passed to func in addition to the array/Series.
**kwds – Additional keyword arguments passed to func.
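The args and keyword parameters are easiest to understand with a function that takes extra arguments. In the sketch below, scale, factor and offset are hypothetical names introduced for this illustration.

```python
import pandas as pd

df = pd.DataFrame([[64, 81]] * 3, columns=['P', 'Q'])

def scale(column, factor, offset=0):
    """Hypothetical helper: multiply a column by `factor`, then add `offset`."""
    return column * factor + offset

# args supplies `factor` positionally; `offset` is forwarded as a keyword argument.
scaled = df.apply(scale, args=(2,), offset=1)
print(scaled)
```

Each column is passed to scale() as a Series, so "P" becomes 64 × 2 + 1 = 129 and "Q" becomes 81 × 2 + 1 = 163 in every row.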
Example 1: Applying NumPy's square root function
Here we use the apply function to apply NumPy's square root function, np.sqrt, to every element of the DataFrame.
df = pd.DataFrame([[64,81]] * 3, columns=['P', 'Q'])
df
| | P | Q |
---|---|---|
0 | 64 | 81 |
1 | 64 | 81 |
2 | 64 | 81 |
df.apply(np.sqrt)
| | P | Q |
---|---|---|
0 | 8.0 | 9.0 |
1 | 8.0 | 9.0 |
2 | 8.0 | 9.0 |
Example 2: Applying a reducing function (sum) over either axis
In this example the pandas apply function applies NumPy's sum function along each axis in turn, so that we can compare the two results.
With axis=0, the function is applied to each column, so the rows of "P" and of "Q" are added up to give one total per column.
df.apply(np.sum, axis=0)
P    192
Q    243
dtype: int64
With axis=1, the function is applied to each row, so the "P" and "Q" values of that row are added together. All three rows hold the same values, which is why every row gives 145.
df.apply(np.sum, axis=1)
0    145
1    145
2    145
dtype: int64
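Because every row of the DataFrame above is identical, the difference between the two axes is not very visible. The sketch below repeats the comparison on a made-up DataFrame, named df2 here, whose rows differ.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['P', 'Q'])

col_totals = df2.apply(np.sum, axis=0)  # one value per column
row_totals = df2.apply(np.sum, axis=1)  # one value per row

print(col_totals.tolist())  # [9, 12]
print(row_totals.tolist())  # [3, 7, 11]
```

With axis=0 the result is indexed by the column labels "P" and "Q", while with axis=1 it is indexed by the row labels 0, 1 and 2.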
Example 3: Using the result_type parameter
For this example, we use a lambda function and see how the result_type parameter changes the output. By default, the list-like result for each row is stored as a single object in a Series.
df.apply(lambda x: [7, 9], axis=1)
0    [7, 9]
1    [7, 9]
2    [7, 9]
dtype: object
When we set result_type to 'expand', each list-like result is expanded into separate columns, so we get a DataFrame back.
df.apply(lambda x: [7, 9], axis=1, result_type='expand')
| | 0 | 1 |
---|---|---|
0 | 7 | 9 |
1 | 7 | 9 |
2 | 7 | 9 |
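Two related options are worth a quick look: result_type='broadcast', and returning a Series from the function. The sketch below reuses the same DataFrame; the labels 'low' and 'high' are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[64, 81]] * 3, columns=['P', 'Q'])

# 'broadcast' keeps the original shape, index and column labels.
broadcast = df.apply(lambda row: [7, 9], axis=1, result_type='broadcast')

# Returning a Series names the new columns after the Series index.
named = df.apply(lambda row: pd.Series([7, 9], index=['low', 'high']), axis=1)

print(broadcast.columns.tolist())  # ['P', 'Q']
print(named.columns.tolist())      # ['low', 'high']
```

Returning a Series behaves much like 'expand', with the advantage that the resulting columns get meaningful names instead of 0 and 1.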
Conclusion
We have reached the end of this article. We learned about three pandas functions: rolling(), corr() and apply(). These functions are helpful for applying operations over a pandas DataFrame. We also looked at the syntax of each function and worked through examples that show how it is used.
Reference – https://pandas.pydata.org/docs/