Introduction
Earlier tutorials showed how useful pandas DataFrames are in data science and machine learning projects. This tutorial covers three more pandas operations: copy(), cut() and query(). For each function, we look at its syntax and then work through examples of how it is used.
Importing Pandas Library
We start by importing the pandas library, along with NumPy, which the cut() examples use.
import pandas as pd
import numpy as np
Pandas Copy : copy()
The pandas copy() function is used for creating a copy of the object’s indices and data.
Syntax
DataFrame.copy(deep=True)
deep : bool – Whether to make a deep copy of the calling object. The default value is True.
If True, a new object is created with its own copy of the calling object’s data and indices. Modifications to the data or indices of the copy are not reflected in the original object, and vice versa.
If False, a new object is created without copying the calling object’s data or index; the copy only shares references to them. Changes to the data of the original are then reflected in the shallow copy, and vice versa. Note that with Copy-on-Write enabled (the default from pandas 3.0), a shallow copy no longer reflects such changes.
The function returns a copy of the calling object, with the same type (Series or DataFrame).
Example 1: Simple example of Pandas Copy Function
Using the copy() function, we can create a copy of a Series object.
s = pd.Series([7, 9], index=["p", "q"])
s_copy = s.copy()
s_copy
p    7
q    9
dtype: int64
Example 2: Showing difference in Pandas Shallow and Deep copy
In this example, we will look at the difference between shallow and deep copy created using the copy() function of pandas.
s = pd.Series([7, 9], index=["p", "q"])
To create a deep copy, we call copy() with its default arguments; to create a shallow copy, we pass deep=False.
deep = s.copy()
deep
p    7
q    9
dtype: int64
shallow = s.copy(deep=False)
shallow
p    7
q    9
dtype: int64
The shallow copy is a new Series object, so an identity check against the original returns False.
s is shallow
False
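A shallow copy shares its values and index with the original series rather than copying them, so the check below returns True. On recent pandas versions this identity check may return False instead, because the shallow copy can hold views of the data and index rather than the very same objects.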
s.values is shallow.values and s.index is shallow.index
True
The deep copy is also a distinct object, and it holds its own copies of the values and index, so both identity checks return False.
s is deep
False
s.values is deep.values or s.index is deep.index
False
Example 3: Main difference in Pandas Shallow and Deep Copy
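Changes made to the data of the original are reflected in the shallow copy, and changes made to the shallow copy are reflected in the original. The deep copy is unaffected by either. This is the main difference between a shallow and a deep copy. The outputs below show the behaviour without Copy-on-Write; with Copy-on-Write enabled (the default from pandas 3.0), the original and the shallow copy no longer share changes.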
s.iloc[0] = 3
shallow.iloc[1] = 4
s
p    3
q    4
dtype: int64
shallow
p    3
q    4
dtype: int64
deep
p    7
q    9
dtype: int64
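The same idea applies to DataFrames. The sketch below is a minimal illustration, assuming a small made-up DataFrame (the column name price and its values are not from the examples above): a deep copy taken before a modification keeps the original values, which is a common way to keep a backup of the data.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0]})

# deep=True is the default, so the backup owns its own data.
backup = df.copy()

# Modify the original after the backup was taken.
df.loc[0, "price"] = 99.0

print(df["price"].tolist())      # the first value has changed
print(backup["price"].tolist())  # the backup still holds the old values
```

Because the backup is a deep copy, this result does not depend on whether Copy-on-Write is enabled.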
Pandas Cut : cut()
The pandas cut() function sorts data values into bins, where each bin is a discrete interval. It is useful for segmenting continuous values into categories, for example converting ages into age groups.
Syntax
pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise')
x : array-like – The input array to be binned. It must be one-dimensional.
bins : int or sequence of scalars – The criteria to bin by. An int sets the number of equal-width bins over the range of x; a sequence of scalars sets the bin edges directly.
right : bool – Whether each bin includes its right edge. With right=True (the default), the bin edges [1, 2, 3] give the intervals (1, 2] and (2, 3].
labels : array or False – The labels for the returned bins. If False, only integer bin indicators are returned.
retbins : bool – Whether to return the bin edges along with the binned result.
precision : int – The precision at which to store and display the bin labels.
include_lowest : bool – Whether the first interval should be left-inclusive.
duplicates : {‘raise’, ‘drop’}, default ‘raise’ – What to do if the bin edges are not unique: raise a ValueError or drop the non-unique edges.
The function returns the binned values as an array-like object (a Categorical, Series or ndarray). If retbins=True, it also returns the bin edges.
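The parameters above can be combined as in the following sketch. It assumes a made-up array of ages and hypothetical group names (child, adult, senior), passes explicit bin edges and labels, and uses retbins=True to get the edges back. The second call shows how right=False moves a value that sits exactly on an edge into the next bin.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, used only for illustration.
ages = np.array([5, 17, 18, 40, 65, 80])

# Explicit bin edges and one label per bin (names are illustrative).
edges = [0, 18, 65, 100]
labels = ["child", "adult", "senior"]

# right=True (default): intervals are (0, 18], (18, 65], (65, 100].
groups, returned_edges = pd.cut(ages, bins=edges, labels=labels, retbins=True)
print(list(groups))
print(returned_edges)

# right=False: intervals are [0, 18), [18, 65), [65, 100),
# so 18 and 65 now fall into the next group.
left_closed = pd.cut(ages, bins=edges, labels=labels, right=False)
print(list(left_closed))
```

With the default right=True, the age 18 is labelled child because it lies on the included right edge of the first interval; with right=False it is labelled adult.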
Example 1: Simple example of Pandas Cut Function
Segmenting the values into three equal-width bins. The range of the array is divided into three intervals of equal width, and each value in the output is replaced by the interval it falls into.
pd.cut(np.array([2, 8, 3, 9, 6, 7]), 3)
[(1.993, 4.333], (6.667, 9.0], (1.993, 4.333], (6.667, 9.0], (4.333, 6.667], (6.667, 9.0]]
Categories (3, interval[float64]): [(1.993, 4.333] < (4.333, 6.667] < (6.667, 9.0]]
Example 2: Using series as an input
s = pd.Series(np.array([1, 3, 5, 7, 9]),index=['p', 'q', 'r', 's', 't'])
pd.cut(s, 3)
p    (0.992, 3.667]
q    (0.992, 3.667]
r    (3.667, 6.333]
s      (6.333, 9.0]
t      (6.333, 9.0]
dtype: category
Categories (3, interval[float64]): [(0.992, 3.667] < (3.667, 6.333] < (6.333, 9.0]]
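When only the bin number is needed rather than the interval itself, the labels parameter can be set to False. The sketch below reuses the series from this example; the variable name bin_codes is just an illustrative choice.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array([1, 3, 5, 7, 9]), index=["p", "q", "r", "s", "t"])

# labels=False returns the integer position of each value's bin
# instead of an interval label. The original index is kept.
bin_codes = pd.cut(s, 3, labels=False)
print(bin_codes)
```

The values 1 and 3 land in bin 0, 5 lands in bin 1, and 7 and 9 land in bin 2, which matches the intervals shown in the output above.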
Pandas Query : query()
The pandas query() function is used to filter the rows of a DataFrame using a boolean expression written over its columns.
Syntax
DataFrame.query(expr, inplace=False, **kwargs)
expr : str – It contains the query string to evaluate
inplace : bool – It decides whether the query should modify the data in place or return a modified copy.
**kwargs – Additional keyword arguments, which are passed on to eval().
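The effect of the inplace parameter can be seen in a short sketch. It assumes a small made-up DataFrame (not the one used in the examples that follow): by default query() returns a new DataFrame and leaves the original untouched, while inplace=True filters the original and returns None.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"A": [1, 5, 3], "B": [4, 2, 3]})

# Default (inplace=False): a new DataFrame is returned, df is unchanged.
filtered = df.query("A > B")
rows_before = len(df)

# inplace=True: df itself is filtered and the call returns None.
returned = df.query("A > B", inplace=True)

print(filtered)
print(rows_before, len(df), returned)
```

Returning a new DataFrame is usually easier to reason about, since the unfiltered data stays available under its original name.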
Example 1: Simple example of Pandas Query Function
Here a DataFrame is created using the range() function.
df = pd.DataFrame({'A': range(2, 7),
'B': range(20, 0, -4),
'C': range(20, 10, -2)})
df
| | A | B | C |
|---|---|---|---|
| 0 | 2 | 20 | 20 |
| 1 | 3 | 16 | 18 |
| 2 | 4 | 12 | 16 |
| 3 | 5 | 8 | 14 |
| 4 | 6 | 4 | 12 |
The row at index 4 is the only row where the value in column ‘A’ is greater than the value in column ‘B’, so it is the only row returned.
df.query('A > B')
| | A | B | C |
|---|---|---|---|
| 4 | 6 | 4 | 12 |
Example 2: Checking equal condition
Only the first row (index 0) has equal values in columns ‘B’ and ‘C’, so it is the only row returned.
df.query('B == C')
| | A | B | C |
|---|---|---|---|
| 0 | 2 | 20 | 20 |
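Query expressions can also combine several conditions and refer to Python variables. The sketch below rebuilds the same DataFrame, joins two conditions with and, and uses the @ prefix to reference a local variable; the variable name threshold and its value are assumptions made for illustration. The last line shows the equivalent boolean-mask form for comparison.

```python
import pandas as pd

df = pd.DataFrame({"A": range(2, 7),
                   "B": range(20, 0, -4),
                   "C": range(20, 10, -2)})

# Combine conditions with `and` / `or` inside the expression.
combined = df.query("A >= 3 and B > 4")

# Reference a Python variable with the @ prefix.
threshold = 14  # hypothetical cut-off value
above = df.query("C > @threshold")

# The same filter as `combined`, written as a boolean mask.
masked = df[(df["A"] >= 3) & (df["B"] > 4)]

print(combined)
print(above)
```

The query string is often easier to read than the mask form when several columns are involved, though both select the same rows here.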
Conclusion
In this article, we learned about three pandas functions: copy(), cut() and query(). These functions are helpful when applying operations to pandas Series and DataFrames. We looked at the syntax of each function and worked through examples that show how it is used.
Reference – https://pandas.pydata.org/docs/