Tutorial – Pandas Copy, Pandas Cut and Pandas Query

Introduction

Earlier tutorials showed how useful pandas DataFrames are in data science and machine learning projects. In this tutorial, we look at three more pandas operations: copy(), cut() and query(). For each function, we cover the syntax and then work through short examples.

Importing Pandas Library

We start by importing the pandas library, along with NumPy, which is used in some of the examples.

In [1]:
import pandas as pd

import numpy as np

Pandas Copy : Copy()

The pandas copy() function is used for creating a copy of the object’s indices and data.

Syntax

DataFrame.copy(deep=True)

deep : bool, default True – Whether to make a deep copy, that is, one that includes a copy of the data and the indices.

If True, a new object is created with its own copy of the calling object’s data and indices. Modifications to the data or indices of the copy are not reflected in the original object, and vice versa.

If False, a new object is created without copying the calling object’s data or index; only references to them are copied. Changes to the data of the original are then reflected in the shallow copy, and vice versa. This sharing depends on the pandas version: with Copy-on-Write enabled (the default from pandas 3.0), a shallow copy still avoids an eager copy, but changes to one object are no longer reflected in the other.

The function returns a copy of the calling object, with the same type (Series or DataFrame).
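The default deep copy described above can be sketched on a small DataFrame. The column names and values here are made up for illustration; the point is only that editing the original after the copy leaves the copy untouched.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the default deep copy.
df = pd.DataFrame({"price": [10.0, 20.0], "qty": [1, 2]})

backup = df.copy()          # deep=True is the default

# Modify the original after taking the copy.
df.loc[0, "price"] = 99.0

original_price = df.loc[0, "price"]
backup_price = backup.loc[0, "price"]

print(original_price)  # value in the modified original
print(backup_price)    # value kept by the deep copy
```

Taking a deep copy before an in-place transformation is a common way to keep a reference version of the data around for comparison.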

Example 1: Simple example of Pandas Copy Function

Using the copy() function, we can generate a copy of a Series object.

In [2]:
s = pd.Series([7, 9], index=["p", "q"])
In [3]:
s_copy = s.copy()
In [4]:
s_copy
Out[4]:
p    7
q    9
dtype: int64

Example 2: Showing difference in Pandas Shallow and Deep copy

In this example, we will look at the difference between shallow and deep copy created using the copy() function of pandas.

In [5]:
s = pd.Series([7, 9], index=["p", "q"])

To create a deep copy, we call copy() with its default arguments; to create a shallow copy, we pass deep=False.

In [6]:
deep = s.copy()
In [7]:
deep
Out[7]:
p    7
q    9
dtype: int64
In [8]:
shallow = s.copy(deep=False)
In [9]:
shallow
Out[9]:
p    7
q    9
dtype: int64

First we check whether the shallow copy is the same Python object as the original Series. It is not: even a shallow copy is a new Series object.

In [10]:
s is shallow
Out[10]:
False

However, the shallow copy does not copy the values and index of the original Series; it shares them with the original. That is why the identity check below returns True.

In [11]:
s.values is shallow.values and s.index is shallow.index
Out[11]:
True

The deep copy is also a distinct object from the original. In addition, it holds its own copies of the values and the index, so neither identity check below is True.

In [12]:
s is deep
Out[12]:
False
In [13]:
s.values is deep.values or s.index is deep.index
Out[13]:
False
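Identity checks with `is` on `.values` can behave differently between pandas versions, so a complementary check is to ask NumPy whether the underlying buffers overlap. The sketch below uses `np.shares_memory` for that. It assumes an integer Series, where `to_numpy()` does not need to convert the data. The deep copy is expected not to share memory with the original; the shallow copy typically does, but that result is printed rather than asserted because it depends on pandas internals.

```python
import numpy as np
import pandas as pd

s = pd.Series([7, 9], index=["p", "q"])
deep = s.copy()
shallow = s.copy(deep=False)

# Does each copy's data overlap in memory with the original's data?
deep_shares = np.shares_memory(s.to_numpy(), deep.to_numpy())
shallow_shares = np.shares_memory(s.to_numpy(), shallow.to_numpy())

print("deep copy shares memory:", deep_shares)        # expected: False
print("shallow copy shares memory:", shallow_shares)  # version dependent
```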

Example 3: Main difference in Pandas Shallow and Deep Copy

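Because a shallow copy shares its data with the original object, a change made through either one is visible in the other. A deep copy has its own data, so changes to the original do not appear in it. This is the main difference between a shallow and a deep copy. The outputs below show the sharing behaviour without Copy-on-Write; with Copy-on-Write enabled, the shallow copy would no longer reflect these changes.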

In [14]:
s.iloc[0] = 3
In [15]:
shallow.iloc[1] = 4
In [16]:
s
Out[16]:
p    3
q    4
dtype: int64
In [17]:
shallow
Out[17]:
p    3
q    4
dtype: int64
In [18]:
deep
Out[18]:
p    7
q    9
dtype: int64

Pandas Cut : Cut()

The pandas cut() function segments data values into discrete intervals, called bins. It is useful when we want to convert a continuous variable into a categorical one, for example turning ages into age groups.

Syntax

pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise')

x : array-like – The input array to be binned. It must be one-dimensional.

bins : int or sequence of scalars – The criteria to bin by. An int defines the number of equal-width bins over the range of x; a sequence of scalars defines the bin edges directly.

right : bool – Indicates whether the bins include their rightmost edge. With the default right=True, bin edges [1, 2, 3, 4] produce the intervals (1, 2], (2, 3], (3, 4].

labels : array or False – Specifies the labels for the returned bins. If False, only the integer indicators of the bins are returned.

retbins : bool – Tells the function whether the bins should be returned as well.

precision : int – The precision at which to store and display the bin labels.

include_lowest : bool – Decides whether the first interval should be left-inclusive or not.

duplicates : {'raise', 'drop'}, default 'raise' – What to do if the bin edges are not unique: raise a ValueError or drop the non-unique edges.

The function returns an array-like object that gives the bin for each value of x. If retbins=True, the computed or specified bins are returned as well.
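The parameters listed above can be combined as in the following sketch. The ages, bin edges and group names are invented for illustration. It shows explicit bin edges with custom labels, `labels=False` to get integer bin indicators, and `retbins=True` to get the computed edges back.

```python
import pandas as pd

# Hypothetical ages and bin edges, chosen only for illustration.
ages = pd.Series([5, 17, 18, 40, 65, 80])
edges = [0, 17, 64, 120]

# Explicit edges with custom labels; right=True means 17 falls in (0, 17].
groups = pd.cut(ages, bins=edges, labels=["child", "adult", "senior"])
print(groups.tolist())

# labels=False returns the integer position of each bin instead of a label.
codes = pd.cut(ages, bins=edges, labels=False)
print(codes.tolist())

# retbins=True also returns the bin edges, useful when bins is an int.
by_width, computed_edges = pd.cut(ages, bins=3, retbins=True)
print(computed_edges)
```

Passing explicit edges is usually preferable when the bins have a meaning in the problem domain, while an integer is convenient for a quick equal-width split.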

Example 1: Simple example of Pandas Cut Function

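Segmenting the values into three equal-width bins. Passing an integer as bins splits the range of the array into that many intervals of equal width, and each value is replaced by the interval it falls into. The lowest edge is extended slightly (here to 1.993) so that the minimum value is included in the first bin.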

In [19]:
pd.cut(np.array([2, 8, 3, 9, 6, 7]), 3)
Out[19]:
[(1.993, 4.333], (6.667, 9.0], (1.993, 4.333], (6.667, 9.0], (4.333, 6.667], (6.667, 9.0]]
Categories (3, interval[float64]): [(1.993, 4.333] < (4.333, 6.667] < (6.667, 9.0]]

Example 2: Using series as an input

In [20]:
s = pd.Series(np.array([1, 3, 5, 7, 9]),index=['p', 'q', 'r', 's', 't'])
In [21]:
pd.cut(s, 3)
Out[21]:
p    (0.992, 3.667]
q    (0.992, 3.667]
r    (3.667, 6.333]
s      (6.333, 9.0]
t      (6.333, 9.0]
dtype: category
Categories (3, interval[float64]): [(0.992, 3.667] < (3.667, 6.333] < (6.333, 9.0]]
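A common next step after binning is to count how many values fall into each bin. The sketch below reuses the Series from this example and is one possible way to do it: `value_counts()` counts the members of each interval, and `sort_index()` puts the result back in bin order.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array([1, 3, 5, 7, 9]), index=["p", "q", "r", "s", "t"])

# Bin into three equal-width intervals, as in the example above.
binned = pd.cut(s, 3)

# Count the values per interval and order the result by interval.
counts = binned.value_counts().sort_index()
print(counts)
```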

Pandas Query : Query()

The pandas query() function filters the rows of a DataFrame using a boolean expression, written as a string, that refers to the DataFrame’s columns.

Syntax

DataFrame.query(expr, inplace=False, **kwargs)

expr : str – It contains the query string to evaluate

inplace : bool – It decides whether the query should modify the data in place or return a modified copy.

**kwargs – Additional keyword arguments, which are passed on to pandas.eval().

Example 1: Simple example of Pandas Query Function

Here a DataFrame with three columns is created using the range() function.

In [22]:
df = pd.DataFrame({'A': range(2, 7),
                   'B': range(20, 0, -4),
                   'C': range(20, 10, -2)})
In [23]:
df
Out[23]:
   A   B   C
0  2  20  20
1  3  16  18
2  4  12  16
3  5   8  14
4  6   4  12

The row with index 4 is the only one where the value in column 'A' is greater than the value in column 'B', so it is the only row returned.

In [24]:
df.query('A > B')
Out[24]:
   A  B   C
4  6  4  12

Example 2: Checking equal condition

Only the first row (index 0) has equal values in columns 'B' and 'C', so it is the only row returned.

In [25]:
df.query('B == C')
Out[25]:
   A   B   C
0  2  20  20
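Query strings can also combine several conditions and refer to Python variables with the `@` prefix. The sketch below rebuilds the same DataFrame and compares the query result with the equivalent boolean-mask selection. The variable name `threshold` and its value are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': range(2, 7),
                   'B': range(20, 0, -4),
                   'C': range(20, 10, -2)})

threshold = 3  # hypothetical cut-off, referenced in the query via @threshold

# Two conditions joined with "and"; @threshold reads the local variable.
result = df.query('A > @threshold and B < C')

# The same selection written with boolean masks.
masked = df[(df['A'] > threshold) & (df['B'] < df['C'])]

print(result)
print(result.equals(masked))
```

For simple filters, the two forms are interchangeable; query() mainly helps readability when the condition involves several columns.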

Conclusion

We have reached the end of this article. We learned about three pandas functions: copy(), which duplicates a Series or DataFrame; cut(), which bins values into discrete intervals; and query(), which filters the rows of a DataFrame with a boolean expression. For each function, we looked at the syntax and worked through examples that show how it is used.

Reference – https://pandas.pydata.org/docs/
