Pandas DataFrame Query() Method Explained with Example

Introduction

In this post, we will understand the Pandas query() method which is used to filter DataFrame with syntax similar to SQL queries. We will understand its syntax and its various functionalities with the help of examples.

Syntax of Pandas Query() | pandas.DataFrame.query

The syntax of Pandas Query() function is quite straightforward –

DataFrame.query(expr, inplace=False, **kwargs)

Parameters:

  • expr: This is the query expression, which is a string containing the filtering condition.
  • inplace: This is a boolean parameter that specifies whether to modify the DataFrame in place or return a new DataFrame with the filtered data. The default value is False.
  • **kwargs: Additional keyword arguments that can be used to specify variables for the query.

 

Examples of Pandas Query

Example 1: Basic Filtering using Pandas DataFrame Query()

In this example, we start by importing the Pandas library and creating a DataFrame df with columns ‘Name’, ‘Age’, and ‘Salary’. We then use the Pandas query () function to filter rows where the ‘Age’ is greater than 30, and the output is printed.

 

# Importing Pandas library
import pandas as pd

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 32, 28, 38],
        'Salary': [50000, 60000, 55000, 70000]}

df = pd.DataFrame(data)

#Print Input Data Frame
print("Input Data Frame:")
print(df)

# Filtering rows where the age is greater than 30
filtered_df = df.query('Age > 30')

#Print Output
print("\nOutput:")
print(filtered_df)

Output

Input Data Frame:
      Name  Age  Salary
0    Alice   25   50000
1      Bob   32   60000
2  Charlie   28   55000
3    David   38   70000

Output:
    Name  Age  Salary
1    Bob   32   60000
3  David   38   70000

Example 2: Using Variables in Queries

In this example, we introduce the concept of using variables within query expressions. We set a threshold_salary variable and use it within the query expression to filter rows where the ‘Salary’ column is greater than the specified threshold.

 

# Importing Pandas library
import pandas as pd

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 32, 28, 38],
        'Salary': [50000, 60000, 55000, 70000]}

df = pd.DataFrame(data)

#Print Input Data Frame
print("Input Data Frame:")
print(df)


# Specifying the threshold salary
threshold_salary = 60000

# Filtering rows where the salary is greater than the threshold
filtered_df = df.query('Salary > @threshold_salary')

#Print Output
print("\nOutput:")
print(filtered_df)

Output

Input Data Frame:
      Name  Age  Salary
0    Alice   25   50000
1      Bob   32   60000
2  Charlie   28   55000
3    David   38   70000

Output:
    Name  Age  Salary
3  David   38   70000

 

Example 3: Combining Multiple Conditions

In this example,  we filter rows based on multiple conditions using logical operators. We select rows where the ‘Age’ column is between 25 and 35 and the ‘Salary’ column is less than 65000.

 

# Importing Pandas library
import pandas as pd

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 32, 28, 38],
        'Salary': [50000, 60000, 55000, 70000]}

df = pd.DataFrame(data)

#Print Input Data Frame
print("Input Data Frame:")
print(df)

# Filtering rows based on multiple conditions
filtered_df = df.query('Age >= 25 and Age <= 35 and Salary < 65000')

#Print Output
print("\nOutput:")
print(filtered_df)

Output

Input Data Frame:
      Name  Age  Salary
0    Alice   25   50000
1      Bob   32   60000
2  Charlie   28   55000
3    David   38   70000

Output:
      Name  Age  Salary
0    Alice   25   50000
1      Bob   32   60000
2  Charlie   28   55000

 

Example 4: Filtering with String Values

In this example, we show how to perform string-based filtering. We use the  str.startswith() method within the query expression to filter rows where the ‘Name’ column starts with ‘C’, and we specify engine=’python’ to ensure Python’s string methods are used.

 

# Importing Pandas library
import pandas as pd

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 32, 28, 38],
        'Salary': [50000, 60000, 55000, 70000]}

df = pd.DataFrame(data)

#Print Input Data Frame
print("Input Data Frame:")
print(df)

# Filtering rows where 'Name' starts with 'C'
filtered_df = df.query('Name.str.startswith("C")', engine='python')

#Print Output
print("\nOutput:")
print(filtered_df)

Output

Input Data Frame:
      Name  Age  Salary
0    Alice   25   50000
1      Bob   32   60000
2  Charlie   28   55000
3    David   38   70000

Output:
      Name  Age  Salary
2  Charlie   28   55000

 

Example 5: In-Place Modification with Query

In the final example, we use the inplace=True parameter to modify the original DataFrame df by filtering rows that do not meet the specified condition. Specifically, we remove rows where the salary is less than 55000.

 

# Importing Pandas library
import pandas as pd

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 32, 28, 38],
        'Salary': [50000, 60000, 55000, 70000]}

df = pd.DataFrame(data)

#Print Input Data Frame
print("Input Data Frame:")
print(df)

# Removing rows where salary is less than 55000
df.query('Salary >= 55000', inplace=True)

#Print Output
print("\nOutput:")
print(df)

Output

Input Data Frame:
      Name  Age  Salary
0    Alice   25   50000
1      Bob   32   60000
2  Charlie   28   55000
3    David   38   70000

Output:
      Name  Age  Salary
1      Bob   32   60000
2  Charlie   28   55000
3    David   38   70000

 

Follow Us

Leave a Reply

Your email address will not be published. Required fields are marked *