Introduction
In this post, we will understand the Pandas query() method which is used to filter DataFrame with syntax similar to SQL queries. We will understand its syntax and its various functionalities with the help of examples.
Syntax of Pandas Query() | pandas.DataFrame.query
The syntax of Pandas Query() function is quite straightforward –
DataFrame.query(expr, inplace=False, **kwargs)
Parameters:
- expr: This is the query expression, which is a string containing the filtering condition.
- inplace: This is a boolean parameter that specifies whether to modify the DataFrame in place or return a new DataFrame with the filtered data. The default value is False.
- **kwargs: Additional keyword arguments that can be used to specify variables for the query.
Examples of Pandas Query
Example 1: Basic Filtering using Pandas DataFrame Query()
In this example, we start by importing the Pandas library and creating a DataFrame df with columns ‘Name’, ‘Age’, and ‘Salary’. We then use the Pandas query () function to filter rows where the ‘Age’ is greater than 30, and the output is printed.
# Importing Pandas library import pandas as pd # Creating a DataFrame data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 32, 28, 38], 'Salary': [50000, 60000, 55000, 70000]} df = pd.DataFrame(data) #Print Input Data Frame print("Input Data Frame:") print(df) # Filtering rows where the age is greater than 30 filtered_df = df.query('Age > 30') #Print Output print("\nOutput:") print(filtered_df)
Output
Input Data Frame: Name Age Salary 0 Alice 25 50000 1 Bob 32 60000 2 Charlie 28 55000 3 David 38 70000 Output: Name Age Salary 1 Bob 32 60000 3 David 38 70000
Example 2: Using Variables in Queries
In this example, we introduce the concept of using variables within query expressions. We set a threshold_salary
variable and use it within the query expression to filter rows where the ‘Salary’ column is greater than the specified threshold.
# Importing Pandas library import pandas as pd # Creating a DataFrame data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 32, 28, 38], 'Salary': [50000, 60000, 55000, 70000]} df = pd.DataFrame(data) #Print Input Data Frame print("Input Data Frame:") print(df) # Specifying the threshold salary threshold_salary = 60000 # Filtering rows where the salary is greater than the threshold filtered_df = df.query('Salary > @threshold_salary') #Print Output print("\nOutput:") print(filtered_df)
Output
Input Data Frame: Name Age Salary 0 Alice 25 50000 1 Bob 32 60000 2 Charlie 28 55000 3 David 38 70000 Output: Name Age Salary 3 David 38 70000
Example 3: Combining Multiple Conditions
In this example, we filter rows based on multiple conditions using logical operators. We select rows where the ‘Age’ column is between 25 and 35 and the ‘Salary’ column is less than 65000.
# Importing Pandas library import pandas as pd # Creating a DataFrame data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 32, 28, 38], 'Salary': [50000, 60000, 55000, 70000]} df = pd.DataFrame(data) #Print Input Data Frame print("Input Data Frame:") print(df) # Filtering rows based on multiple conditions filtered_df = df.query('Age >= 25 and Age <= 35 and Salary < 65000') #Print Output print("\nOutput:") print(filtered_df)
Output
Input Data Frame: Name Age Salary 0 Alice 25 50000 1 Bob 32 60000 2 Charlie 28 55000 3 David 38 70000 Output: Name Age Salary 0 Alice 25 50000 1 Bob 32 60000 2 Charlie 28 55000
Example 4: Filtering with String Values
In this example, we show how to perform string-based filtering. We use the str.startswith() method within the query expression to filter rows where the ‘Name’ column starts with ‘C’, and we specify engine=’python’ to ensure Python’s string methods are used.
# Importing Pandas library import pandas as pd # Creating a DataFrame data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 32, 28, 38], 'Salary': [50000, 60000, 55000, 70000]} df = pd.DataFrame(data) #Print Input Data Frame print("Input Data Frame:") print(df) # Filtering rows where 'Name' starts with 'C' filtered_df = df.query('Name.str.startswith("C")', engine='python') #Print Output print("\nOutput:") print(filtered_df)
Output
Input Data Frame: Name Age Salary 0 Alice 25 50000 1 Bob 32 60000 2 Charlie 28 55000 3 David 38 70000 Output: Name Age Salary 2 Charlie 28 55000
Example 5: In-Place Modification with Query
In the final example, we use the inplace=True parameter to modify the original DataFrame df by filtering rows that do not meet the specified condition. Specifically, we remove rows where the salary is less than 55000.
# Importing Pandas library import pandas as pd # Creating a DataFrame data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 32, 28, 38], 'Salary': [50000, 60000, 55000, 70000]} df = pd.DataFrame(data) #Print Input Data Frame print("Input Data Frame:") print(df) # Removing rows where salary is less than 55000 df.query('Salary >= 55000', inplace=True) #Print Output print("\nOutput:") print(df)
Output
Input Data Frame: Name Age Salary 0 Alice 25 50000 1 Bob 32 60000 2 Charlie 28 55000 3 David 38 70000 Output: Name Age Salary 1 Bob 32 60000 2 Charlie 28 55000 3 David 38 70000
- Reference: Pandas Documentation