Introduction
While working on a machine learning or data science project, you will often have to explore the contents of pandas dataframes. In this tutorial, we will learn some useful pandas tools, namely isnull(), isin(), and empty, that make the life of a data scientist easier. We will look at different examples along with the syntax for each one.
Importing Pandas Library
To start this tutorial, we will import the pandas library.
import pandas as pd
We begin with the isnull() function of pandas.
Pandas isnull: isnull()
The pandas isnull() function is used for detecting missing values (such as NaN or None) in a scalar or an array-like object.
Syntax
pandas.isnull(obj)
obj – The scalar or array-like object to check for missing values.
The result is a single boolean for a scalar input, and a boolean array (or DataFrame) of the same shape for an array-like input.
Note – isnull() is an alias of isna(), and both behave identically. isna() is the more commonly used name, so the examples below use it.
Example 1: Applying isna() function over scalar values
In this example, the isna() function of pandas is applied to scalar values. For a non-null scalar such as a string, the result is False; for a null value such as NaN, the result is True.
pd.isna('Orange')
import numpy as np
pd.isna(np.nan)
Example 2: Applying isna() function over arrays
The pandas isna() function can also be applied to arrays, in which case the result is a boolean array of the same shape.
array = np.array([[np.nan, 7, 9], [ 8, np.nan,16]])
array
pd.isna(array)
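As a quick recap of the scalar and array cases, here is a minimal sketch that collects the results in variables. The variable names and the extra None check are ours, added only for illustration.

```python
import numpy as np
import pandas as pd

# Scalars: a single boolean comes back.
scalar_text = pd.isna('Orange')   # an ordinary string is not missing
scalar_nan = pd.isna(np.nan)      # NaN counts as missing
scalar_none = pd.isna(None)       # None counts as missing too

# Arrays: a boolean array of the same shape comes back.
array = np.array([[np.nan, 7, 9], [8, np.nan, 16]])
mask = pd.isna(array)

print(scalar_text, scalar_nan, scalar_none)
print(mask)
```

The mask has the same 2 x 3 shape as the input, with True only at the two NaN positions.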
Example 3: Usage of pandas isna() function on dataframe
The isna() function is highly useful for dataframes. In this example, we will look at it and understand the usage.
df = pd.DataFrame([['potato', None, 'spinach'], [None, 'Watermelon', 'Strawberry']])
df
Calling pd.isna() on the dataframe returns True for the entries that were specified as None and False for all other entries.
pd.isna(df)
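A common follow-up during data exploration is to count the missing values rather than just locate them. The sketch below reuses the dataframe from this example; the names mask, missing_per_column and total_missing are ours, and summing the boolean mask is one possible approach among several.

```python
import pandas as pd

df = pd.DataFrame([['potato', None, 'spinach'],
                   [None, 'Watermelon', 'Strawberry']])

mask = pd.isna(df)                 # boolean DataFrame, same shape as df

# True counts as 1 when summed, so this counts missing values per column.
missing_per_column = mask.sum()

# Total number of missing values in the whole dataframe.
total_missing = int(mask.to_numpy().sum())

print(missing_per_column)
print(total_missing)
```

Here the first two columns each contain one missing value and the third contains none.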
The next pandas function in this tutorial is isin().
Pandas isin : isin()
With the help of the isin() function, we can check whether each element of a DataFrame is contained in the ‘values’ provided as an argument to the function.
Syntax
pandas.DataFrame.isin(values)
values : iterable, Series, DataFrame or dict – The values to check against, given as an iterable (such as a list), a Series, a DataFrame, or a dictionary.
The result is a DataFrame of boolean values with the same shape as the original.
Example 1: Using list as values
When we pass a list to the pandas isin() function, each value in the dataframe is checked for membership in that list.
df = pd.DataFrame({'seed_count': [50, 15], 'quantity': [15, 40]},
index=['watermelon', 'orange'])
df
Here, isin() tells us where the dataframe contains either 0 or 15.
df.isin([0, 15])
Example 2: Using dictionary as values
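By passing a dictionary to the pandas isin() function, we can check each column separately: the keys are column names and the values are the lists to check against. Columns that are not listed in the dictionary are False everywhere.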
df.isin({'quantity': [0, 40]})
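To make the difference between the list and dictionary forms concrete, the sketch below stores both results and then uses the Series version of isin() to filter rows. The variable names and the row-filtering step are our additions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'seed_count': [50, 15], 'quantity': [15, 40]},
                  index=['watermelon', 'orange'])

# List: every column is checked against the same values.
in_list = df.isin([0, 15])

# Dictionary: only the 'quantity' column is checked; 'seed_count' is all False.
in_dict = df.isin({'quantity': [0, 40]})

# Series.isin() gives a boolean mask that can be used to select rows.
matching_rows = df[df['quantity'].isin([0, 40])]

print(in_list)
print(in_dict)
print(matching_rows)
```

Only the 'orange' row is kept by the filter, because it is the only row whose quantity is 0 or 40.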
Example 3: Using DataFrames as values
When we pass a dataframe as values, each element of the main dataframe is compared with the element at the same index and column label in the other dataframe. A match requires both the value and the labels to agree.
df_other = pd.DataFrame({'seed_count': [50, 5], 'quantity': [15, 2]},
index=['watermelon', 'orange'])
df_other
Since the values in the bottom row ('orange') do not match, they are marked False, while the matching top row is True.
df.isin(df_other)
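When a dataframe is passed, the labels matter as well as the values. The sketch below illustrates this by renaming one index label in a copy of df_other; the name df_relabelled and the label 'melon' are hypothetical and used only for this demonstration.

```python
import pandas as pd

df = pd.DataFrame({'seed_count': [50, 15], 'quantity': [15, 40]},
                  index=['watermelon', 'orange'])
df_other = pd.DataFrame({'seed_count': [50, 5], 'quantity': [15, 2]},
                        index=['watermelon', 'orange'])

# Same labels: the 'watermelon' row matches, the 'orange' row does not.
same_labels = df.isin(df_other)

# Same values, but the first row now carries a different index label,
# so there is nothing to compare the 'watermelon' row of df against.
df_relabelled = df_other.rename(index={'watermelon': 'melon'})
different_labels = df.isin(df_relabelled)

print(same_labels)
print(different_labels)
```

With the relabelled dataframe, every entry of the result is False even though the values 50 and 15 are still present in it.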
The third and final item in the list is empty, which is an attribute of a DataFrame rather than a function.
Pandas empty : DataFrame.empty
The pandas empty attribute tells us whether the DataFrame is empty or not.
Syntax
DataFrame.empty
This attribute is accessed without parentheses and returns a bool value, i.e. either True or False. If any of the axes has length 0, the value returned is True; otherwise it is False.
Example 1: Simple example of empty function
In this example, a dataframe is created with no values in it. As expected, empty returns True, which means the dataframe is empty.
df_emp = pd.DataFrame({'a' : []})
df_emp
df_emp.empty
Example 2: Using NaN values in a DataFrame
When a DataFrame contains only NaN values, it is still not considered empty, because the NaN entries occupy rows.
df_nan = pd.DataFrame({'a' : [np.nan]})
df_nan
Here the result is False, which indicates that the DataFrame is not empty.
df_nan.empty
If we drop these NaN values, the dataframe has no rows left, so empty returns True.
df_nan.dropna().empty
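The cases above can be put side by side in a short sketch. The dataframes no_rows and no_columns are our own additions, included to show that a length of 0 on either axis is enough for empty to be True.

```python
import numpy as np
import pandas as pd

no_rows = pd.DataFrame(columns=['a', 'b'])   # 0 rows, 2 columns
no_columns = pd.DataFrame(index=[0, 1])      # 2 rows, 0 columns
only_nan = pd.DataFrame({'a': [np.nan]})     # 1 row that holds NaN

# empty is an attribute, so it is read without parentheses.
print(no_rows.empty)            # True: the row axis has length 0
print(no_columns.empty)         # True: the column axis has length 0
print(only_nan.empty)           # False: NaN still occupies a row
print(only_nan.dropna().empty)  # True: the only row was dropped
```

If a check for "no usable data" is what you need, dropping the NaN values before reading empty, as in the last line, is one way to get it.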
Conclusion
In this tutorial, we learned about the isnull() and isin() functions and the empty attribute of pandas, which are used in the data exploration stage of a data science project.
Reference – https://pandas.pydata.org/docs/