Pandas Tutorial – isnull(), isin(), empty()

Introduction

While working on a machine learning or data science project, you will often have to explore the contents of pandas DataFrames. In this tutorial, we will learn three useful pandas tools that make this exploration easier: the isnull() and isin() functions and the empty property. For each one, we will look at the syntax along with several examples.

Importing Pandas Library

To start this tutorial, we will import the pandas library.

In [1]:
import pandas as pd

We begin with the isnull() function of pandas.

Pandas isnull: isnull()

The pandas isnull() function detects missing values (such as None or NaN) in a scalar or an array-like object.

Syntax

pandas.isnull(obj)

obj – The scalar or array-like object to check for missing values.

The result depends on the input. For a scalar, the function returns a single boolean value. For an array-like object, it returns a boolean array (or DataFrame) of the same shape, with True marking each missing value.

Note – pandas provides the same function under a second name, isna(). This name is the more commonly used of the two, so we will use it in our examples.
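
As a quick check of the note above, the following sketch runs both names on the same input and compares the results. The array `values` is made up for illustration. The sketch also shows notna(), the complementary pandas function that marks the values that are present.

```python
import numpy as np
import pandas as pd

# Illustrative input: one missing value in the middle
values = np.array([1.0, np.nan, 3.0])

# Both names should give the same boolean mask
mask_isnull = pd.isnull(values)
mask_isna = pd.isna(values)
print(mask_isnull)
print(mask_isna)

# notna() is the complement: True where a value is present
mask_notna = pd.notna(values)
print(mask_notna)
```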

Example 1: Applying isna() function over scalar values

In this example, the isna() function of pandas is applied to scalar values. When the function is given an ordinary value such as a string, the result is False. When it is given a missing value such as np.nan, the result is True.

In [2]:
pd.isna('Orange')
Out[2]:
False
In [3]:
import numpy as np

pd.isna(np.nan)
Out[3]:
True
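
It can help to see which scalars pandas treats as missing. The sketch below checks a handful of illustrative candidates (the dictionary `candidates` is not part of the original example). Note that an empty string and zero are ordinary values, not missing ones.

```python
import numpy as np
import pandas as pd

# Illustrative scalars to test; labels are only for printing
candidates = {
    'None': None,
    'np.nan': np.nan,
    'pd.NaT': pd.NaT,
    'empty string': '',
    'zero': 0,
}

# Apply isna() to each scalar and collect the boolean results
results = {label: pd.isna(value) for label, value in candidates.items()}

for label, flag in results.items():
    print(f'{label}: {flag}')
```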
Example 2: Using pandas isna() on arrays

The pandas isna() function can also be applied to arrays, in which case the result is a boolean array of the same shape.

In [4]:
array = np.array([[np.nan, 7, 9], [8, np.nan, 16]])
In [5]:
array
Out[5]:
array([[nan,  7.,  9.],
       [ 8., nan, 16.]])
In [6]:
pd.isna(array)
Out[6]:
array([[ True, False, False],
       [False,  True, False]])

Example 3: Usage of pandas isna() function on dataframe

The isna() function is especially useful for DataFrames, where it returns a DataFrame of booleans with the same shape and labels. In this example, we will look at this usage.

In [7]:
df = pd.DataFrame([['potato', None, 'spinach'], [None, 'Watermelon', 'Strawberry']])
In [8]:
df
Out[8]:
        0           1           2
0  potato        None     spinach
1    None  Watermelon  Strawberry

In the result below, the positions that were specified as None in the DataFrame are True and all other positions are False.

In [9]:
pd.isna(df)
Out[9]:
       0      1      2
0  False   True  False
1   True  False  False
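
In practice, the boolean DataFrame is usually summarized rather than read cell by cell. The following sketch, which rebuilds the same DataFrame, counts missing values per column and flags rows that contain at least one missing value. The variable names are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame([['potato', None, 'spinach'],
                   [None, 'Watermelon', 'Strawberry']])

# True counts as 1 when summed, so this gives missing values per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# any(axis=1) flags each row that has at least one missing value
rows_with_missing = df.isna().any(axis=1)
print(rows_with_missing)

# Summing twice gives the total number of missing cells
total_missing = int(df.isna().sum().sum())
print(total_missing)
```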

The next pandas function in this tutorial is isin().

Pandas isin : isin()

With the help of the isin() function, we can check, element by element, whether each value in a DataFrame is present in the ‘values’ provided as an argument to the function.

Syntax

pandas.DataFrame.isin(values)

values : iterable, Series, DataFrame or dict – The values to check against, provided as an iterable (such as a list), a Series, a DataFrame, or a dictionary.

The result is a DataFrame of boolean values with the same shape and labels as the original DataFrame.

Example 1: Using list as values

When we pass a list to the pandas isin() function, it checks whether each value in the DataFrame is present in the list or not.

In [10]:
df = pd.DataFrame({'seed_count': [50, 15], 'quantity': [15, 40]},
                   index=['watermelon', 'orange'])
In [11]:
df
Out[11]:
            seed_count  quantity
watermelon          50        15
orange              15        40

Here isin() marks every cell whose value is either 0 or 15. No cell contains 0, so the True entries show where 15 appears in the DataFrame.

In [12]:
df.isin([0, 15])
Out[12]:
            seed_count  quantity
watermelon       False      True
orange            True     False
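
A common use of isin() is to filter rows. The sketch below applies Series.isin() to a single column of the same DataFrame and uses the resulting boolean mask to select rows. The list of wanted quantities is an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame({'seed_count': [50, 15], 'quantity': [15, 40]},
                  index=['watermelon', 'orange'])

# Boolean mask: True for rows whose quantity is in the list
mask = df['quantity'].isin([15, 20])

# Keep the matching rows
selected = df[mask]
print(selected)

# The ~ operator inverts the mask to keep the non-matching rows
not_selected = df[~mask]
print(not_selected)
```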

Example 2: Using dictionary as values

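By passing a dictionary to the pandas function isin(), we can specify the values to check for each column separately. The keys are column names, and columns that are not listed in the dictionary are False everywhere.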

In [13]:
df.isin({'quantity': [0, 40]})
Out[13]:
            seed_count  quantity
watermelon       False     False
orange           False      True

Example 3: Using DataFrames as values

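When we pass a DataFrame as values, each cell of the main DataFrame is compared with the cell that has the same index and column labels in the other DataFrame. Both the index and the column labels must match for a cell to be True.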

In [14]:
df_other = pd.DataFrame({'seed_count': [50, 5], 'quantity': [15, 2]},
                   index=['watermelon', 'orange'])
In [15]:
df_other
Out[15]:
            seed_count  quantity
watermelon          50        15
orange               5         2

As the values of the bottom row don’t match between the two DataFrames, that row is False in the result below.

In [16]:
df.isin(df_other)
Out[16]:
            seed_count  quantity
watermelon        True      True
orange           False     False
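
The comparison against another DataFrame is aligned on labels, not on position. As a small sketch of this, the hypothetical `df_relabelled` below holds exactly the same numbers as `df` but under different index labels, so no cell is reported as matching.

```python
import pandas as pd

df = pd.DataFrame({'seed_count': [50, 15], 'quantity': [15, 40]},
                  index=['watermelon', 'orange'])

# Same values as df, but with different (made-up) index labels
df_relabelled = pd.DataFrame({'seed_count': [50, 15], 'quantity': [15, 40]},
                             index=['melon', 'citrus'])

# No index label is shared, so every cell comes out False
result = df.isin(df_relabelled)
print(result)
```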

The third and final item in this tutorial is empty.

Pandas empty : empty

The pandas empty property is useful in telling whether the DataFrame is empty or not.

Syntax

DataFrame.empty

Note that empty is a property, not a method, so it is accessed without parentheses. It returns a bool value, i.e. either True or False. If either axis of the DataFrame has length 0, the value returned is True; otherwise it is False.

Example 1: Simple example of the empty property

In this example, a DataFrame is created with a column but no values in it. As expected, empty returns True, which means the DataFrame is empty.

In [17]:
df_emp = pd.DataFrame({'a' : []})
In [18]:
df_emp
Out[18]:
a
In [19]:
df_emp.empty
Out[19]:
True

Example 2: Using NaN values in a DataFrame

When NaN values are provided as input to a DataFrame, the DataFrame is not considered to be empty, because each NaN still occupies a cell.

In [20]:
df_nan = pd.DataFrame({'a' : [np.nan]})
In [21]:
df_nan
Out[21]:
     a
0  NaN

As the output below shows, empty is False, so the DataFrame is not considered empty.

In [22]:
df_nan.empty
Out[22]:
False

If we drop these NaN values with dropna(), no rows remain. The output is then True, showing that the resulting DataFrame is empty.

In [23]:
df_nan.dropna().empty
Out[23]:
True
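
To tie the cases together, here is a minimal sketch that compares three DataFrames: one with no rows, one with no columns, and one holding only NaN. It also shows empty used as a guard before further processing. The helper `summarize` is hypothetical and exists only for this illustration.

```python
import numpy as np
import pandas as pd

no_rows = pd.DataFrame({'a': []})          # one column, zero rows
no_columns = pd.DataFrame(index=[0, 1])    # two rows, zero columns
only_nan = pd.DataFrame({'a': [np.nan]})   # one row holding a NaN

# empty is True as soon as either axis has length 0
print(no_rows.empty)
print(no_columns.empty)
print(only_nan.empty)


def summarize(frame):
    """Hypothetical helper: describe a DataFrame, guarding against empty input."""
    if frame.empty:
        return 'nothing to summarize'
    return f'{len(frame)} row(s), {frame.shape[1]} column(s)'


print(summarize(no_rows))
print(summarize(only_nan))
```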

Conclusion

In this tutorial, we learned about the isnull() and isin() functions and the empty property of pandas, which are used in the data exploration stage of a data science project.

Reference – https://pandas.pydata.org/docs/
