Data exploration is usually the first step in a machine learning or data science project, because we need to understand our data before working with it. If we are handling the data with the pandas library, we can explore it easily using pandas functions such as describe(), head(), unique() and count(). In this article, we will look at these functions and learn how they can be used for data exploration, with some examples.

Importing Pandas Library

We start this tutorial by importing the pandas library. NumPy is imported as well, since a later example uses np.nan to represent a missing value.

import pandas as pd
import numpy as np

We begin with the pandas describe() function.

Pandas Describe : describe()

The describe() function is used for generating descriptive statistics of a dataset.

These statistics summarize the central tendency, dispersion and shape of the dataset’s distribution. NaN values are excluded from the calculation.

Syntax

pandas.DataFrame.describe(percentiles=None, include=None, exclude=None)

self : DataFrame or Series – This is the dataframe or series on which describe() is called to find its descriptive statistics. It is not passed explicitly as an argument.

percentiles : list-like of numbers(optional) – The percentiles to include in the output. All the values should be between 0 and 1. The default values are 0.25, 0.5 and 0.75, i.e. the 25th, 50th and 75th percentiles.

include : list-like of dtypes, 'all' or None(optional) – The data types to include in the output. With the default value of None, only numeric columns are described. Passing 'all' includes the columns of every data type.

exclude : list-like of dtypes or None(optional) – The data types which should not be included in the output.

As an output, we get summarized statistics of series or dataframe.
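As a minimal sketch of the percentiles parameter, the snippet below requests the 10th and 90th percentiles of a small series. The series values and the variable name summary are chosen here only for illustration. Note that pandas keeps the median (50%) in the output even when it is not requested.

```python
import pandas as pd

# Illustrative series (values chosen for this sketch)
s = pd.Series([7, 9, 11])

# Request the 10th and 90th percentiles instead of the default quartiles
summary = s.describe(percentiles=[0.1, 0.9])

# The index holds the names of the computed statistics
print(summary)
```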

Example 1: describing a series

Here we will apply describe() function over a series.

s = pd.Series([7, 9, 11])

s

0     7
1     9
2    11
dtype: int64

Calling describe() on this series returns descriptive statistics such as count, mean, std (standard deviation), min, max and the quartiles.

s.describe()

count     3.0
mean      9.0
std       2.0
min       7.0
25%       8.0
50%       9.0
75%      10.0
max      11.0
dtype: float64
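To see where these numbers come from, the following sketch recomputes the mean, the standard deviation and the 25th percentile by hand for the same series. It assumes the sample standard deviation (dividing by n - 1), which is what gives the value 2.0 shown above. The variable names are illustrative.

```python
import pandas as pd

s = pd.Series([7, 9, 11])

# Mean: sum of the values divided by their count
mean = s.sum() / len(s)

# Sample standard deviation: squared deviations divided by n - 1
std = (((s - mean) ** 2).sum() / (len(s) - 1)) ** 0.5

# 25th percentile, using the default linear interpolation
q25 = s.quantile(0.25)

print(mean, std, q25)
```

The squared deviations from the mean are 4, 0 and 4, so the standard deviation is the square root of 8 / 2, which is 2.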

Example 2: describing a series of text data

Pandas describe() function can be used over text data as well. Here the series has the object data type.

s = pd.Series(['P', 'P', 'Q', 'R'])

s

0    P
1    P
2    Q
3    R
dtype: object

For text data, describe() returns the number of values (count), the number of distinct values (unique), the most frequent value (top) and the frequency of that value (freq).

s.describe()

count     4
unique    3
top       P
freq      2
dtype: object

Example 3: Describing dataframe

As we mostly deal with dataframes, let’s see how they are described using pandas describe() function.

df = pd.DataFrame({'categorical': pd.Categorical(['A','B','C']),
                   'numeric': [3, 6, 9],
                   'object': ['P', 'Q', 'R']
                   })

df

By default, only the numeric columns of the dataframe are described.

df.describe()

By passing 'all' to the include parameter, we can get the descriptive statistics for the columns of each data type present in the dataframe. Statistics that do not apply to a column are shown as NaN.

df.describe(include='all')
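The include and exclude parameters also accept specific data types. The sketch below, which reuses the dataframe from this example, selects the numeric columns with np.number and then excludes them. The variable names numeric_stats and other_stats are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'categorical': pd.Categorical(['A', 'B', 'C']),
                   'numeric': [3, 6, 9],
                   'object': ['P', 'Q', 'R']})

# Describe only the numeric columns
numeric_stats = df.describe(include=[np.number])

# Describe everything except the numeric columns
other_stats = df.describe(exclude=[np.number])

print(numeric_stats)
print(other_stats)
```

The first call gives the same result as the default behaviour, while the second returns count, unique, top and freq for the two non-numeric columns.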

The next function in the list is the pandas head() function.

Pandas head : head()

The head() function returns the first n rows of an object. It is useful for quickly checking the values and data types that the object contains.

Syntax

pandas.DataFrame.head(n=5)

n : int(default = 5) – The number of rows to return.

The head function returns the object with the desired number of rows.

Example 1: Simple example of head() function

In this example, we will look at how the head function returns the first five rows of a dataframe when ‘n’ is not provided.

stud = pd.DataFrame({'Students': ['Jack', 'Dale', 'Shaun', 'Shane',
                    'Brett', 'Patrick', 'Mitchell', 'David', 'Zoe']})

stud

stud.head()

Example 2: providing value of ‘n’

The number of rows returned can be changed by providing a value for ‘n’.

Since we provide the value of ‘n’ as 3, we get three rows in the output.

stud.head(3)

Example 3: using tail function

To access the last rows of a dataframe, we use the tail() function. By default, we get the last 5 rows of the dataframe.

stud.tail()
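Both head() and tail() also accept a negative value of ‘n’. As a short sketch using the same students dataframe, head(-2) returns every row except the last two and tail(-2) returns every row except the first two. The variable names are chosen for illustration.

```python
import pandas as pd

stud = pd.DataFrame({'Students': ['Jack', 'Dale', 'Shaun', 'Shane',
                                  'Brett', 'Patrick', 'Mitchell', 'David', 'Zoe']})

# All rows except the last two
all_but_last_two = stud.head(-2)

# All rows except the first two
all_but_first_two = stud.tail(-2)

print(all_but_last_two)
print(all_but_first_two)
```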

The third function in the list is pandas unique function.

Pandas unique : unique()

The unique() function returns unique values present in series object. The values are returned in the order of appearance.

Syntax

Series.unique()

Here the unique function is applied over series object and then the unique values are returned.

The output of this function is an array.

Example 1: using pandas unique() over series object

In the below-given example, we will be applying unique() function on the series object.

In the output, we get an array with unique values.

pd.Series([7, 14, 9, 9], name='Test').unique()

array([ 7, 14,  9], dtype=int64)
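When exploring data it is often useful to know how unique() treats missing values. The sketch below adds a NaN to the series from this example (an assumption made for illustration) and compares unique() with the related nunique() function, which returns the number of distinct values.

```python
import numpy as np
import pandas as pd

# Same values as above, plus one missing value
s = pd.Series([7, 14, 9, 9, np.nan], name='Test')

# unique() keeps NaN as one of the distinct values
values = s.unique()

# nunique() ignores NaN by default
n_distinct = s.nunique()

# dropna=False counts NaN as a distinct value
n_with_nan = s.nunique(dropna=False)

print(values, n_distinct, n_with_nan)
```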

Example 2: unique function on categorical data

Next, let’s see how the unique function operates over a series containing categorical data, i.e. data of the pandas Categorical type, whose values come from a fixed set of categories.

In this first example, the output shows the unique values along with the categories of the data.

pd.Series(pd.Categorical(list('gpprs'))).unique()

[g, p, r, s]
Categories (4, object): [g, p, r, s]

In this example, the categories of the same data are displayed in ordered form (g < p < r < s). This is because we have set the ordered keyword to True.

pd.Series(pd.Categorical(list('gpprs'), categories=list('gprs'),
                        ordered=True)).unique()

[g, p, r, s]
Categories (4, object): [g < p < r < s]
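The order of the categories matters beyond the display. As a small sketch, with hypothetical size labels chosen for illustration, an ordered categorical series can be sorted and can return its minimum and maximum according to the category order rather than the alphabetical order.

```python
import pandas as pd

# Hypothetical labels; the order is defined by the categories argument
sizes = pd.Series(pd.Categorical(['medium', 'small', 'large', 'small'],
                                 categories=['small', 'medium', 'large'],
                                 ordered=True))

# min() and max() follow the category order
smallest = sizes.min()
largest = sizes.max()

# Sorting also follows the category order
sorted_sizes = sizes.sort_values().tolist()

print(smallest, largest, sorted_sizes)
```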

The last function in this article which we’ll look at is pandas count.

Pandas Count : count()

The pandas count() function helps in counting non-NA cells of each column or row.

Syntax

pandas.DataFrame.count(axis=0,level=None,numeric_only=False)

axis : {0 or ‘index’, 1 or ‘columns’}, default 0 – If the value provided is 0, then counts are generated for each column. If value provided is 1, then counts are generated for rows.

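level : int or str(optional) – It is used to specify the level along which counting should be done. Generally used for hierarchical i.e. multi-index dataframes. Note that this parameter was deprecated in pandas 1.3 and removed in pandas 2.0, where grouping by the index level is used instead.

numeric_only : bool, default False – If True, only float, int or boolean data is included in the count.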

The output is a Series or DataFrame. For each column/row, the non-NA entries are counted.
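The following sketch illustrates the numeric_only parameter and, assuming a recent pandas version without the level parameter, one way to count along an index level with groupby(). The small dataframes, the index names team and member, and the variable names are all hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Employee": ["Rakesh", "Ramesh", "Suresh"],
                   "Age": [27, np.nan, 30],
                   "Married_Status": [False, True, False]})

# Only float, int and boolean columns are counted
numeric_counts = df.count(numeric_only=True)

# Multi-index dataframe with hypothetical level names
index = pd.MultiIndex.from_tuples([("A", 1), ("A", 2), ("B", 1)],
                                  names=["team", "member"])
scores = pd.DataFrame({"score": [10.0, np.nan, 7.0]}, index=index)

# Count the non-NA values for each value of the 'team' level
per_team = scores.groupby(level="team").count()

print(numeric_counts)
print(per_team)
```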

Example 1: counting non-NA values

Here a dataframe is created with the help of a dictionary.

df = pd.DataFrame({"Employee":
                    ["Rakesh", "Ramesh", "Suresh", "Jayesh", "Bhavesh"],
                    "Age": [27, 36, 30, np.nan, 23],
                   "Married_Status": [False, True, False, True, False]})

df

The output below shows the result of the count() function. The Age column has a count of 4 because its NaN value is not counted.

df.count()

Employee          5
Age               4
Married_Status    5
dtype: int64

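Example 2: applying count() function over columns

In this example, we apply the count function along the columns axis, so the non-NA values are counted for each row. This is why the row at index 3, which has a NaN in the Age column, has a count of 2, whereas the other rows have a count of 3.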

df.count(axis='columns')

0    3
1    3
2    3
3    2
4    3
dtype: int64
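Since count() only counts non-NA cells, it can also be used to locate missing data. As a minimal sketch on the same dataframe, the snippet below derives the number of missing values per column and selects the rows that contain at least one missing value. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Employee":
                   ["Rakesh", "Ramesh", "Suresh", "Jayesh", "Bhavesh"],
                   "Age": [27, 36, 30, np.nan, 23],
                   "Married_Status": [False, True, False, True, False]})

# Missing values per column: total rows minus non-NA count
missing_per_column = len(df) - df.count()

# Rows whose non-NA count is lower than the number of columns
rows_with_missing = df[df.count(axis='columns') < df.shape[1]]

print(missing_per_column)
print(rows_with_missing)
```

The per-column result matches what df.isna().sum() returns for this dataframe.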

Conclusion

In this tutorial, we covered four pandas functions that help us understand and explore our data before preprocessing it and taking decisions based on it. The functions which we covered are describe(), head(), unique() and count(). The first three were shown together with the related tail() function, and all of them can be applied over series or dataframes, except unique(), which works on a series.

Output of df.describe(include='all'):

       categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              C      NaN      R
freq             1      NaN      1
mean           NaN      6.0    NaN
std            NaN      3.0    NaN
min            NaN      3.0    NaN
25%            NaN      4.5    NaN
50%            NaN      6.0    NaN
75%            NaN      7.5    NaN
max            NaN      9.0    NaN

Pandas Tutorial – describe(), head(), unique() and count()

Output of df (the employee dataframe from the count() examples):

  Employee   Age  Married_Status
0   Rakesh  27.0           False
1   Ramesh  36.0            True
2   Suresh  30.0           False
3   Jayesh   NaN            True
4  Bhavesh  23.0           False

Output of stud:

   Students
0      Jack
1      Dale
2     Shaun
3     Shane
4     Brett
5   Patrick
6  Mitchell
7     David
8       Zoe

Output of stud.head():

  Students
0     Jack
1     Dale
2    Shaun
3    Shane
4    Brett

Output of stud.head(3):

  Students
0     Jack
1     Dale
2    Shaun

Output of stud.tail():

   Students
4     Brett
5   Patrick
6  Mitchell
7     David
8       Zoe