Pandas Tutorial – describe(), head(), unique() and count()

Introduction

Data exploration is usually the first step in a machine learning or data science project, because we need to understand our data before working with it. When the data is handled with the pandas library, functions such as describe(), head(), unique() and count() make this exploration easy. In this article, we will look at these functions and learn how they can be used for data exploration with some examples.

Importing Pandas Library

We start this tutorial by importing the pandas library, along with NumPy, which is used later to create a missing value.

In [1]:
import pandas as pd
import numpy as np

We begin with the pandas describe function.

Pandas Describe : describe()

The describe() function is used for generating descriptive statistics of a dataset.

It summarizes the central tendency, dispersion, and shape of a dataset's distribution.

Syntax

pandas.DataFrame.describe(self, percentiles=None, include=None, exclude=None)

self : DataFrame or Series – This is the dataframe or series which is passed to describe() function for finding its descriptive statistics.

percentiles : list-like of numbers – Here we provide the desired percentiles which should be included in the output. The default values are 0.25,0.5 and 0.75 i.e. 25th percentile, 50th percentile and 75th percentile. All the values should be between 0 and 1.

include : 'all', list-like of dtypes or None (optional) – The data types to include in the output. Passing 'all' includes every column; the default, None, includes only numeric columns.

exclude : list-like of dtypes or None (optional) – The data types to leave out of the output.

The output is a series or dataframe of summary statistics.
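To make the percentiles parameter concrete, here is a minimal sketch that asks for the 10th and 90th percentiles of a small series. The series values and the variable name `summary` are chosen for illustration only.

```python
import pandas as pd

s = pd.Series([7, 9, 11])

# Request the 10th and 90th percentiles instead of the default quartiles.
# Each value in the list must lie between 0 and 1.
summary = s.describe(percentiles=[0.1, 0.9])

print(summary)
```

The requested percentiles appear in the result under the labels '10%' and '90%', next to count, mean, std, min and max.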

Example 1: describing a series

Here we will apply describe() function over a series.

In [2]:
s = pd.Series([7, 9, 11])

s
Out[2]:
0     7
1     9
2    11
dtype: int64

Calling describe() on this series returns descriptive statistics such as count, mean, std (standard deviation), min, max and the quartiles.

In [3]:
s.describe()
Out[3]:
count     3.0
mean      9.0
std       2.0
min       7.0
25%       8.0
50%       9.0
75%      10.0
max      11.0
dtype: float64

Example 2: describing categorical data

Pandas describe() function can be used over categorical data as well.

In [4]:
s = pd.Series(['P', 'P', 'Q', 'R'])

s
Out[4]:
0    P
1    P
2    Q
3    R
dtype: object

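For text (object dtype) data like this, describe() reports the number of values (count), the number of distinct values (unique), the most frequent value (top) and how often it occurs (freq).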

In [5]:
s.describe()
Out[5]:
count     4
unique    3
top       P
freq      2
dtype: object

Example 3: Describing dataframe

As we mostly deal with dataframes, let’s see how they are described using pandas describe() function.

In [6]:
df = pd.DataFrame({'categorical': pd.Categorical(['A','B','C']),
                   'numeric': [3, 6, 9],
                   'object': ['P', 'Q', 'R']
                   })

df
Out[6]:
  categorical  numeric object
0           A        3      P
1           B        6      Q
2           C        9      R

By default, describe() on a dataframe summarizes only the numeric columns, so only the numeric column appears in the output.

In [7]:
df.describe()
Out[7]:
       numeric
count      3.0
mean       6.0
std        3.0
min        3.0
25%        4.5
50%        6.0
75%        7.5
max        9.0

By passing include='all', we get descriptive statistics for the columns of every data type present in the dataframe. Statistics that do not apply to a column are shown as NaN.

In [8]:
df.describe(include='all')
Out[8]:
       categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              C      NaN      R
freq             1      NaN      1
mean           NaN      6.0    NaN
std            NaN      3.0    NaN
min            NaN      3.0    NaN
25%            NaN      4.5    NaN
50%            NaN      6.0    NaN
75%            NaN      7.5    NaN
max            NaN      9.0    NaN
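Besides 'all', the include and exclude parameters accept lists of data types. The sketch below rebuilds the same dataframe and selects columns by dtype; the variable names `cat_summary` and `non_numeric` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'categorical': pd.Categorical(['A', 'B', 'C']),
                   'numeric': [3, 6, 9],
                   'object': ['P', 'Q', 'R']})

# Describe only the columns with the categorical dtype.
cat_summary = df.describe(include=['category'])

# Describe every column except the numeric ones.
non_numeric = df.describe(exclude=[np.number])

print(cat_summary)
print(non_numeric)
```

Selecting by dtype this way keeps the output free of the NaN padding that appears when numeric and non-numeric columns are summarized together.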

The next function in the list is the pandas head function.

Pandas head : head()

The head() function returns the first n rows of an object. It is a quick way to check what the data looks like and what type of values each column holds.

Syntax

pandas.DataFrame.head(n=5)

n : int (default = 5) – The number of rows to return.

The head function returns the object with the desired number of rows.

Example 1: Simple example of head() function

In this example, we will look at how head function returns a sample of dataframe with ‘n’ number of rows.

In [9]:
stud = pd.DataFrame({'Students': ['Jack', 'Dale', 'Shaun', 'Shane',
                    'Brett', 'Patrick', 'Mitchell', 'David', 'Zoe']})

stud
Out[9]:
   Students
0      Jack
1      Dale
2     Shaun
3     Shane
4     Brett
5   Patrick
6  Mitchell
7     David
8       Zoe

In [10]:
stud.head()
Out[10]:
  Students
0     Jack
1     Dale
2    Shaun
3    Shane
4    Brett

Example 2: providing value of ‘n’

We can also pass a value for ‘n’ explicitly. Here we pass ‘n’ as 3, so we get the first three rows in the output.

In [11]:
stud.head(3)
Out[11]:
  Students
0     Jack
1     Dale
2    Shaun

Example 3: using tail function

To look at the end of a dataframe, we use the tail() function. By default, it returns the last 5 rows of the dataframe.

In [12]:
stud.tail()
Out[12]:
   Students
4     Brett
5   Patrick
6  Mitchell
7     David
8       Zoe
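Like head(), tail() accepts a value for 'n', and head() also accepts a negative 'n', which returns every row except the last n. A short sketch using the same student names (the variable names `last_two` and `all_but_last_two` are illustrative):

```python
import pandas as pd

stud = pd.DataFrame({'Students': ['Jack', 'Dale', 'Shaun', 'Shane',
                                  'Brett', 'Patrick', 'Mitchell', 'David', 'Zoe']})

# Last two rows of the dataframe.
last_two = stud.tail(2)

# A negative n returns all rows except the last two.
all_but_last_two = stud.head(-2)

print(last_two)
print(all_but_last_two)
```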

The third function in the list is pandas unique function.

Pandas unique : unique()

The unique() function returns the unique values present in a series object. The values are returned in the order in which they first appear.

Syntax

Series.unique()

Here the unique function is applied over series object and then the unique values are returned.

The output of this function is a NumPy array for most data types. For categorical data, a Categorical is returned, as Example 2 shows.

Example 1: using pandas unique() over series object

In the below-given example, we will be applying unique() function on the series object.

In the output, we get an array with unique values.

In [13]:
pd.Series([7, 14, 9, 9], name='Test').unique()
Out[13]:
array([ 7, 14,  9], dtype=int64)
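Because unique() keeps the order of first appearance, the result is not sorted. The following sketch makes this visible with values that appear out of numeric order; the data is made up for illustration.

```python
import pandas as pd

s = pd.Series([3, 1, 3, 2, 1])

# Unique values in the order they first appear, not in sorted order.
values = s.unique()

# Sort explicitly if sorted unique values are needed.
sorted_values = sorted(values)

print(values)
print(sorted_values)
```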

Example 2: unique function on categorical data

Categorical data takes values from a limited set of categories. Let’s see how the unique function operates over a series containing categorical data.

In this first example, unique() returns the distinct values together with the categories of the data.

In [14]:
pd.Series(pd.Categorical(list('gpprs'))).unique()
Out[14]:
[g, p, r, s]
Categories (4, object): [g, p, r, s]

In this example, the same categorical data is created with ordered=True, so the categories are displayed together with their order (g < p < r < s).

In [15]:
pd.Series(pd.Categorical(list('gpprs'), categories=list('gprs'),
                        ordered=True)).unique()
Out[15]:
[g, p, r, s]
Categories (4, object): [g < p < r < s]

The last function in this article which we’ll look at is pandas count.

Pandas Count : count()

The pandas count() function helps in counting non-NA cells of each column or row.

Syntax

pandas.DataFrame.count(axis=0,level=None,numeric_only=False)

axis : {0 or ‘index’, 1 or ‘columns’}, default 0 – If the value provided is 0, counts are generated for each column. If the value provided is 1, counts are generated for each row.

level : int or str (optional) – It is used to specify the level along which counting should be done. Generally used for hierarchical, i.e. multi-index, dataframes. Note that this parameter was deprecated in pandas 1.3 and removed in pandas 2.0, where a groupby on the level is used instead.

numeric_only : bool, default False – If True, only float, int or boolean columns are counted.

The output is a Series or DataFrame. For each column/row, the non-NA entries are counted.
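As a minimal sketch of the numeric_only parameter, the dataframe below (made-up data) mixes a text column with numeric and boolean columns. With numeric_only=True, the text column is left out of the result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Employee": ["Rakesh", "Ramesh", "Suresh"],
                   "Age": [27, np.nan, 30],
                   "Married_Status": [False, True, False]})

# Count non-NA values in float, int and boolean columns only.
numeric_counts = df.count(numeric_only=True)

print(numeric_counts)
```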

Example 1: counting non-NA values

Here a dataframe is created from a dictionary. The Age column contains one missing value (np.nan).

In [16]:
df = pd.DataFrame({"Employee": ["Rakesh", "Ramesh", "Suresh", "Jayesh", "Bhavesh"],
                   "Age": [27, 36, 30, np.nan, 23],
                   "Married_Status": [False, True, False, True, False]})

df
Out[16]:
  Employee   Age  Married_Status
0   Rakesh  27.0           False
1   Ramesh  36.0            True
2   Suresh  30.0           False
3   Jayesh   NaN            True
4  Bhavesh  23.0           False

The count() function returns the number of non-NA values in each column. Age has a count of 4 because one of its five values is missing.

In [17]:
df.count()
Out[17]:
Employee          5
Age               4
Married_Status    5
dtype: int64
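Since count() ignores missing values, subtracting it from the number of rows gives the number of missing values per column. The sketch below rebuilds the same dataframe and compares the result with isna().sum(); the variable names `missing` and `missing_direct` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Employee": ["Rakesh", "Ramesh", "Suresh", "Jayesh", "Bhavesh"],
                   "Age": [27, 36, 30, np.nan, 23],
                   "Married_Status": [False, True, False, True, False]})

# Rows minus non-NA values gives the number of missing values per column.
missing = len(df) - df.count()

# The same numbers computed directly from the missing-value mask.
missing_direct = df.isna().sum()

print(missing)
```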

Example 2: applying count() function over columns

In this example, we apply count() along the columns axis, so the non-NA values are counted for each row. The row at index 3 has a missing Age value, so its count is 2, while every other row has a count of 3.

In [18]:
df.count(axis='columns')
Out[18]:
0    3
1    3
2    3
3    2
4    3
dtype: int64

Conclusion

In this tutorial, we covered four pandas functions that are useful for understanding and exploring data before preprocessing it and making decisions based on it: describe(), head(), unique() and count(). The describe(), head() and count() functions can be applied to both dataframes and series, while unique() is applied to a series.

Reference – https://pandas.pydata.org/docs/

  • Palash Sharma

    I am Palash Sharma, an undergraduate student who loves to explore and garner in-depth knowledge in the fields like Artificial Intelligence and Machine Learning. I am captivated by the wonders these fields have produced with their novel implementations. With this, I have a desire to share my knowledge with others in all my capacity.
