Introduction
Data exploration is usually the first step in a machine learning or data science project, since it helps us understand our data. If we handle the data with the pandas library, we can explore it easily with functions such as describe(), head(), unique() and count(). In this article, we will look at these functions and learn, through examples, how they can be used for data exploration.
Importing Pandas Library
We start this tutorial by importing the pandas library, along with NumPy, which a later example uses to create a missing value.
import pandas as pd
import numpy as np
We begin with the pandas describe() function.
Pandas Describe : describe()
The describe() function is used for generating descriptive statistics of a dataset.
It summarizes the central tendency, dispersion, and shape of a dataset's distribution, excluding NaN values.
Syntax
pandas.DataFrame.describe(percentiles=None, include=None, exclude=None)
self : DataFrame or Series – The dataframe or series on which describe() is called. It is passed implicitly and is not written in the call.
percentiles : list-like of numbers (optional) – The percentiles to include in the output. All values should be between 0 and 1. The default is 0.25, 0.5 and 0.75, i.e. the 25th, 50th and 75th percentiles.
include : 'all', list-like of dtypes or None (optional) – The data types to include in the output. With the default None, only numeric columns are described; 'all' describes every column.
exclude : list-like of dtypes or None (optional) – The data types to leave out of the output.
The output is a Series or DataFrame of summary statistics.
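To make the percentiles parameter concrete, here is a minimal sketch. The series values are made up for illustration, and 0.5 is passed explicitly so that the median row appears regardless of the pandas version in use.

```python
import pandas as pd

# Hypothetical data chosen only for illustration: the integers 1 to 11.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

# Request the 10th, 50th and 90th percentiles instead of the default quartiles.
summary = values.describe(percentiles=[0.1, 0.5, 0.9])
print(summary)
```

With eleven evenly spaced values, the 10th and 90th percentiles fall exactly on 2 and 10, which makes the result easy to check by hand.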
Example 1: describing a series
Here we will apply describe() function over a series.
s = pd.Series([7, 9, 11])
s
0     7
1     9
2    11
dtype: int64
Calling describe() on this series returns descriptive statistics such as count, mean, std (standard deviation), min, max, and the quartiles.
s.describe()
count     3.0
mean      9.0
std       2.0
min       7.0
25%       8.0
50%       9.0
75%      10.0
max      11.0
dtype: float64
Example 2: describing categorical data
The pandas describe() function can be used on categorical data, i.e. text labels, as well.
s = pd.Series(['P', 'P', 'Q', 'R'])
s
0    P
1    P
2    Q
3    R
dtype: object
For text data, describe() reports the number of values (count), the number of distinct values (unique), the most frequent value (top), and how often it occurs (freq).
s.describe()
count     4
unique    3
top       P
freq      2
dtype: object
Example 3: Describing dataframe
As we mostly deal with dataframes, let’s see how they are described using pandas describe() function.
df = pd.DataFrame({'categorical': pd.Categorical(['A','B','C']),
'numeric': [3, 6, 9],
'object': ['P', 'Q', 'R']
})
df
categorical  numeric  object  

0  A  3  P 
1  B  6  Q 
2  C  9  R 
By default, describe() summarizes only the numeric columns of a dataframe.
df.describe()
numeric  

count  3.0 
mean  6.0 
std  3.0 
min  3.0 
25%  4.5 
50%  6.0 
75%  7.5 
max  9.0 
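By passing include='all', we get descriptive statistics for every column regardless of its data type. Statistics that do not apply to a column are shown as NaN. When several values tie for the highest frequency, as in the categorical and object columns here, top is chosen arbitrarily among them.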
df.describe(include='all')
categorical  numeric  object  

count  3  3.0  3 
unique  3  NaN  3 
top  C  NaN  R 
freq  1  NaN  1 
mean  NaN  6.0  NaN 
std  NaN  3.0  NaN 
min  NaN  3.0  NaN 
25%  NaN  4.5  NaN 
50%  NaN  6.0  NaN 
75%  NaN  7.5  NaN 
max  NaN  9.0  NaN 
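Besides 'all', the include and exclude parameters accept lists of dtypes. The following is a minimal sketch that rebuilds the dataframe from this example; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Same dataframe as in the example above.
df = pd.DataFrame({'categorical': pd.Categorical(['A', 'B', 'C']),
                   'numeric': [3, 6, 9],
                   'object': ['P', 'Q', 'R']})

# Describe only the numeric columns.
numeric_summary = df.describe(include=[np.number])

# Describe only the columns with categorical dtype.
categorical_summary = df.describe(include=['category'])

# Describe everything except the numeric columns.
non_numeric_summary = df.describe(exclude=[np.number])

print(numeric_summary.columns.tolist())
print(categorical_summary.columns.tolist())
print(non_numeric_summary.columns.tolist())
```

Selecting by dtype is handy on wide dataframes, where include='all' would produce a table that is mostly NaN.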
The next function in the list is the pandas head() function.
Pandas head : head()
The head() function returns the first n rows of an object. It is useful for quickly checking what kind of data the object holds.
Syntax
pandas.DataFrame.head(n=5)
n : int (default 5) – The number of rows to return.
The head function returns an object of the same type, limited to the first n rows.
Example 1: Simple example of head() function
In this example, we call head() without arguments, so it returns the first five rows of the dataframe.
stud = pd.DataFrame({'Students': ['Jack', 'Dale', 'Shaun', 'Shane',
'Brett', 'Patrick', 'Mitchell', 'David', 'Zoe']})
stud
Students  

0  Jack 
1  Dale 
2  Shaun 
3  Shane 
4  Brett 
5  Patrick 
6  Mitchell 
7  David 
8  Zoe 
stud.head()
Students  

0  Jack 
1  Dale 
2  Shaun 
3  Shane 
4  Brett 
Example 2: providing value of ‘n’
We can also pass the value of ‘n’ explicitly. Since we pass 3 here, we get the first three rows in the output.
stud.head(3)
Students  

0  Jack 
1  Dale 
2  Shaun 
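The value of n can also be negative, in which case head() returns all rows except the last |n| rows. A minimal sketch, reusing the same student names:

```python
import pandas as pd

# Same dataframe as in the examples above.
stud = pd.DataFrame({'Students': ['Jack', 'Dale', 'Shaun', 'Shane',
                                  'Brett', 'Patrick', 'Mitchell', 'David', 'Zoe']})

# A negative n keeps every row except the last |n| rows.
all_but_last_three = stud.head(-3)
print(all_but_last_three)
```

This form is convenient when the number of rows to drop from the end is known but the total length is not.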
Example 3: using tail function
To access the last rows of a dataframe, we use the tail() function. By default, it returns the last 5 rows.
stud.tail()
Students  

4  Brett 
5  Patrick 
6  Mitchell 
7  David 
8  Zoe 
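Like head(), tail() accepts the number of rows to return. A minimal sketch with the same dataframe:

```python
import pandas as pd

# Same dataframe as in the examples above.
stud = pd.DataFrame({'Students': ['Jack', 'Dale', 'Shaun', 'Shane',
                                  'Brett', 'Patrick', 'Mitchell', 'David', 'Zoe']})

# Return only the last three rows; the original index labels are kept.
last_three = stud.tail(3)
print(last_three)
```

Note that the rows keep their original index labels (6, 7 and 8) rather than being renumbered from zero.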
The third function in the list is pandas unique function.
Pandas unique : unique()
The unique() function returns unique values present in series object. The values are returned in the order of appearance.
Syntax
Series.unique()
Here the unique function is applied over series object and then the unique values are returned.
The output of this function is a NumPy array, or an extension array (such as a Categorical) when the series has an extension dtype.
Example 1: using pandas unique() over series object
In the example given below, we will be applying the unique() function on a series object.
In the output, we get an array with unique values.
pd.Series([7, 14, 9, 9], name='Test').unique()
array([ 7, 14, 9], dtype=int64)
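Two details are worth checking when exploring real data: unique() keeps missing values in its result, and the related nunique() method counts distinct values while ignoring them by default. The sketch below uses made-up ages purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data containing a repeated value and one missing value.
ages = pd.Series([30, 25, 30, np.nan, 25])

# unique() returns values in order of appearance and includes NaN.
distinct = ages.unique()

# nunique() counts distinct values; NaN is excluded unless dropna=False.
n_distinct = ages.nunique()
n_distinct_with_nan = ages.nunique(dropna=False)

print(distinct)
print(n_distinct, n_distinct_with_nan)
```

If only the number of distinct values is needed, nunique() avoids building the array first.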
Example 2: unique function on categorical data
Categorical data holds values drawn from a fixed set of categories, here single-letter labels. So let’s see how the unique function operates on a series with categorical dtype.
In this first example, the output lists the unique values together with the categories of the series.
pd.Series(pd.Categorical(list('gpprs'))).unique()
[g, p, r, s]
Categories (4, object): [g, p, r, s]
In this example, the same data is created as an ordered categorical by passing ordered=True, so the output also shows the order of the categories (g < p < r < s).
pd.Series(pd.Categorical(list('gpprs'), categories=list('gprs'),
ordered=True)).unique()
[g, p, r, s]
Categories (4, object): [g < p < r < s]
The last function in this article which we’ll look at is pandas count.
Pandas Count : count()
The pandas count() function counts the non-NA cells in each column or row.
Syntax
pandas.DataFrame.count(axis=0, level=None, numeric_only=False)
axis : {0 or ‘index’, 1 or ‘columns’}, default 0 – If the value provided is 0, counts are generated for each column. If the value provided is 1, counts are generated for each row.
level : int or str (optional) – The level along which counting should be done. It is used for hierarchical, i.e. multi-index, dataframes. This parameter was deprecated and then removed in pandas 2.0; on recent versions, group by the index level and call count() on the result instead.
numeric_only : bool, default False – If True, only float, int or boolean data is included.
The output is a Series or DataFrame. For each column/row, the non-NA entries are counted.
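On pandas versions where count() no longer accepts level, counting per index level can be done with groupby. The following is a minimal sketch; the index names and scores are hypothetical and not part of the original examples.

```python
import numpy as np
import pandas as pd

# Hypothetical multi-index dataframe: two teams with two members each.
index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)],
    names=['team', 'member'])
scores = pd.DataFrame({'score': [1.0, np.nan, 3.0, 4.0]}, index=index)

# Count the non-NA scores for each value of the 'team' index level.
per_team = scores.groupby(level='team').count()
print(per_team)
```

Team A has one missing score, so its count is 1, while team B has a count of 2.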
Example 1: counting non-NA values
Here a dataframe is created with the help of a dictionary.
df = pd.DataFrame({"Employee":
["Rakesh", "Ramesh", "Suresh", "Jayesh", "Bhavesh"],
"Age": [27, 36, 30, np.nan, 23],
"Married_Status": [False, True, False, True, False]})
df
Employee  Age  Married_Status  

0  Rakesh  27.0  False 
1  Ramesh  36.0  True 
2  Suresh  30.0  False 
3  Jayesh  NaN  True 
4  Bhavesh  23.0  False 
Calling count() returns the number of non-NA values in each column. The Age column has one missing value, so its count is 4.
df.count()
Employee          5
Age               4
Married_Status    5
dtype: int64
Example 2: applying count() function over columns
In this example, we apply count() along the columns axis, so the non-NA values are counted for each row. The row with index 3 has a count of 2 because its Age is NaN, whereas every other row has all 3 values present.
df.count(axis='columns')
0    3
1    3
2    3
3    2
4    3
dtype: int64
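The numeric_only parameter and the relationship between count() and missing values can be sketched as follows. The dataframe is the same as above; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Same dataframe as in the examples above.
df = pd.DataFrame({"Employee": ["Rakesh", "Ramesh", "Suresh", "Jayesh", "Bhavesh"],
                   "Age": [27, 36, 30, np.nan, 23],
                   "Married_Status": [False, True, False, True, False]})

# Count only the float, int and boolean columns; Employee is skipped.
numeric_counts = df.count(numeric_only=True)

# The number of missing values per column is the row total minus the count.
missing_per_column = len(df) - df.count()

print(numeric_counts)
print(missing_per_column)
```

Subtracting the counts from the number of rows is a quick way to see which columns need cleaning before further analysis.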
Conclusion
In this tutorial, we covered four pandas functions that help us understand and explore our data before preprocessing it and making decisions based on it: describe(), head(), unique() and count(). The describe(), head() and count() functions work on both dataframes and series, while unique() is applied to a series.
Reference – https://pandas.pydata.org/docs/

I am Palash Sharma, an undergraduate student who loves to explore and gain in-depth knowledge in fields like Artificial Intelligence and Machine Learning. I am captivated by the wonders these fields have produced with their novel implementations. With this, I have a desire to share my knowledge with others to the best of my ability.