## Introduction

Data exploration is usually the first step in a machine learning or data science project: before modeling, we need to understand our data. The pandas library makes this easy with functions such as **describe(), head(), unique() and count()**. In this article, we will look at each of these functions and see, with examples, how they can be used for data exploration.

### Importing Pandas Library

We start this tutorial by importing the pandas library, along with NumPy, which we use later to create a missing value.

```
import pandas as pd
import numpy as np
```

We begin with the **pandas describe** function.

**Pandas Describe : describe()**

The describe() function is used for generating **descriptive statistics** of a dataset.

These statistics summarize the central tendency, dispersion, and shape of a dataset's distribution, excluding NaN values.

### Syntax

**pandas.DataFrame.describe(self,percentiles,include,exclude)**

**self : DataFrame or Series** – This is the dataframe or series which is passed to **describe() function** for finding its descriptive statistics.

**percentiles : list-like of numbers** – Here we provide the desired percentiles which should be included in the output. The default values are **0.25,0.5 and 0.75** i.e. **25th percentile, 50th percentile and 75th percentile**. All the values should be between 0 and 1.

**include : list-like of dtypes or None**(optional) – This is the list of data types to include in the output. Passing 'all' includes every column, whatever its data type.

**exclude : list-like of dtypes or None**(optional) – This is the list of data types which should not be included in the output.

The function returns a Series or DataFrame containing the summary statistics.
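To make the **percentiles** parameter concrete, here is a minimal sketch. The `scores` series is hypothetical sample data, not part of the original examples; it asks for the 10th and 90th percentiles instead of the default quartiles.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
scores = pd.Series([10, 20, 30, 40, 50])

# Request the 10th and 90th percentiles instead of the default quartiles
summary = scores.describe(percentiles=[0.1, 0.9])

print(summary)
print(summary['10%'], summary['90%'])
```

Each requested percentile appears in the result under a label such as `10%` or `90%`, alongside the usual count, mean, std, min and max.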

### Example 1: describing a series

Here we apply the describe() function to a series of numbers.

```
s = pd.Series([7, 9, 11])
s
```

| | |
|---|---|
| 0 | 7 |
| 1 | 9 |
| 2 | 11 |

dtype: int64

Calling describe() on this series returns several descriptive statistics, such as count, mean, std (the standard deviation), the minimum and maximum, and the percentiles.

```
s.describe()
```

| | |
|---|---|
| count | 3.0 |
| mean | 9.0 |
| std | 2.0 |
| min | 7.0 |
| 25% | 8.0 |
| 50% | 9.0 |
| 75% | 10.0 |
| max | 11.0 |

dtype: float64
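The figures reported by describe() can be checked by hand. The sketch below reuses the same series and shows that `std` is the sample standard deviation (dividing by n - 1) and that the percentiles are interpolated linearly between neighbouring values.

```python
import pandas as pd

s = pd.Series([7, 9, 11])

# Sample standard deviation (ddof=1), the pandas default:
# sqrt(((7-9)^2 + (9-9)^2 + (11-9)^2) / (3 - 1)) = sqrt(8 / 2) = 2.0
sample_std = s.std()

# Population standard deviation (ddof=0) divides by n instead: sqrt(8 / 3)
population_std = s.std(ddof=0)

# The 25th percentile lies halfway between 7 and 9
q25 = s.quantile(0.25)

print(sample_std, population_std, q25)
```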

### Example 2: describing a series of text values

The pandas describe() function can also be used on non-numeric data, such as a series of strings.

```
s = pd.Series(['P', 'P', 'Q', 'R'])
s
```

| | |
|---|---|
| 0 | P |
| 1 | P |
| 2 | Q |
| 3 | R |

dtype: object

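For text (object) data, describe() reports the number of values (count), the number of distinct values (unique), the most frequent value (top), and how often that value occurs (freq).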

```
s.describe()
```

| | |
|---|---|
| count | 4 |
| unique | 3 |
| top | P |
| freq | 2 |

dtype: object
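The `unique`, `top` and `freq` values can be cross-checked with other pandas methods. This is a small sketch using `value_counts()` and `nunique()` on the same series:

```python
import pandas as pd

s = pd.Series(['P', 'P', 'Q', 'R'])

counts = s.value_counts()   # occurrences of each distinct value
top = counts.idxmax()       # the most frequent value
freq = counts.max()         # how many times it occurs
n_unique = s.nunique()      # number of distinct values

print(top, freq, n_unique)
```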

### Example 3: describing a dataframe

As we mostly deal with dataframes, let’s see how they are described using pandas describe() function.

```
df = pd.DataFrame({'categorical': pd.Categorical(['A', 'B', 'C']),
                   'numeric': [3, 6, 9],
                   'object': ['P', 'Q', 'R']})
df
```

| | categorical | numeric | object |
|---|---|---|---|
| 0 | A | 3 | P |
| 1 | B | 6 | Q |
| 2 | C | 9 | R |

By default, describe() summarizes only the numeric columns of a dataframe, so only the numeric column appears in the output.

```
df.describe()
```

| | numeric |
|---|---|
| count | 3.0 |
| mean | 6.0 |
| std | 3.0 |
| min | 3.0 |
| 25% | 4.5 |
| 50% | 6.0 |
| 75% | 7.5 |
| max | 9.0 |

By setting the **include** parameter to 'all', we get descriptive statistics for every column, whatever its data type. Statistics that do not apply to a column are shown as NaN.

```
df.describe(include='all')
```

| | categorical | numeric | object |
|---|---|---|---|
| count | 3 | 3.0 | 3 |
| unique | 3 | NaN | 3 |
| top | C | NaN | R |
| freq | 1 | NaN | 1 |
| mean | NaN | 6.0 | NaN |
| std | NaN | 3.0 | NaN |
| min | NaN | 3.0 | NaN |
| 25% | NaN | 4.5 | NaN |
| 50% | NaN | 6.0 | NaN |
| 75% | NaN | 7.5 | NaN |
| max | NaN | 9.0 | NaN |
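Besides 'all', the **include** and **exclude** parameters also accept lists of data types. The following sketch uses the same dataframe to select columns by dtype; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'categorical': pd.Categorical(['A', 'B', 'C']),
                   'numeric': [3, 6, 9],
                   'object': ['P', 'Q', 'R']})

# Describe only the numeric columns
only_numeric = df.describe(include=[np.number])

# Describe only the columns with a categorical dtype
only_category = df.describe(include=['category'])

# Describe everything except the numeric columns
no_numeric = df.describe(exclude=[np.number])

print(list(only_numeric.columns))
print(list(only_category.columns))
print(list(no_numeric.columns))
```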

The next function in our list is the **pandas head** function.

**Pandas head : head()**

The **head()** function returns the first **n** rows of an object. It is useful for quickly checking that the object holds the right kind of data.

### Syntax

**pandas.DataFrame.head(n=5)**

**n : int**(default = 5) – The number of rows to return.

The head function returns the object with the desired number of rows.

### Example 1: Simple example of head() function

In this example, we see how head() returns the first five rows of a dataframe when ‘n’ is not specified.

```
stud = pd.DataFrame({'Students': ['Jack', 'Dale', 'Shaun', 'Shane',
                                  'Brett', 'Patrick', 'Mitchell', 'David', 'Zoe']})
stud
```

| | Students |
|---|---|
| 0 | Jack |
| 1 | Dale |
| 2 | Shaun |
| 3 | Shane |
| 4 | Brett |
| 5 | Patrick |
| 6 | Mitchell |
| 7 | David |
| 8 | Zoe |

```
stud.head()
```

| | Students |
|---|---|
| 0 | Jack |
| 1 | Dale |
| 2 | Shaun |
| 3 | Shane |
| 4 | Brett |

### Example 2: providing value of ‘n’

We can also set the value of ‘n’ ourselves. Here we pass 3 as the value of ‘n’, so we get the first three rows in the output.

```
stud.head(3)
```

| | Students |
|---|---|
| 0 | Jack |
| 1 | Dale |
| 2 | Shaun |

### Example 3: using tail function

To look at the end of a dataframe, we use the tail() function. By default, it returns the last 5 rows of the dataframe.

```
stud.tail()
```

| | Students |
|---|---|
| 4 | Brett |
| 5 | Patrick |
| 6 | Mitchell |
| 7 | David |
| 8 | Zoe |
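Both head() and tail() also accept a negative ‘n’. The sketch below, using the same student data, shows that head(-2) returns every row except the last two, and tail(-2) returns every row except the first two.

```python
import pandas as pd

stud = pd.DataFrame({'Students': ['Jack', 'Dale', 'Shaun', 'Shane',
                                  'Brett', 'Patrick', 'Mitchell', 'David', 'Zoe']})

all_but_last_two = stud.head(-2)    # rows with index 0 to 6
all_but_first_two = stud.tail(-2)   # rows with index 2 to 8
last_three = stud.tail(3)           # rows with index 6 to 8

print(all_but_last_two)
print(all_but_first_two)
print(last_three)
```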

The third function in our list is the **pandas unique** function.

**Pandas unique : unique()**

The **unique()** function returns the unique values of a Series object. The values are returned in order of appearance; they are not sorted.

### Syntax

**Series.unique()**

The function takes no arguments. It is called on a Series object and returns that object's unique values.

For most data types the output is a NumPy array; for extension types such as categoricals, it is an array of that type.

### Example 1: using pandas unique() over series object

In the example below, we apply the unique() function to a series object.

The output is an array of the unique values, in which the repeated 9 appears only once.

```
pd.Series([7, 14, 9, 9], name='Test').unique()
```

array([ 7, 14, 9], dtype=int64)
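Two details of unique() are worth illustrating: the values keep their order of appearance, and a missing value is reported as one of the unique values. The sketch below uses a hypothetical series, and also shows the related `nunique()` method, which returns the number of distinct values and ignores NaN by default.

```python
import numpy as np
import pandas as pd

# Hypothetical data with repeated values and one missing value
s = pd.Series([3, 1, 3, np.nan, 1])

values = s.unique()                    # order of appearance: 3.0, 1.0, nan
n_distinct = s.nunique()               # NaN is ignored by default
n_with_nan = s.nunique(dropna=False)   # NaN counted as a distinct value

print(values, n_distinct, n_with_nan)
```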

### Example 2: unique function on categorical data

Categorical data takes values from a limited set of categories. So let’s see how the unique function operates on a series with a categorical data type.

In this first example, the result contains each distinct value once, followed by the list of categories.

```
pd.Series(pd.Categorical(list('gpprs'))).unique()
```

[g, p, r, s] Categories (4, object): [g, p, r, s]

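In this example, the categories of the same data are ordered, which the output shows as g < p < r < s. This is because we have specified the **ordered** keyword. The unique values themselves are still returned in order of appearance.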

```
pd.Series(pd.Categorical(list('gpprs'), categories=list('gprs'),
                         ordered=True)).unique()
```

[g, p, r, s] Categories (4, object): [g < p < r < s]
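In the two examples above, the data happens to appear in alphabetical order already. To separate the order of the categories from the order of the unique values, this sketch shuffles the input (an assumption made for illustration) while keeping the same ordered categories.

```python
import pandas as pd

# Same categories as above, but the data appears as s, p, p, r, g
cat = pd.Categorical(list('spprg'), categories=list('gprs'), ordered=True)
result = pd.Series(cat).unique()

# The unique values follow the order of appearance, not the category order
values_in_order = list(result)

print(values_in_order)
print(result.ordered)
```

The ordering of the categories (g < p < r < s) affects comparisons and sorting, not the order in which unique() returns the values.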

The last function we’ll look at in this article is **pandas count**.

**Pandas Count : count()**

The pandas count() function counts the non-NA cells in each column or row.

### Syntax

**pandas.DataFrame.count(axis=0,level=None,numeric_only=False)**

**axis : {0 or ‘index’, 1 or ‘columns’}, default 0** – If the value provided is **0**, then counts are generated for each column. If the value provided is **1**, then counts are generated for each row.

**level : int or str**(optional) – It is used to specify the level along which counting should be done in hierarchical, i.e. multi-index, dataframes. Note that this parameter was deprecated in pandas 1.3 and removed in pandas 2.0.

**numeric_only : bool** – If True, only float, int or boolean data is included in the count.

The output is a Series (or a DataFrame if level is specified) giving the number of non-NA entries for each column or row.
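As a quick illustration of **numeric_only**, the sketch below builds a small employee table (the same data used in the examples of this section) and counts only the float, int and boolean columns, so the text column is left out of the result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Employee": ["Rakesh", "Ramesh", "Suresh", "Jayesh", "Bhavesh"],
                   "Age": [27, 36, 30, np.nan, 23],
                   "Married_Status": [False, True, False, True, False]})

# Only the float, int and boolean columns are counted
numeric_counts = df.count(numeric_only=True)

print(numeric_counts)
```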

### Example 1: counting non-NA values

Here a dataframe is created from a dictionary. Note that one value in the Age column is missing (np.nan).

```
df = pd.DataFrame({"Employee": ["Rakesh", "Ramesh", "Suresh", "Jayesh", "Bhavesh"],
                   "Age": [27, 36, 30, np.nan, 23],
                   "Married_Status": [False, True, False, True, False]})
df
```

| | Employee | Age | Married_Status |
|---|---|---|---|
| 0 | Rakesh | 27.0 | False |
| 1 | Ramesh | 36.0 | True |
| 2 | Suresh | 30.0 | False |
| 3 | Jayesh | NaN | True |
| 4 | Bhavesh | 23.0 | False |

The output below shows the result of the **count()** function. The Age column has only 4 non-NA values, while the other columns have 5.

```
df.count()
```

| | |
|---|---|
| Employee | 5 |
| Age | 4 |
| Married_Status | 5 |

dtype: int64
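Because count() ignores missing values, it can be combined with the number of rows to find how many values are missing in each column. The sketch below shows this on the same employee data and compares it with `isna().sum()`, which gives the same figures directly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Employee": ["Rakesh", "Ramesh", "Suresh", "Jayesh", "Bhavesh"],
                   "Age": [27, 36, 30, np.nan, 23],
                   "Married_Status": [False, True, False, True, False]})

# Rows in the dataframe minus non-NA entries gives the missing entries per column
missing_per_column = len(df) - df.count()

# The same figures, computed directly from the missing-value mask
missing_direct = df.isna().sum()

print(missing_per_column)
```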

### Example 2: applying count() function over columns

In this example, we pass axis='columns', so count() works across the columns and returns the number of non-NA values in each row. The row at **index 3** has a count of 2 because its Age value is missing, whereas every other row has all 3 values present.

```
df.count(axis='columns')
```

| | |
|---|---|
| 0 | 3 |
| 1 | 3 |
| 2 | 3 |
| 3 | 2 |
| 4 | 3 |

dtype: int64

## Conclusion

In this tutorial, we covered four pandas functions that help us understand and explore our data before preprocessing it and making decisions based on it: **describe(), head(), unique() and count()**. The describe() function summarizes a dataset's statistics, head() (along with tail()) previews its rows, unique() lists the distinct values of a series, and count() counts the non-NA entries in each column or row.

*Reference –* https://pandas.pydata.org/docs/