
## Introduction

During exploratory data analysis in a machine learning or data science project, visualizations are a useful way to understand the data. The Python pandas library offers basic support for several types of visualizations. In this article, we will explore the following pandas visualization functions: **bar plot**, **histogram**, **scatter plot**, and **pie chart**. We will learn the syntax of each one and look at several of its variations.

### Importing Pandas Library

To begin, import the pandas library with the alias **pd** and the NumPy library with the alias **np**.

```
import pandas as pd
import numpy as np
```

We will start this tutorial by plotting the **bar** graph.

**Pandas Bar Plot : bar()**

A bar plot represents categorical data as vertical or horizontal bars whose lengths are proportional to the values they represent.

### Syntax

**dataframe.plot.bar(x=None, y=None, \*\*kwargs)**

**x : label or position**(optional) – The column used for the category labels on the x-axis. If it is not specified, the index of the dataframe is used.

**y : label or position**(optional) – The column plotted as the bar heights. If it is not specified, all numerical columns are plotted.

**kwargs** – Additional keyword arguments, which are passed on to the underlying **DataFrame.plot()** call.

**output** – The function returns a matplotlib **Axes** object, or a NumPy array of Axes objects (one per column) when **subplots=True** is passed.
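The two return types can be checked directly. The following is a minimal sketch; the dataframe `demo` and its column names are made up for illustration and are not part of the examples in this article.

```python
import numpy as np
import pandas as pd
from matplotlib.axes import Axes

# Hypothetical data, used only to illustrate the return types
demo = pd.DataFrame({
    "label": ["P", "Q", "R"],
    "sales": [1, 2, 3],
    "costs": [3, 2, 1],
})

# A single plot returns one matplotlib Axes object
single = demo.plot.bar(x="label", y="sales")
print(isinstance(single, Axes))

# With subplots=True, a NumPy array with one Axes per plotted column is returned
multiple = demo.plot.bar(x="label", subplots=True)
print(type(multiple).__name__, multiple.shape)
```

Holding on to the returned Axes is useful when you want to adjust titles, axis labels or limits after the plot has been drawn.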

### Example 1: Simple pandas bar plot

Now let’s look at some examples of the bar plot.

Here a dataframe **df** is created with two columns, **label** and **values**, and it is then visualized using the **bar()** function.

```
df = pd.DataFrame({'label':['P', 'Q', 'R'], 'values':[70, 25, 97]})
df
```

Here the **x-axis** shows the **label** column and the **y-axis** shows the **values** column. The **rot** (rotation) parameter sets the angle, in degrees, of the **x-axis** tick labels; **rot=0** keeps them horizontal.

```
ax = df.plot.bar(x='label', y='values', rot=0)
ax
```

### Example 2: Visualizing information with two different bar plots in one axes

```
height = [5, 17.5, 40, 48, 52, 69, 88]
years_to_fully_grow = [2, 8, 70, 1.5, 25, 12, 28]
index = ['lemon', 'cotton', 'neem',
         'hibiscus', 'peepal', 'banyan', 'coconut']
# Two numeric columns that share the same plant names as the index
df = pd.DataFrame({'height (in metres)': height,
                   'Years taken to grow': years_to_fully_grow}, index=index)
df
```

In this plot, each plant gets two bars, one per column, and the legend at the top left corner of the plot shows which color belongs to which column.

```
ax = df.plot.bar(rot=0)
```

The plot shown above can be split into two separate bar plots that convey the same information. Setting the **subplots** parameter to **True** draws each column in its own subplot.

So whenever we want to compare two different features across the same categories, we can use the pandas **bar plot**.

```
axes = df.plot.bar(rot=0, subplots=True)
```

### Example 3: Multiple Bar plot

Here multiple bars are plotted for each row. They can also be stacked, using the **stacked** parameter.

```
df_bar = pd.DataFrame(np.random.rand(20, 4), columns=['A', 'B', 'C', 'D'])
```

```
df_bar.head()
```

```
df_bar.plot.bar()
```

The same information can be conveyed with stacked bars, where the bars of each row are placed on top of one another.

```
df_bar.plot.bar(stacked=True)
```
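To see what stacking means numerically, the sketch below checks that the top of each stack equals the sum of that row. The dataframe `demo` is hypothetical, and reading bar positions from `ax.containers` is a matplotlib detail used here only for verification, not something the plots above rely on.

```python
import numpy as np
import pandas as pd

# Hypothetical non-negative data, used only for illustration
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((5, 3)), columns=["A", "B", "C"])

ax = demo.plot.bar(stacked=True)

# ax.containers holds one group of bars per column, in plotting order.
# In a stacked plot, the top edge of the last column's bars is the row total.
last_bars = ax.containers[-1]
tops = np.array([bar.get_y() + bar.get_height() for bar in last_bars])
print(np.allclose(tops, demo.sum(axis=1)))
```

This is why a stacked bar plot is a good fit when the total per row matters as much as the individual contributions.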

To draw these bar plots horizontally, use **barh()** instead of **bar()**.

```
df_bar.plot.barh(stacked=True)
```

Moving on to the next plot type, let’s plot a **histogram**.

**Pandas Histogram : hist()**

A histogram provides insight into the **data distribution**. Below we will look at the syntax of the histogram.

### Syntax

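**dataframe.hist(column=None, bins=10, \*\*kwargs)**

**dataframe** – The dataframe which holds the data. Because **hist()** is called as a method on the dataframe, the data is not passed as a separate argument. The examples below use the related **dataframe.plot.hist()**, which draws all columns on the same axes and also accepts **bins**.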

**column : str or sequence**(optional) – Limits the data to a subset of columns.

**bins : int or sequence, default is 10** – The number of histogram bins to be used. If an integer is given, bins + 1 bin edges are calculated. If a sequence is given, it specifies the bin edges, including the left edge of the first bin and the right edge of the last bin. Choosing suitable bins helps reveal the shape of the distribution.

**kwargs** – Additional keyword arguments used by the function. For the remaining parameters, see the pandas documentation.
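The difference between an integer and a sequence for **bins** can be sketched as follows. The series `values` and the edges in `custom_edges` are made up for illustration, and `np.histogram` is used only to make the bin edges visible.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only for illustration
rng = np.random.default_rng(0)
values = pd.Series(rng.normal(size=500), name="value")

# Integer: 10 equal-width bins, which means 11 bin edges
counts, edges = np.histogram(values, bins=10)
print(len(counts), len(edges))  # 10 11

# Sequence: the bin edges are given explicitly, so 5 edges give 4 bins
custom_edges = [-4, -1, 0, 1, 4]
counts_custom, _ = np.histogram(values, bins=custom_edges)
print(len(counts_custom))  # 4

# The same sequence can be passed to the pandas plotting call
ax = values.plot.hist(bins=custom_edges)
```

Explicit edges are handy when the bins should line up with meaningful thresholds rather than with equal-width intervals.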

### Example 1: Simple pandas histogram plot

Here a dataframe is created by generating random values with the help of numpy.

```
df_hist = pd.DataFrame({'A': np.random.randn(3000) + 1, 'B': np.random.randn(3000),'C': np.random.randn(3000) - 1}, columns=['A', 'B', 'C'])
```

```
df_hist.head()
```

The histogram is built using the **hist()** function. Here the **alpha** parameter controls the **transparency** of the plot: as the value of alpha increases (from 0 to 1), transparency decreases, and vice versa.

```
df_hist.plot.hist(alpha=0.6)
```

### Example 2: Stacked Histogram with bins parameter

```
df_hist.plot.hist(stacked=True, bins=30,alpha=0.5)
```

Now we will learn how to make a horizontal histogram with the **cumulative** parameter set to **True**. In a cumulative histogram, each bin shows the count of that bin plus the counts of all bins for smaller values. For a horizontal histogram, the **orientation** parameter is used.

```
df_hist['B'].plot.hist(orientation='horizontal', cumulative=True)
```
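The effect of **cumulative=True** can be reproduced by hand as a running total of the bin counts. This is a minimal sketch on a tiny made-up array, not the data used above.

```python
import numpy as np

# Hypothetical data, used only for illustration
data = np.array([1, 2, 2, 3, 3, 3])

# Ordinary histogram: count per bin
counts, edges = np.histogram(data, bins=3)
print(counts)  # [1 2 3]

# Cumulative histogram: each bin adds the counts of all bins before it
cumulative = np.cumsum(counts)
print(cumulative)  # [1 3 6]
```

The last cumulative bin always equals the total number of observations, which makes this form convenient for reading off how many values fall below a threshold.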

After looking at bars, we will explore a different type of plot, i.e. the **scatter plot**.

**Pandas Scatter Plot : scatter()**

A scatter plot depicts the relationship between two variables by plotting one against the other as points on two axes.

### Syntax

**dataframe.plot.scatter(x, y, s=None, c=None, \*\*kwargs)**

**x : int or str** – The column used for horizontal coordinates.

**y : int or str** – The column used for vertical coordinates.

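**s : scalar or array_like**(optional) – The size of each point. A single value applies to all points, while an array sets the size of each point individually.

**c : str, int or array_like**(optional) – The color of each point. It can be a single color, a sequence of colors, or a column name or position whose values are mapped to colors through a colormap.

**kwargs** – Additional keyword arguments used by the function.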
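The **s** and **c** parameters can be combined to show a third variable through both size and color. The sketch below assumes a hypothetical dataframe `points` with a `weight` column; the scaling factor of 200 is an arbitrary choice to make the size differences visible.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only for illustration
rng = np.random.default_rng(1)
points = pd.DataFrame(rng.random((50, 3)), columns=["x", "y", "weight"])

# s: one size per point (array-like); c: a column name mapped through a colormap
sizes = (points["weight"] * 200).to_numpy()
ax = points.plot.scatter(x="x", y="y", s=sizes, c="weight", colormap="viridis")

# The drawn collection holds one (x, y) position per row of the dataframe
print(ax.collections[0].get_offsets().shape)  # (50, 2)
```

Encoding the same column in both size and color is redundant by design here; in practice the two channels can carry different columns.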

### Example 1: Simple Pandas Scatter plot

Here a dataframe of random values is created, and its column names are passed to the scatter function.

```
df_scatter = pd.DataFrame(np.random.rand(100, 5), columns=['a', 'b', 'c', 'd','e'])
df_scatter.head()
```

Columns **a** and **b** are assigned to the x-axis and y-axis respectively to draw the scatter plot.

```
df_scatter.plot.scatter(x='a', y='b');
```

### Example 2: Using color and label parameters

```
ax = df_scatter.plot.scatter(x='a', y='b', color='DarkRed', label='Length');
df_scatter.plot.scatter(x='c', y='d', color='DarkOrange', label='Width', ax=ax);
```

Here the **colormap** parameter is used to color the scatter points by the value of column **c**. On the color scale, we can see that points with higher values are yellow. This lets a third variable be shown on a two-dimensional plot.

```
ax2 = df_scatter.plot.scatter(x='a',
y='b',
c='c',
colormap='viridis')
```

The familiar pie chart can also be built using pandas. So let’s look at the **pie chart** and learn about its details.

**Pandas Pie Chart: pie()**

A pie chart represents proportional information: each slice shows one category's share of the total.

### Syntax

**dataframe.plot.pie(y=None, \*\*kwargs)**

**y : int or label**(optional) – The label or position of the column to plot. For a dataframe, either **y** or **subplots=True** must be given; a series needs no **y**.

**kwargs** – Keyword arguments which can be passed to the function.

### Example 1: Simple pandas pie chart

In this example, a series is built using pandas and plotted as a pie chart of fruit consumption in India. The values are generated with a NumPy random function, so the chart is purely illustrative and will differ on every run.

```
series = pd.Series(3 * np.random.rand(4),index=['Apple', 'Banana', 'Coconut', 'Watermelon'], name='Fruits_Consumption_in_India')
```

```
series
```

```
series.plot.pie(figsize=(8, 8))
```

### Example 2: Values in pie chart

Here the **autopct** parameter is set to **'%.2f'**, a format string for a float with two decimal places, so each slice is labeled with its percentage of the total.

```
series.plot.pie(labels=['Apple', 'Banana', 'Coconut', 'Watermelon'], colors=['r', 'y', 'b', 'g'],autopct='%.2f', fontsize=15, figsize=(7, 7))
```
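The numbers that **autopct** prints are percentages of the total, not the raw values. The following sketch reproduces that arithmetic by hand on a made-up series with fixed values, so the result is easy to verify.

```python
import pandas as pd

# Hypothetical fixed values, used only for illustration
consumption = pd.Series([30, 10, 40, 20],
                        index=['Apple', 'Banana', 'Coconut', 'Watermelon'])

# Each slice's share of the total, in percent
percent = consumption / consumption.sum() * 100

# The same '%.2f' format that is passed to autopct
labels = ['%.2f' % p for p in percent]
print(labels)  # ['30.00', '10.00', '40.00', '20.00']
```

To show a percent sign on the chart as well, the format string can be written as `'%.2f%%'`.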

## Conclusion

In this tutorial, we learned how to build various kinds of plots, such as the **bar plot**, **histogram**, **scatter plot** and **pie chart**, using the built-in plotting functions of pandas. With the help of syntax and examples, we got a deeper understanding of these plots. Along the way, we looked at situations where each plot is useful for conveying information.

*Reference –* https://pandas.pydata.org/docs/
