Pandas Visualization Tutorial – Bar Plot, Histogram, Scatter Plot, Pie Chart


Introduction

During exploratory data analysis in a machine learning or data science project, it is always useful to understand the data with the help of visualizations. The Python pandas library offers basic support for several types of visualizations. In this article, we will explore the following pandas visualization functions: bar plot, histogram, scatter plot, and pie chart. We will learn the syntax of each visualization and look at several variations of it.

Importing Pandas Library

To begin, import the pandas library with the alias pd. We also import NumPy as np, which is used later to generate random data.

In [1]:
import pandas as pd
import numpy as np

We will start this tutorial by plotting the bar graph.

Pandas Bar Plot : bar()

A bar plot represents categorical data in the form of vertical or horizontal bars, where the lengths of the bars are proportional to the values they represent.

Syntax

dataframe.plot.bar(x=None, y=None, **kwargs)


x : label or position (optional) – The column used for the x-axis categories. If not specified, the index of the dataframe is used.

y : label or position (optional) – The column(s) whose values set the bar heights. If not specified, all numerical columns are plotted.

**kwargs – Additional keyword arguments, which are passed on to DataFrame.plot().

output – The function returns a matplotlib Axes object, or a numpy array of Axes objects (one per column) when subplots=True.
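The two possible return types can be checked with a short sketch. The dataframe df_demo and its column names are made up for this illustration, and the Agg backend is selected only on the assumption that the code may run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import numpy as np
import pandas as pd
from matplotlib.axes import Axes

# Hypothetical data, not part of the tutorial
df_demo = pd.DataFrame({'city': ['X', 'Y', 'Z'],
                        'sales': [10, 20, 15],
                        'returns': [1, 3, 2]})

# x picks the category column; omitting y plots every numeric column
single_ax = df_demo.plot.bar(x='city', rot=0)

# subplots=True draws one Axes per numeric column and returns them in an array
axes_array = df_demo.plot.bar(x='city', rot=0, subplots=True)

print(type(single_ax).__name__)
print(type(axes_array).__name__, len(axes_array))
```

Because the result is an ordinary matplotlib object, it can be customized further, for example with single_ax.set_ylabel('units').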

Example 1: Simple pandas bar plot

Now let’s look at some examples of bar plots.

Here a dataframe df is created with two columns, label and values, and is then visualized using the bar function.

In [2]:
df = pd.DataFrame({'label':['P', 'Q', 'R'], 'values':[70, 25, 97]})
df
Out[2]:
  label  values
0     P      70
1     Q      25
2     R      97

Here the x-axis is given the label column and the y-axis the values column. The rot (rotation) parameter sets the angle, in degrees, by which the x-axis tick labels are rotated; rot=0 keeps them horizontal.

In [3]:
ax = df.plot.bar(x='label', y='values', rot=0)
ax
Out[3]:
<matplotlib.axes._subplots.AxesSubplot at 0x21085a1fb38>
pandas bar plot

Example 2: Visualizing information with two different bar plots in one axes

In [4]:

height = [5, 17.5, 40, 48, 52, 69, 88]
years_to_fully_grow = [2, 8, 70, 1.5, 25, 12, 28]
index = ['lemon', 'cotton', 'neem',
         'hibiscus', 'peepal', 'banyan', 'coconut']

# One row per plant, one column per feature
df = pd.DataFrame({'height (in metres)': height,
                   'Years taken to grow': years_to_fully_grow}, index=index)
df
Out[4]:
          height (in metres)  Years taken to grow
lemon                    5.0                  2.0
cotton                  17.5                  8.0
neem                    40.0                 70.0
hibiscus                48.0                  1.5
peepal                  52.0                 25.0
banyan                  69.0                 12.0
coconut                 88.0                 28.0

In this plot, each index entry gets two bars, one per column, and the legend shows which color corresponds to which column.

In [5]:
ax = df.plot.bar(rot=0)
pandas bar plot

Example 3: Representing information with two different bar plots

The plot shown above can be split into two separate bar plots that convey the same information. This is achieved by setting the subplots parameter to True, which draws each column on its own axes.

So whenever we want to compare two or more features across the same categories, we can use the pandas bar plot.

In [6]:
axes = df.plot.bar(rot=0, subplots=True)
pandas bar plot

Example 4: Multiple Bar plot

Here multiple bars are plotted for each row, one per column. They can also be stacked on top of one another by using the stacked parameter.

In [7]:
df_bar = pd.DataFrame(np.random.rand(20, 4), columns=['A', 'B', 'C', 'D'])
In [8]:
df_bar.head()
Out[8]:
          A         B         C         D
0  0.669276  0.338299  0.962047  0.653750
1  0.066810  0.931032  0.898166  0.106958
2  0.173606  0.371832  0.477262  0.633449
3  0.137026  0.693457  0.374763  0.810055
4  0.644451  0.101267  0.733968  0.092187
In [9]:
df_bar.plot.bar()
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x21086f98e10>
pandas bar plot

The same information can be conveyed with stacked bars: with stacked=True, the bars of each row are placed on top of one another instead of side by side.

In [10]:
df_bar.plot.bar(stacked=True)
Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x210870f6080>
pandas bar plot
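To make the effect of stacked=True concrete, the sketch below checks that the top of each stack equals the sum of that row. The small dataframe df_stack is hypothetical, and the code relies on the assumption that pandas draws the bars column by column, so that the last patches on the axes belong to the last column.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import numpy as np
import pandas as pd

# Hypothetical two-row, two-column data
df_stack = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0]})
ax = df_stack.plot.bar(stacked=True)

n_rows = len(df_stack)
# Assumed drawing order: all bars of A, then all bars of B
last_segments = ax.patches[-n_rows:]
# The top of a stack is the bottom of its last segment plus that segment's height
stack_tops = [p.get_y() + p.get_height() for p in last_segments]

print(stack_tops)
print(df_stack.sum(axis=1).tolist())
```

Both printed lists should agree, which shows that a stacked bar encodes the row total as well as the individual contributions.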

To draw these bars horizontally, only a small change is needed: bar() is replaced with barh().

In [11]:
df_bar.plot.barh(stacked=True)
Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x21087225048>

Moving on to the next plot type, let’s plot a histogram.

Pandas Histogram : hist()

A histogram is useful for gaining insight into the distribution of the data. Below we will look at the syntax of the histogram.

Syntax

dataframe.hist(column=None, bins=10, **kwargs)

Note that dataframe.hist() draws one subplot per column, whereas the examples below use dataframe.plot.hist(), which accepts the same bins parameter and overlays all columns on a single axes.

data : Dataframe – The dataframe which holds the data. When hist() is called as a method, this is the dataframe itself, so it is not passed explicitly.

column : str or sequence – Limits the data to a subset of columns.

bins : int or sequence, default is 10 – The number of histogram bins to be used. If an integer is given, bins + 1 bin edges are calculated. If a sequence is given, it specifies the bin edges, including the left edge of the first bin and the right edge of the last bin. The choice of bins controls how finely the distribution is resolved.

**kwargs – Additional keyword arguments, which are passed on to the underlying matplotlib histogram function. The remaining parameters are described in the pandas documentation.
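The relationship between bins and bin edges described above can be seen directly with numpy.histogram, used here as a stand-in for the binning that happens inside the plotting call. The variable names and the random sample are assumptions made for this sketch.

```python
import numpy as np

# Hypothetical sample: 1000 draws from a standard normal distribution
rng = np.random.default_rng(0)
values = rng.normal(size=1000)

# An integer asks for that many equal-width bins, giving bins + 1 edges
counts, edges = np.histogram(values, bins=10)
print(len(counts), len(edges))

# A sequence is taken as the edges themselves: 5 edges define 4 bins
custom_counts, custom_edges = np.histogram(values, bins=[-3, -1, 0, 1, 3])
print(len(custom_counts), len(custom_edges))
```

With explicit edges, values that fall outside the first and last edge are not counted, so custom_counts may sum to less than the sample size.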

Example 1: Simple pandas histogram plot

Here a dataframe is created by generating random values with the help of numpy.

In [12]:
df_hist = pd.DataFrame({'A': np.random.randn(3000) + 1,
                        'B': np.random.randn(3000),
                        'C': np.random.randn(3000) - 1},
                       columns=['A', 'B', 'C'])
In [13]:
df_hist.head()
Out[13]:
          A         B         C
0  0.295822 -0.331693 -2.704511
1  1.676880 -2.079090 -1.421252
2  1.080180 -1.345613 -1.374851
3  0.353748 -2.809398 -1.817229
4  1.406088 -1.687410 -2.583514

The histogram is built using the plot.hist() function. The alpha parameter controls the transparency of the bars: as alpha increases towards 1, the bars become more opaque, and as it decreases towards 0, they become more transparent.

In [14]:
df_hist.plot.hist(alpha=0.6)
Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x21089b67278>
pandas histogram

Example 2: Stacked Histogram with bins parameter

In [15]:
df_hist.plot.hist(stacked=True, bins=30,alpha=0.5)
Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x21089ae0da0>
pandas histogram

Example 3: Horizontal Cumulative Histogram

Now we will learn how to make a horizontal histogram with the cumulative parameter set to True. With cumulative=True, each bin shows the count of its own values plus the counts of all lower bins, so the last bin equals the total number of values. The orientation parameter is used to draw the histogram horizontally.

In [16]:
df_hist['B'].plot.hist(orientation='horizontal', cumulative=True)
Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x21089d802e8>
pandas histogram
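As a rough check of what cumulative=True does, the sketch below compares the bar heights of a cumulative histogram with a running sum of ordinary bin counts. The series s is hypothetical, and numpy.histogram is used on the assumption that it bins the data the same way the plot does.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import numpy as np
import pandas as pd

# Hypothetical data: the value k appears k times
s = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

# Ordinary per-bin counts and their running total
plain_counts, _ = np.histogram(s, bins=4)
expected = np.cumsum(plain_counts)

# Cumulative histogram drawn by pandas; each patch is one bar
ax = s.plot.hist(bins=4, cumulative=True)
bar_heights = [p.get_height() for p in ax.patches]

print(expected.tolist())
print(bar_heights)
```

The final bar reaches the length of the series, which is a quick way to confirm that a plot is cumulative.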

After looking at bars, we will explore a different type of plot, the scatter plot.

Pandas Scatter Plot : scatter()

A scatter plot is used to show the relationship between two variables by plotting one against the other as points.

Syntax

dataframe.plot.scatter(x, y, s=None, c=None, **kwargs)

x : int or str – The column used for horizontal coordinates.

y : int or str – The column used for vertical coordinates.

s : scalar or array_like(optional) – The size of each point.

c : str, int or array_like(optional) – The color of each point.

**kwargs – Additional keyword arguments, which are passed on to DataFrame.plot().
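The s and c parameters listed above can be combined to show up to four columns in one plot. The following is a minimal sketch with a made-up dataframe df_pts; the scaling factor of 200 for the point sizes is an arbitrary choice for visibility.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import numpy as np
import pandas as pd

# Hypothetical data: 50 rows of uniform random values
rng = np.random.default_rng(1)
df_pts = pd.DataFrame(rng.random((50, 4)), columns=['a', 'b', 'c', 'd'])

# s takes one size per point; c takes a column name whose values set the color
sizes = (df_pts['c'] * 200).to_numpy()
ax = df_pts.plot.scatter(x='a', y='b', s=sizes, c='d', colormap='viridis')

# The drawn points are stored as a single matplotlib collection
points = ax.collections[0]
print(points.get_offsets().shape)
print(points.get_sizes()[:3])
```

Position encodes columns a and b, point size encodes c, and color encodes d.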

Example 1: Simple Pandas Scatter plot

Here data and column information is provided to scatter function.

In [17]:
df_scatter = pd.DataFrame(np.random.rand(100, 5), columns=['a', 'b', 'c', 'd','e'])
df_scatter.head()
Out[17]:
          a         b         c         d         e
0  0.232764  0.297948  0.203266  0.647538  0.429361
1  0.915141  0.940809  0.710723  0.871370  0.191295
2  0.195358  0.506211  0.812928  0.895210  0.811241
3  0.836417  0.198523  0.508496  0.137529  0.567117
4  0.630026  0.080692  0.884420  0.299120  0.585431

Here the columns a and b are assigned to the x-axis and y-axis respectively to draw the scatter plot.

In [18]:
df_scatter.plot.scatter(x='a', y='b');
pandas scatter plot

Example 2: Using color and label parameters

In [19]:
ax = df_scatter.plot.scatter(x='a', y='b', color='DarkRed', label='Length');

df_scatter.plot.scatter(x='c', y='d', color='DarkOrange', label='Width', ax=ax);
pandas scatter plot

Example 3: Using colormap parameter to differentiate scatter points

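Here the colormap parameter is used to differentiate the scatter points. Each point is colored according to its value in column c, and the color scale beside the plot shows that, with the viridis colormap, points with higher values are yellow. This makes it possible to show a third variable in a two-dimensional plot.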

In [20]:
ax2 = df_scatter.plot.scatter(x='a',
                              y='b',
                              c='c',
                              colormap='viridis')
pandas scatter plot

The well-known pie chart can also be built using pandas. Let’s look at the pie chart and its details.

Pandas Pie Chart: pie()

Pie chart is a very useful graph which can be used to represent proportional information.

Syntax

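dataframe.plot.pie(y=None, **kwargs)

y : int or label (optional) – The label or position of the column to plot. When plotting a dataframe, either y or subplots=True must be given; for a series, y is not needed.

**kwargs – Additional keyword arguments, which are passed on to DataFrame.plot().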
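The role of y is easiest to see on a dataframe with more than one column. The sketch below uses a hypothetical dataframe df_pie; it plots one column with y, plots every column with subplots=True, and shows that giving neither is rejected.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import numpy as np
import pandas as pd
from matplotlib.axes import Axes

# Hypothetical data: units sold per quarter
df_pie = pd.DataFrame({'apples': [3, 2, 5], 'pears': [1, 4, 5]},
                      index=['Q1', 'Q2', 'Q3'])

# y selects the single column to draw; the index supplies the wedge labels
one_pie = df_pie.plot.pie(y='apples')

# subplots=True draws one pie per column and returns an array of Axes
all_pies = df_pie.plot.pie(subplots=True)

# Without y or subplots=True, pandas cannot tell which column to draw
try:
    df_pie.plot.pie()
    raised = False
except ValueError:
    raised = True

print(type(one_pie).__name__, len(all_pies), raised)
```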

Example 1: Simple pandas pie chart

In this example, a series is built using pandas and plotted as a pie chart of fruit consumption in India. The values are generated with the numpy random function, so they are purely illustrative and will differ on every run.

In [21]:
series = pd.Series(3 * np.random.rand(4),
                   index=['Apple', 'Banana', 'Coconut', 'Watermelon'],
                   name='Fruits_Consumption_in_India')
In [22]:
series
Out[22]:
Apple         2.166653
Banana        1.605650
Coconut       2.865410
Watermelon    1.809250
Name: Fruits_Consumption_in_India, dtype: float64
In [23]:
series.plot.pie(figsize=(8, 8))
Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x21085866828>
pandas pie chart

Example 2: Values in pie chart

Here, with the autopct parameter set to '%.2f', each wedge is annotated with its percentage share of the total, formatted as a float with two decimal places.

In [24]:
series.plot.pie(labels=['Apple', 'Banana', 'Coconut', 'Watermelon'],
                colors=['r', 'y', 'b', 'g'],
                autopct='%.2f', fontsize=15, figsize=(7, 7))
Out[24]:
<matplotlib.axes._subplots.AxesSubplot at 0x2108b36e4e0>
pandas pie chart
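The numbers printed by autopct are percentages of the total rather than the raw values. This can be checked with a small sketch; the series fruit is hypothetical and uses round numbers so that the shares are easy to verify by hand.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import pandas as pd

# Hypothetical data: Apple is half of the total, the others a quarter each
fruit = pd.Series([2.0, 1.0, 1.0],
                  index=['Apple', 'Banana', 'Coconut'], name='share')

# Percentage share of each value, formatted the same way as autopct='%.2f'
expected = (fruit / fruit.sum() * 100).map('{:.2f}'.format).tolist()

ax = fruit.plot.pie(autopct='%.2f')
# The axes holds both the wedge labels and the percentage annotations as text
drawn = [t.get_text() for t in ax.texts]

print(expected)
print(drawn)
```

A percent sign can be added to the annotations by using autopct='%.2f%%'.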

Conclusion

Reaching the end of this tutorial, we learned how to build various kinds of plots, such as the bar plot, histogram, scatter plot, and pie chart, using the built-in plotting functions of pandas, which are based on matplotlib. With the help of syntax and examples, we gained a deeper understanding of these plots. Along with this, we looked at situations where these plots are useful for conveying information.

Reference – https://pandas.pydata.org/docs/
