import pandas as pd
import numpy as np

We will start this tutorial by plotting a bar graph.

Pandas Bar Plot : bar()

A bar plot represents categorical data as vertical or horizontal bars whose lengths are proportional to the values they represent.

Syntax

dataframe.plot.bar(x=None, y=None, **kwargs)

x : label or position (optional) – The column to use for the x-axis categories. If not specified, the index of the dataframe is used.

y : label or position (optional) – The column or columns to plot as bar heights. If not specified, all numerical columns are used.

**kwargs – Additional keyword arguments, which are passed on to DataFrame.plot().

output – The function returns a matplotlib Axes object, or a numpy array of Axes objects (one per column) when subplots=True.
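As a minimal sketch of the two return types described above, the snippet below calls bar() with and without subplots=True. The dataframe demo and its column names are made up for illustration, and the Agg backend is selected only so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless
import numpy as np
import pandas as pd
from matplotlib.axes import Axes

# Hypothetical data: one label column and two numeric columns.
demo = pd.DataFrame({"label": ["P", "Q", "R"],
                     "first": [70, 25, 97],
                     "second": [10, 20, 30]})

# A single Axes is returned when everything is drawn in one plot.
single = demo.plot.bar(x="label", y="first")

# With subplots=True each numeric column gets its own Axes,
# and they come back together in a numpy array.
multiple = demo.plot.bar(x="label", subplots=True)

print(type(single).__name__)
print(type(multiple).__name__, multiple.shape)
```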

Example 1: Simple pandas bar plot

Now let’s look at some examples of bar plots.

Here a dataframe df is created with two columns, one holding labels and one holding values. It is then visualized using the bar() function.

df = pd.DataFrame({'label':['P', 'Q', 'R'], 'values':[70, 25, 97]})
df

Here the x-axis takes the label column and the y-axis takes the values column. The rot parameter sets the rotation of the x-axis tick labels in degrees; rot=0 keeps them horizontal.

ax = df.plot.bar(x='label', y='values', rot=0)
ax

<matplotlib.axes._subplots.AxesSubplot at 0x21085a1fb38>
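The object printed above is the matplotlib Axes that holds the plot, and it can be inspected or customized further. The following sketch rebuilds the same dataframe, uses rot=45 instead of rot=0 to make the effect visible, and reads the bar heights and tick label rotation back from the Axes. The Agg backend is an assumption made so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import pandas as pd

df = pd.DataFrame({'label': ['P', 'Q', 'R'], 'values': [70, 25, 97]})

ax = df.plot.bar(x='label', y='values', rot=45)

# Each bar is a rectangle patch whose height is the plotted value.
bar_heights = [float(patch.get_height()) for patch in ax.patches]

# The tick labels carry the text of the label column and the rotation set by rot.
tick_text = [tick.get_text() for tick in ax.get_xticklabels()]
rotations = [tick.get_rotation() for tick in ax.get_xticklabels()]

# The returned Axes accepts the usual matplotlib customizations.
ax.set_ylabel('values')

print(bar_heights, tick_text, rotations)
```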

Example 2: Visualizing information with two different bar plots in one axes

height = [5, 17.5, 40, 48, 52, 69, 88]
years_to_fully_grow = [2, 8, 70, 1.5, 25, 12, 28]
index = ['lemon', 'cotton', 'neem',
         'hibscus', 'peepal', 'banyan', 'coconut']

df = pd.DataFrame({'height (in metres)': height,
                   'Years taken to grow': years_to_fully_grow}, index=index)

df

In this plot, each tree has two bars, one for each column, and the legend in the top left corner shows which color corresponds to which column.

ax = df.plot.bar(rot=0)

The plot shown above can be split into two separate bar plots, one per column, that convey the same information. Setting the subplots parameter to True achieves this.

So whenever we want to compare two different features across the same categories, we can use the pandas bar plot.

axes = df.plot.bar(rot=0, subplots=True)
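With subplots=True the call returns an array holding one Axes per column, so each subplot can be adjusted on its own. A small sketch, using a shortened version of the dataframe above; the y-axis labels are illustrative additions, and the Agg backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import pandas as pd

df = pd.DataFrame({'height (in metres)': [5, 17.5, 40],
                   'Years taken to grow': [2, 8, 70]},
                  index=['lemon', 'cotton', 'neem'])

axes = df.plot.bar(rot=0, subplots=True)

# One Axes per column: style each subplot separately.
axes[0].set_ylabel('metres')
axes[1].set_ylabel('years')

ylabels = [ax.get_ylabel() for ax in axes]
bars_per_subplot = [len(ax.patches) for ax in axes]
print(ylabels, bars_per_subplot)
```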

Example 4: Multiple Bar plot

Here several columns are plotted as grouped bars. The bars can also be stacked; for that we will use the stacked parameter.

df_bar = pd.DataFrame(np.random.rand(20, 4), columns=['A', 'B', 'C', 'D'])

df_bar.head()

df_bar.plot.bar()

<matplotlib.axes._subplots.AxesSubplot at 0x21086f98e10>

The same information is conveyed by both plots; the only difference is that the bars for each row are now stacked on top of one another.

df_bar.plot.bar(stacked=True)

<matplotlib.axes._subplots.AxesSubplot at 0x210870f6080>
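To see what stacking does to the geometry, the sketch below uses a tiny dataframe with fixed values (an illustrative stand-in for the random df_bar) and checks that the top of each stack equals the sum of that row. The Agg backend is an assumption for headless execution.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import numpy as np
import pandas as pd

# Hypothetical fixed values so the result is easy to follow.
small = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0]})
ax = small.plot.bar(stacked=True)

n_rows = len(small)
patches = list(ax.patches)

# Patches are ordered column by column: all 'A' bars first, then all 'B' bars.
top_segments = patches[-n_rows:]

# Each 'B' segment starts where the 'A' segment ends.
segment_bottoms = [p.get_y() for p in top_segments]
stack_tops = [p.get_y() + p.get_height() for p in top_segments]
row_sums = small.sum(axis=1).tolist()

print(segment_bottoms, stack_tops, row_sums)
```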

To draw these bar plots horizontally, use barh() instead of bar().

df_bar.plot.barh(stacked=True)

<matplotlib.axes._subplots.AxesSubplot at 0x21087225048>

Moving on to the next plot type, let’s plot a histogram.

Pandas Histogram : hist()

A histogram is useful for gaining insight into the distribution of the data. Below we will go through the syntax of the histogram.

Syntax

dataframe.hist(column=None, bins=10, **kwargs)

data : Dataframe – This is the dataframe which holds the data. When hist() is called as a method, this is the dataframe itself and is not passed explicitly.

column : str or sequence – Limits the data to a subset of columns.

bins : int or sequence, default is 10 – The number of histogram bins to be used. If an integer is given, bins + 1 bin edges are calculated. If bins is a sequence, it gives the bin edges, including the left edge of the first bin and the right edge of the last bin.

**kwargs – Additional keyword arguments, which are passed on to the underlying matplotlib hist() call. The remaining parameters are listed in the pandas documentation for DataFrame.hist.
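The relationship between bins and bin edges, and the difference between DataFrame.hist and DataFrame.plot.hist (which the examples in this section use), can be sketched as follows. The data, the seed, and the variable names are assumptions chosen for illustration, and numpy's histogram function is used only to make the edges visible.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import numpy as np
import pandas as pd
from matplotlib.axes import Axes

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
demo = pd.DataFrame({'A': rng.normal(1, 1, 500),
                     'B': rng.normal(-1, 1, 500)})

# An integer bins value yields bins + 1 edges.
counts, edges = np.histogram(demo['A'], bins=10)

# DataFrame.hist draws one subplot per column and returns an array of Axes.
grid = demo.hist(column=['A', 'B'], bins=10)

# DataFrame.plot.hist overlays the columns on a single Axes.
overlay = demo.plot.hist(bins=10, alpha=0.6)

print(len(counts), len(edges))
```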

Example 1: Simple pandas histogram plot

Here a dataframe is created by generating random values with the help of numpy.

df_hist = pd.DataFrame({'A': np.random.randn(3000) + 1, 'B': np.random.randn(3000),'C': np.random.randn(3000) - 1}, columns=['A', 'B', 'C'])

df_hist.head()

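Using the plot.hist() function, a histogram of all three columns is drawn on a single set of axes. Here a parameter called ‘alpha’ is used to make the bars partly transparent so that overlapping distributions stay visible. Alpha ranges from 0 (fully transparent) to 1 (fully opaque), so increasing alpha decreases transparency and vice versa.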

df_hist.plot.hist(alpha=0.6)

<matplotlib.axes._subplots.AxesSubplot at 0x21089b67278>

Example 2: Stacked Histogram with bins parameter

df_hist.plot.hist(stacked=True, bins=30,alpha=0.5)

<matplotlib.axes._subplots.AxesSubplot at 0x21089ae0da0>

Now we will learn how to make a horizontal histogram with the cumulative parameter set to True. With cumulative=True, each bin shows the count of all values up to and including that bin, so the bars never decrease. For a horizontal histogram, the orientation parameter is used.

df_hist['B'].plot.hist(orientation='horizontal', cumulative=True)

<matplotlib.axes._subplots.AxesSubplot at 0x21089d802e8>
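The effect of cumulative=True can be sketched with a running total of the bin counts. The series below is a seeded stand-in for df_hist['B'], and numpy's histogram and cumsum functions are used here only to show the arithmetic; the Agg backend is assumed for headless execution.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # seeded so the sketch is reproducible
values = pd.Series(rng.normal(size=200), name='B')

# Plain counts per bin, then the running total across bins.
counts, edges = np.histogram(values, bins=10)
running_total = np.cumsum(counts)

ax = values.plot.hist(orientation='horizontal', cumulative=True, bins=10)

# In a horizontal histogram the count is drawn as the width of each bar.
bar_lengths = [patch.get_width() for patch in ax.patches]

print(running_total)
```

The last bar reaches the total number of observations, since every value falls in that bin or an earlier one.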

After looking at bars, we will explore a different type of plot, i.e. the scatter plot.

Pandas Scatter Plot : scatter()

A scatter plot depicts the relationship between two variables by plotting one against the other as points on two axes.

Syntax

dataframe.plot.scatter(x, y, s=None, c=None, **kwargs)

x : int or str – The column used for horizontal coordinates.

y : int or str – The column used for vertical coordinates.

s : scalar or array_like(optional) – The size of each point.

c : str, int or array_like(optional) – The color of each point.

**kwargs : Additional keyword arguments, which are passed on to DataFrame.plot().
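A minimal sketch of the s and c parameters, assuming a small dataframe of random values: the point size is driven by a third column and a single named color is used for all points. The dataframe pts, the seed, and the scaling factor of 200 are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)  # seeded so the sketch is reproducible
pts = pd.DataFrame(rng.random((50, 3)), columns=['a', 'b', 'c'])

# s takes one size per point (here scaled from column 'c'); c takes a color.
ax = pts.plot.scatter(x='a', y='b', s=pts['c'] * 200, c='DarkBlue')

# The drawn points live in a single collection on the Axes.
points = ax.collections[0]
offsets = np.asarray(points.get_offsets())
sizes = np.asarray(points.get_sizes())

print(offsets.shape, sizes.shape)
```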

Example 1: Simple Pandas Scatter plot

Here a dataframe of random values is created; its column names are then passed to the scatter function.

df_scatter = pd.DataFrame(np.random.rand(100, 5), columns=['a', 'b', 'c', 'd','e'])
df_scatter.head()

Columns a and b are assigned to the x-axis and y-axis respectively to draw the scatter plot.

df_scatter.plot.scatter(x='a', y='b');

Example 2: Using color and label parameters

ax = df_scatter.plot.scatter(x='a', y='b', color='DarkRed', label='Length');

df_scatter.plot.scatter(x='c', y='d', color='DarkOrange', label='Width', ax=ax);
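Passing ax=ax in the second call draws the second group onto the Axes returned by the first call, which is what puts both groups in one plot. The sketch below checks this on a small seeded dataframe; the explicit axis labels are an illustrative addition, since a shared plot no longer corresponds to a single pair of columns.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)  # seeded so the sketch is reproducible
pts = pd.DataFrame(rng.random((30, 4)), columns=['a', 'b', 'c', 'd'])

first = pts.plot.scatter(x='a', y='b', color='DarkRed', label='Length')
second = pts.plot.scatter(x='c', y='d', color='DarkOrange', label='Width', ax=first)

# Both calls return the same Axes, which now holds two groups of points.
same_axes = second is first
n_groups = len(first.collections)

# Set neutral axis labels explicitly for the combined plot.
first.set_xlabel('x value')
first.set_ylabel('y value')

print(same_axes, n_groups)
```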

Here the c parameter colors each point by its value in column c, and the colormap parameter chooses the color scale. With viridis, the color bar shows that low values are dark purple and high values are yellow. This adds a third variable to a two-dimensional scatter plot.

ax2 = df_scatter.plot.scatter(x='a',
                              y='b',
                              c='c',
                              colormap='viridis')
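The mapping from values to colors can be checked directly. This sketch, on a seeded stand-in dataframe, confirms that the values behind the colors are those of column c and looks up the two ends of the viridis scale with matplotlib's get_cmap function.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)  # seeded so the sketch is reproducible
pts = pd.DataFrame(rng.random((40, 3)), columns=['a', 'b', 'c'])

ax = pts.plot.scatter(x='a', y='b', c='c', colormap='viridis')

# The values that drive the colors are those of column 'c'.
color_values = np.asarray(ax.collections[0].get_array())

# Ends of the viridis scale as RGBA tuples: dark purple (low) and yellow (high).
viridis = plt.get_cmap('viridis')
low_rgba = viridis(0.0)
high_rgba = viridis(1.0)

print(low_rgba, high_rgba)
```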

The very famous pie chart can be built using pandas. So let’s look at pie chart and learn about its details.

Pandas Pie Chart: pie()

A pie chart is a very useful graph for showing what proportion of a whole each category contributes.

Syntax

dataframe.plot.pie(y, **kwargs)

y : int or label (optional) – The label or position of the column to plot. For a dataframe, either y must be given or subplots=True must be set.

**kwargs – Keyword arguments which are passed on to DataFrame.plot().
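When pie() is called on a dataframe rather than a series, the column has to be chosen with y, or subplots=True has to be set to draw one pie per column. A small sketch with made-up data (the dataframe demo and its column names are hypothetical):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import pandas as pd
from matplotlib.axes import Axes

# Hypothetical data: two numeric columns sharing the same categories.
demo = pd.DataFrame({'sold': [30, 50, 20],
                     'returned': [3, 2, 5]},
                    index=['Apple', 'Banana', 'Coconut'])

# One pie for a single column, selected with y.
one_pie = demo.plot.pie(y='sold', figsize=(5, 5))

# One pie per column, returned as an array of Axes.
all_pies = demo.plot.pie(subplots=True, figsize=(10, 5))

print(type(one_pie).__name__, len(all_pies))
```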

Example 1: Simple pandas pie chart

In this example, a series is built using pandas and plotted as a pie chart of fruit consumption in India. The values are generated with the numpy random function, so they are purely illustrative and change on every run.

series = pd.Series(3 * np.random.rand(4),index=['Apple', 'Banana', 'Coconut', 'Watermelon'], name='Fruits_Consumption_in_India')

series

Apple         2.166653
Banana        1.605650
Coconut       2.865410
Watermelon    1.809250
Name: Fruits_Consumption_in_India, dtype: float64

series.plot.pie(figsize=(8, 8))

<matplotlib.axes._subplots.AxesSubplot at 0x21085866828>

Example 2: Values in pie chart

Here the autopct parameter is set to the format string %.2f, which prints each item’s percentage share of the total on its wedge, rounded to two decimal places.

series.plot.pie(labels=['Apple', 'Banana', 'Coconut', 'Watermelon'], colors=['r', 'y', 'b', 'g'],autopct='%.2f', fontsize=15, figsize=(7, 7))

<matplotlib.axes._subplots.AxesSubplot at 0x2108b36e4e0>
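The numbers that autopct prints are percentages of the total, not the raw values. The sketch below uses a small series with fixed, made-up values so the arithmetic is easy to follow, and reads the printed text back from the Axes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import pandas as pd

# Hypothetical fixed values: the total is 4, so the shares are 25%, 25% and 50%.
shares = pd.Series([1.0, 1.0, 2.0], index=['x', 'y', 'z'], name='share')

ax = shares.plot.pie(autopct='%.2f')

# The same percentages computed by hand.
percentages = (shares / shares.sum() * 100).round(2).tolist()

# Text drawn on the chart: the wedge labels plus the formatted percentages.
shown = [text.get_text() for text in ax.texts]

print(percentages)
print(shown)
```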

Output of df (bar plot Example 2):

	height (in metres)	Years taken to grow
lemon	5.0	2.0
cotton	17.5	8.0
neem	40.0	70.0
hibscus	48.0	1.5
peepal	52.0	25.0
banyan	69.0	12.0
coconut	88.0	28.0

Output of df_bar.head():

	A	B	C	D
0	0.669276	0.338299	0.962047	0.653750
1	0.066810	0.931032	0.898166	0.106958
2	0.173606	0.371832	0.477262	0.633449
3	0.137026	0.693457	0.374763	0.810055
4	0.644451	0.101267	0.733968	0.092187

Output of df_hist.head():

	A	B	C
0	0.295822	-0.331693	-2.704511
1	1.676880	-2.079090	-1.421252
2	1.080180	-1.345613	-1.374851
3	0.353748	-2.809398	-1.817229
4	1.406088	-1.687410	-2.583514

Output of df_scatter.head():

	a	b	c	d	e
0	0.232764	0.297948	0.203266	0.647538	0.429361
1	0.915141	0.940809	0.710723	0.871370	0.191295
2	0.195358	0.506211	0.812928	0.895210	0.811241
3	0.836417	0.198523	0.508496	0.137529	0.567117
4	0.630026	0.080692	0.884420	0.299120	0.585431

Pandas Visualization Tutorial – Bar Plot, Histogram, Scatter Plot, Pie Chart

