Introduction
During exploratory data analysis in a machine learning or data science project, visualizations make the data much easier to understand. The Python pandas library offers basic built-in support for several types of plots. In this article, we will explore the following pandas visualization functions: bar plot, histogram, scatter plot, and pie chart. We will learn the syntax of each visualization and look at several variations.
Importing Pandas Library
To begin, import the pandas library with the alias pd and NumPy with the alias np. NumPy is used in the examples to generate random sample data.
import pandas as pd
import numpy as np
We will start this tutorial by plotting the bar graph.
Pandas Bar Plot : bar()
A bar plot represents categorical data as vertical or horizontal bars whose lengths are proportional to the values they represent.
Syntax
dataframe.plot.bar(x=None, y=None, **kwargs)
x : label or position (optional) – The column used for the x-axis categories. If not specified, the index of the dataframe is used.
y : label or position (optional) – The column whose values set the bar heights. If not specified, all numerical columns are plotted.
**kwargs – Additional keyword arguments, which are passed on to DataFrame.plot().
output – The function returns a matplotlib Axes object, or a numpy array of Axes objects (one per column) when subplots=True.
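The two possible return types can be checked directly. The following is a minimal sketch, assuming a small hypothetical dataframe named sales (it is not part of this article's data) and a non-interactive matplotlib backend so that it also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import numpy as np
import pandas as pd
from matplotlib.axes import Axes

# Hypothetical sample data, used only for illustration
sales = pd.DataFrame({"city": ["X", "Y", "Z"], "q1": [3, 5, 2], "q2": [4, 1, 6]})

# x names the category column; y is omitted, so every numeric column is plotted
single = sales.plot.bar(x="city", rot=0)
print(isinstance(single, Axes))

# With subplots=True each column gets its own Axes, returned in a numpy array
multiple = sales.plot.bar(x="city", subplots=True, rot=0)
print(isinstance(multiple, np.ndarray), len(multiple))
```

Knowing which type comes back matters when the plot is customized afterwards: a single Axes can be modified directly, while the array has to be indexed or looped over.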
Example 1: Simple pandas bar plot
Now let's look at some examples of bar plots.
Here a dataframe df is created with two columns, label and values, and then visualized using the bar function.
df = pd.DataFrame({'label':['P', 'Q', 'R'], 'values':[70, 25, 97]})
df
| | label | values |
|---|---|---|
| 0 | P | 70 |
| 1 | Q | 25 |
| 2 | R | 97 |
Here the x-axis takes the label column and the y-axis takes the values column. The rot (rotation) parameter sets the angle, in degrees, by which the x-axis tick labels are rotated.
ax = df.plot.bar(x='label', y='values', rot=0)
ax
<matplotlib.axes._subplots.AxesSubplot at 0x21085a1fb38>
Example 2: Visualizing information with two different bar plots in one axes
height = [5, 17.5, 40, 48, 52, 69, 88]
years_to_fully_grow = [2, 8, 70, 1.5, 25, 12, 28]
index = ['lemon', 'cotton', 'neem',
         'hibiscus', 'peepal', 'banyan', 'coconut']
df = pd.DataFrame({'height (in metres)': height,
                   'Years taken to grow': years_to_fully_grow}, index=index)
df
| | height (in metres) | Years taken to grow |
|---|---|---|
| lemon | 5.0 | 2.0 |
| cotton | 17.5 | 8.0 |
| neem | 40.0 | 70.0 |
| hibiscus | 48.0 | 1.5 |
| peepal | 52.0 | 25.0 |
| banyan | 69.0 | 12.0 |
| coconut | 88.0 | 28.0 |
In this plot, each plant gets two bars, one per column, drawn side by side on the same axes. The legend at the top left corner of the plot shows which color corresponds to which column.
ax = df.plot.bar(rot=0)
The plot above can also be split into two separate bar plots, one per column, that convey the same information. Setting the subplots parameter to True, as in the call that follows, produces this result.
So whenever we want to compare two different features of the same items, we can use the pandas bar plot.
axes = df.plot.bar(rot=0, subplots=True)
Example 3: Multiple Bar plot
Here multiple bars are plotted for each row, one per column. They can also be stacked by using the stacked parameter.
df_bar = pd.DataFrame(np.random.rand(20, 4), columns=['A', 'B', 'C', 'D'])
df_bar.head()
| | A | B | C | D |
|---|---|---|---|---|
| 0 | 0.669276 | 0.338299 | 0.962047 | 0.653750 |
| 1 | 0.066810 | 0.931032 | 0.898166 | 0.106958 |
| 2 | 0.173606 | 0.371832 | 0.477262 | 0.633449 |
| 3 | 0.137026 | 0.693457 | 0.374763 | 0.810055 |
| 4 | 0.644451 | 0.101267 | 0.733968 | 0.092187 |
df_bar.plot.bar()
<matplotlib.axes._subplots.AxesSubplot at 0x21086f98e10>
The same information can also be conveyed with the bars of each row stacked on top of one another instead of placed side by side.
df_bar.plot.bar(stacked=True)
<matplotlib.axes._subplots.AxesSubplot at 0x210870f6080>
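To make concrete what stacking does, the sketch below inspects the rectangles that a stacked bar plot draws. It assumes a small hypothetical dataframe df_small with positive values, and it relies on matplotlib keeping one bar container per plotted column. Each segment should start where the previous columns end, so the top of the last segment equals the row sum.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import numpy as np
import pandas as pd

# Hypothetical data with positive values only
df_small = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 1.0], "C": [2.0, 2.0]})
ax = df_small.plot.bar(stacked=True)

# ax.containers holds one group of rectangles per column (A, B, C)
bottoms = np.array([[rect.get_y() for rect in container]
                    for container in ax.containers])

# Expected start of each segment: the sum of the columns to its left
expected_bottoms = (df_small.cumsum(axis=1) - df_small).to_numpy().T

# Top edge of the last segment in every bar
tops = np.array([rect.get_y() + rect.get_height() for rect in ax.containers[-1]])

print(bottoms)
print(tops)
```

Because the total bar height is the row sum, a stacked plot is most readable when the columns are parts of a meaningful whole.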
To draw these bar plots horizontally, only a slight change is needed: use barh() instead of bar().
df_bar.plot.barh(stacked=True)
<matplotlib.axes._subplots.AxesSubplot at 0x21087225048>
Moving on to the next plot type, let's plot a histogram.
Pandas Histogram : hist()
A histogram provides insight into the distribution of the data. Below we will look at the syntax of the histogram function.
Syntax
dataframe.hist(column=None, bins=10, **kwargs)
data : DataFrame – The dataframe which holds the data. In the method form shown here, this is the dataframe that hist() is called on, so it is not passed as an argument.
column : str or sequence – Limits the data to a subset of columns.
bins : int or sequence, default is 10 – The number of histogram bins to be used. If an integer is given, bins + 1 bin edges are calculated. If a sequence is given, it specifies the bin edges, including the left edge of the first bin and the right edge of the last bin.
**kwargs – Additional keyword arguments, which are passed on to the underlying matplotlib histogram function. The full parameter list is in the pandas documentation for DataFrame.hist.
The examples in this section use the closely related dataframe.plot.hist(bins=10, **kwargs), which draws all columns on a single axes.
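The two forms of the bins parameter can be illustrated with numpy.histogram, which follows the same binning rule. This is a sketch under stated assumptions: the random values and the custom_edges list are hypothetical, and the plotting call at the end only confirms that pandas accepts a sequence of edges.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
values = rng.normal(size=1000)  # hypothetical sample data

# Integer bins: 10 equal-width bins, described by 11 edges
counts_int, edges_int = np.histogram(values, bins=10)
print(len(counts_int), len(edges_int))

# Sequence bins: the list itself gives the edges, so 5 edges make 4 bins
custom_edges = [-5, -1, 0, 1, 5]
counts_seq, edges_seq = np.histogram(values, bins=custom_edges)
print(len(counts_seq), len(edges_seq))

# The same list of edges can be handed to the pandas histogram
ax = pd.Series(values).plot.hist(bins=custom_edges)
```

A sequence of edges is useful when the bins should have unequal widths or should line up with meaningful thresholds in the data.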
Example 1: Simple pandas histogram plot
Here a dataframe is created by generating random values with the help of numpy.
df_hist = pd.DataFrame({'A': np.random.randn(3000) + 1, 'B': np.random.randn(3000),'C': np.random.randn(3000) - 1}, columns=['A', 'B', 'C'])
df_hist.head()
| | A | B | C |
|---|---|---|---|
| 0 | 0.295822 | -0.331693 | -2.704511 |
| 1 | 1.676880 | -2.079090 | -1.421252 |
| 2 | 1.080180 | -1.345613 | -1.374851 |
| 3 | 0.353748 | -2.809398 | -1.817229 |
| 4 | 1.406088 | -1.687410 | -2.583514 |
The histogram is built with the hist() function. The alpha parameter controls the transparency of the bars: as alpha increases the bars become more opaque, and as it decreases they become more transparent.
df_hist.plot.hist(alpha=0.6)
<matplotlib.axes._subplots.AxesSubplot at 0x21089b67278>
Example 2: Stacked Histogram with bins parameter
df_hist.plot.hist(stacked=True, bins=30,alpha=0.5)
<matplotlib.axes._subplots.AxesSubplot at 0x21089ae0da0>
Next we will make a horizontal histogram with the cumulative parameter set to True. With cumulative=True, each bin shows the count of values in that bin plus all bins for smaller values, so the bars never decrease and the last bin equals the total number of values. The orientation parameter makes the histogram horizontal.
df_hist['B'].plot.hist(orientation='horizontal', cumulative=True)
<matplotlib.axes._subplots.AxesSubplot at 0x21089d802e8>
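What the cumulative option computes can be reproduced by hand. The following is a minimal sketch, assuming hypothetical normally distributed values; it uses numpy.histogram and a running sum rather than the pandas plotting call itself.

```python
import numpy as np

rng = np.random.default_rng(1)
values = rng.normal(size=3000)  # hypothetical sample data

# Ordinary histogram counts per bin
counts, edges = np.histogram(values, bins=10)

# Cumulative histogram: running total of the counts from left to right
cumulative = np.cumsum(counts)

print(counts)
print(cumulative)
```

The last entry of cumulative equals the number of observations, which is why the final bar of a cumulative histogram always reaches the full sample size.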
After bars and histograms, we will explore a different type of plot, the scatter plot.
Pandas Scatter Plot : scatter()
A scatter plot depicts the relationship between two variables by plotting one against the other as points on the two axes.
Syntax
dataframe.plot.scatter(x, y, s=None, c=None, **kwargs)
x : int or str – The column used for horizontal coordinates.
y : int or str – The column used for vertical coordinates.
s : scalar or array_like (optional) – The size of each point.
c : str, int or array_like (optional) – The color of each point.
**kwargs – Additional keyword arguments, which are passed on to DataFrame.plot().
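The s parameter is not used in the examples of this section, so here is a short sketch of it. It assumes a hypothetical dataframe named points with a weight column, and scales that column into an array of marker sizes; the factor 300 is an arbitrary choice for visibility.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
points = pd.DataFrame(rng.random((50, 3)), columns=["a", "b", "weight"])  # hypothetical data

# One marker size per row, derived from the weight column
sizes = (points["weight"] * 300).to_numpy()

# s takes the array of sizes, c takes a single color name for all points
ax = points.plot.scatter(x="a", y="b", s=sizes, c="DarkBlue")

# The drawn points live in the first collection of the Axes
plotted = ax.collections[0]
print(len(plotted.get_offsets()))
```

Passing an array for s turns the scatter plot into a bubble chart, where marker area carries a third variable.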
Example 1: Simple Pandas Scatter plot
Here a dataframe of random values is created; its column names are then passed to the scatter function.
df_scatter = pd.DataFrame(np.random.rand(100, 5), columns=['a', 'b', 'c', 'd','e'])
df_scatter.head()
| | a | b | c | d | e |
|---|---|---|---|---|---|
| 0 | 0.232764 | 0.297948 | 0.203266 | 0.647538 | 0.429361 |
| 1 | 0.915141 | 0.940809 | 0.710723 | 0.871370 | 0.191295 |
| 2 | 0.195358 | 0.506211 | 0.812928 | 0.895210 | 0.811241 |
| 3 | 0.836417 | 0.198523 | 0.508496 | 0.137529 | 0.567117 |
| 4 | 0.630026 | 0.080692 | 0.884420 | 0.299120 | 0.585431 |
Columns a and b are assigned to the x-axis and y-axis respectively to draw the scatter plot.
df_scatter.plot.scatter(x='a', y='b');
Example 2: Using color and label parameters
ax = df_scatter.plot.scatter(x='a', y='b', color='DarkRed', label='Length');
df_scatter.plot.scatter(x='c', y='d', color='DarkOrange', label='Width', ax=ax);
Here the c parameter colors each point by its value in column c, and the colormap parameter chooses the color scale. On the viridis scale, points with higher values are yellow. This makes it possible to show a third variable on a two-dimensional scatter plot.
ax2 = df_scatter.plot.scatter(x='a',
y='b',
c='c',
colormap='viridis')
The well-known pie chart can also be built using pandas. So let's look at the pie chart and learn about its details.
Pandas Pie Chart: pie()
A pie chart represents the proportions of the parts that make up a whole.
Syntax
dataframe.plot.pie(y=None, **kwargs)
y : int or label (optional) – The label or position of the column to plot. For a dataframe, either y or subplots=True must be given; a series needs neither.
**kwargs – Additional keyword arguments, which are passed on to DataFrame.plot().
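The examples in this section plot a series, so the y parameter does not come up. For a dataframe, pandas needs either y or subplots=True to know which column to draw. The sketch below shows both options and the error raised when neither is given; the budget dataframe is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import numpy as np
import pandas as pd
from matplotlib.axes import Axes

# Hypothetical data: two columns that could each be drawn as a pie
budget = pd.DataFrame({"planned": [40, 35, 25], "actual": [50, 30, 20]},
                      index=["rent", "food", "other"])

# Option 1: choose one column with y
one = budget.plot.pie(y="planned")

# Option 2: draw one pie per column
both = budget.plot.pie(subplots=True, figsize=(8, 4))

# Neither y nor subplots=True: pandas refuses to guess the column
try:
    budget.plot.pie()
    raised = False
except ValueError:
    raised = True

print(type(both).__name__, len(both), raised)
```

As with bar plots, the subplots form returns an array of Axes, one per column.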
Example 1: Simple pandas pie chart
In this example, a series is built using pandas and plotted as a pie chart of fruit consumption in India. The values are generated with the numpy random function, so they are placeholder numbers rather than real consumption data.
series = pd.Series(3 * np.random.rand(4),index=['Apple', 'Banana', 'Coconut', 'Watermelon'], name='Fruits_Consumption_in_India')
series
Apple         2.166653
Banana        1.605650
Coconut       2.865410
Watermelon    1.809250
Name: Fruits_Consumption_in_India, dtype: float64
series.plot.pie(figsize=(8, 8))
<matplotlib.axes._subplots.AxesSubplot at 0x21085866828>
Example 2: Values in pie chart
Here the autopct parameter is set to '%.2f', so each wedge is labelled with its percentage share of the total, formatted as a float with two decimal places.
series.plot.pie(labels=['Apple', 'Banana', 'Coconut', 'Watermelon'], colors=['r', 'y', 'b', 'g'],autopct='%.2f', fontsize=15, figsize=(7, 7))
<matplotlib.axes._subplots.AxesSubplot at 0x2108b36e4e0>
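The numbers printed by autopct are percentages of the total, not the raw values of the series. The sketch below recomputes them by hand for a hypothetical series named shares (fixed values chosen so the arithmetic is easy to follow) and compares them with the text drawn on the chart.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import pandas as pd

# Hypothetical fixed values that sum to 10
shares = pd.Series([2.0, 1.0, 3.0, 4.0],
                   index=["Apple", "Banana", "Coconut", "Watermelon"],
                   name="hypothetical_consumption")

# Percentage share of each entry, formatted the same way as autopct='%.2f'
percent = shares / shares.sum() * 100
expected_labels = ["%.2f" % p for p in percent]
print(expected_labels)

# Draw the chart and collect every text element placed on it
ax = shares.plot.pie(autopct="%.2f")
shown = [t.get_text() for t in ax.texts]
print(shown)
```

To print a percent sign after each number, the format string can end with a doubled percent character, for example '%.2f%%'.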
Conclusion
In this tutorial, we learned how to build various kinds of plots, namely bar plot, histogram, scatter plot and pie chart, using the built-in plotting functions of pandas. With the help of syntax and examples, we gained a deeper understanding of these plots and looked at situations where each of them is useful for conveying information.
Reference – https://pandas.pydata.org/docs/
I am Palash Sharma, an undergraduate student who loves to explore and garner in-depth knowledge in the fields like Artificial Intelligence and Machine Learning. I am captivated by the wonders these fields have produced with their novel implementations. With this, I have a desire to share my knowledge with others in all my capacity.