Pandas Tutorial – Assign(), Transpose() and Pivot()

Introduction

When working with DataFrames, we often need to look at the data in a different arrangement to extract more information from it. In this tutorial, we will learn three pandas functions that help with this: assign(), which adds new columns; transpose(), which swaps rows and columns; and pivot(), which reshapes a DataFrame around the values of its columns. Each function is explained with its syntax and examples.

Importing Pandas Library

First, we import the pandas library (along with NumPy).

In [1]:
import pandas as pd
import numpy as np

Pandas Assign : assign()

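The pandas assign() method adds new columns to a DataFrame. It returns a new DataFrame that contains all the original columns plus the new ones; the original DataFrame is not modified.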

Syntax

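pandas.DataFrame.assign(**kwargs)

**kwargs – Keyword arguments of the form new_column=value. If the value is callable, it is called with the DataFrame and its result is assigned to the column; otherwise (for example a Series, an array or a scalar), the value is assigned as is. A keyword that matches an existing column name overwrites that column.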
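To make this behaviour concrete, here is a small sketch using the same temperature data as the examples that follow. It shows that assign() leaves the original DataFrame untouched, that a scalar value is repeated for every row, and that reusing an existing column name overwrites that column in the returned copy. The unit column and the int conversion are illustrative choices, not part of the original examples.

```python
import pandas as pd

df = pd.DataFrame({'temp_c': [18.0, 36.0]},
                  index=['Bengaluru', 'Mumbai'])

df_new = df.assign(
    unit='celsius',                     # a scalar is broadcast to every row
    temp_c=df['temp_c'].astype(int),    # an existing name: the column is overwritten in the copy
)

# assign() returned a new DataFrame; df itself still has only its original column
print(df.columns.tolist())      # ['temp_c']
print(df_new.columns.tolist())  # ['temp_c', 'unit']
print(df['temp_c'].dtype, df_new['temp_c'].dtype)
```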

Example 1: Simple example of pandas assign()

In this example, we use assign() to add a column of temperatures converted from Celsius to Fahrenheit. The same result can be reached in two different ways.

In [2]:
df = pd.DataFrame({'temp_c': [18.0, 36.0]},
                 index=['Bengaluru', 'Mumbai'])

In the first approach, we pass a lambda. assign() calls it with the DataFrame (x), so x.temp_c refers to the temp_c column.

In [3]:
df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
Out[3]:
           temp_c  temp_f
Bengaluru    18.0    64.4
Mumbai       36.0    96.8

In the second approach, we compute the new column directly from df and pass the resulting Series.

In [4]:
df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
Out[4]:
           temp_c  temp_f
Bengaluru    18.0    64.4
Mumbai       36.0    96.8
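The lambda form is most useful in method chains, where the intermediate DataFrame has no variable name. A minimal sketch, assuming we only want the cities warmer than 20 °C (the filter and the name hot are illustrative): the lambda receives the already filtered frame, so it does not need to refer back to df.

```python
import pandas as pd

df = pd.DataFrame({'temp_c': [18.0, 36.0]},
                  index=['Bengaluru', 'Mumbai'])

hot = (
    df[df['temp_c'] > 20]                                # keep only the warmer rows
    .assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32)  # x is the filtered frame
)
print(hot)
```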

Example 2: Creating multiple columns with pandas assign function

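In this example, we pass several keyword arguments to assign() to create several columns in one call. The arguments are evaluated in order, so the lambda for temp_k can use the temp_f column created just before it.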

In [5]:
df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
          temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9)
Out[5]:
           temp_c  temp_f  temp_k
Bengaluru    18.0    64.4  291.15
Mumbai       36.0    96.8  309.15
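Referring to a column created in the same call only works with callables. In the sketch below, the first call mirrors the example above; the second replaces the lambdas with direct expressions and fails with a KeyError, because df['temp_f'] is evaluated as an ordinary Python argument before assign() runs, and df has no temp_f column yet. The names ok and failed are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'temp_c': [18.0, 36.0]},
                  index=['Bengaluru', 'Mumbai'])

# Works: each lambda is called with the frame built so far, so temp_k sees temp_f
ok = df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
               temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9)

# Fails: df['temp_f'] is looked up before assign() is even called
try:
    df.assign(temp_f=df['temp_c'] * 9 / 5 + 32,
              temp_k=(df['temp_f'] + 459.67) * 5 / 9)
    failed = False
except KeyError:
    failed = True

print(ok.round(2))
print('direct reference failed:', failed)
```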

Pandas Transpose : transpose()

The pandas transpose() method swaps the index and the columns: the rows of the original DataFrame become columns, and the columns become rows.

Syntax

pandas.DataFrame.transpose(*args, copy=False)

*args : tuple, optional – Accepted for compatibility with NumPy.

copy : bool, default False – Whether to copy the data after transposing, even for DataFrames with a single dtype. A copy is always made for DataFrames with mixed dtypes.

Example 1: Applying pandas transpose function on square dataframe

In this example, transpose() is applied to a square DataFrame (2 rows and 2 columns).

In [6]:
d1 = {'A': [9, 18], 'B': [27, 36]}
In [7]:
df1 = pd.DataFrame(data=d1)
In [8]:
df1
Out[8]:
    A   B
0   9  27
1  18  36
In [9]:
df1_transposed = df1.transpose()
In [10]:
df1_transposed
Out[10]:
    0   1
A   9  18
B  27  36
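Since transposing swaps the axes, applying it twice should give back the original frame, and the entry at (row i, column j) of the transpose should equal the entry at (row j, column i) of the original. A quick sketch checking both on df1 from this example:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [9, 18], 'B': [27, 36]})

# Transposing twice swaps the axes back, recovering the original frame
round_trip = df1.transpose().transpose()
print(round_trip.equals(df1))  # True

# Row 'B', column 1 of the transpose is row 1, column 'B' of the original
print(df1.transpose().loc['B', 1], df1.loc[1, 'B'])  # 36 36
```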

Example 2: Applying transpose() to a non-square DataFrame

In this example, transpose() is applied to a non-square DataFrame (2 rows and 4 columns) whose columns have different dtypes.

In [11]:
d2 = {'name': ['Rohit', 'Virat'],
       'score': [7.5, 9.5],
       'employed': [False, True],
       'cars': [1, 2]}
In [12]:
df2 = pd.DataFrame(data=d2)
In [13]:
df2
Out[13]:
    name  score  employed  cars
0  Rohit    7.5     False     1
1  Virat    9.5      True     2

transpose() can be called either as a method, df2.transpose(), or through the T attribute, df2.T. Below, we use T.

In [14]:
df2_transposed = df2.T
In [15]:
df2_transposed
Out[15]:
              0      1
name      Rohit  Virat
score       7.5    9.5
employed  False   True
cars          1      2
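A small check, reusing df2 from this example, that T and transpose() return the same DataFrame and that the shape (rows, columns) is swapped for a non-square frame:

```python
import pandas as pd

df2 = pd.DataFrame({'name': ['Rohit', 'Virat'],
                    'score': [7.5, 9.5],
                    'employed': [False, True],
                    'cars': [1, 2]})

# .T is shorthand for .transpose(); both produce the same DataFrame
print(df2.T.equals(df2.transpose()))  # True

# 2 rows x 4 columns becomes 4 rows x 2 columns
print(df2.shape, df2.T.shape)         # (2, 4) (4, 2)
```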

Example 3: How transpose() affects dtypes

In this example, we compare the dtypes before and after transposing. When all columns of the original DataFrame share one dtype, the transposed DataFrame keeps that dtype.

In [16]:
df1.dtypes
Out[16]:
A    int64
B    int64
dtype: object
In [17]:
df1_transposed.dtypes
Out[17]:
0    int64
1    int64
dtype: object

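When the DataFrame has mixed dtypes, each column of the transposed DataFrame has to hold values of different types (here a string, a float, a bool and an int), so pandas gives every column the object dtype.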

In [18]:
df2.dtypes
Out[18]:
name         object
score       float64
employed       bool
cars          int64
dtype: object
In [19]:
df2_transposed.dtypes
Out[19]:
0    object
1    object
dtype: object
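One consequence of this upcasting is that transposing back does not restore the original dtypes. The sketch below, reusing df2, shows that the round trip leaves every column as object, and then uses DataFrame.infer_objects(), a pandas method that re-infers more specific dtypes for object columns, to recover them. The dtype reported for the name column may differ between pandas versions.

```python
import pandas as pd

df2 = pd.DataFrame({'name': ['Rohit', 'Virat'],
                    'score': [7.5, 9.5],
                    'employed': [False, True],
                    'cars': [1, 2]})

# Transposing twice gives back the original layout, but every column is object
round_trip = df2.T.T
print(round_trip.dtypes)

# infer_objects() converts object columns back to float, bool, int where possible
restored = round_trip.infer_objects()
print(restored.dtypes)
```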

Pandas Pivot : pivot()

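The pandas pivot() method reshapes a DataFrame based on column values: the unique values of one column become the new index, the unique values of another become the new columns, and the cells are filled from the remaining column(s). pivot() does not aggregate, so every index/column pair must be unique; for data with duplicates, pivot_table() can be used instead.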

Syntax

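pandas.DataFrame.pivot(index=None, columns=None, values=None)

(In pandas 2.0 and later, these arguments are keyword-only and columns is required.)

index : str or object or a list of str, optional – Column(s) used to make the new frame's index. If not given, the existing index is used.

columns : str or object or a list of str – Column(s) used to make the new frame's columns.

values : str, object or a list of the previous, optional – Column(s) used to fill the new frame's values. If not given, all remaining columns are used and the result has hierarchically indexed (MultiIndex) columns.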
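One way to think about pivot() is as moving the index and column fields into a two-level index and then unstacking one level into columns; the traceback in Example 2 shows pivot() ending in a call to unstack() in the pandas version used there. A sketch, using the DataFrame from Example 1, that compares the two (wide and manual are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['P', 'Q', 'R', 'P', 'Q', 'R'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

# foo values become rows, bar values become columns, cells come from baz
wide = df.pivot(index='foo', columns='bar', values='baz')

# The same reshape spelled out: index by (foo, bar), keep baz, unstack bar
manual = df.set_index(['foo', 'bar'])['baz'].unstack('bar')

print(wide.equals(manual))  # True
```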

Example 1: Simple example of pandas pivot()

In this example, we build a DataFrame and pivot it so that the values of foo form the rows, the values of bar form the columns, and the cells come from baz. We obtain the same result in two ways.

In [20]:
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['P', 'Q', 'R', 'P', 'Q', 'R'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

In the first approach, we name the column used to fill the cells with the values parameter.

In [21]:
df_piv = df.pivot(index='foo', columns='bar', values='baz')
In [22]:
df_piv
Out[22]:
bar  P  Q  R
foo
one  1  2  3
two  4  5  6

In the second approach, we omit values, so pivot() uses all remaining columns (baz and zoo) and returns MultiIndex columns; selecting ['baz'] from that result gives the same table.

In [23]:
df_piva = df.pivot(index='foo', columns='bar')['baz']
In [24]:
df_piva
Out[24]:
bar  P  Q  R
foo
one  1  2  3
two  4  5  6
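To see why the second approach works, the sketch below, using the same DataFrame, inspects the result of pivot() without values. Both baz and zoo are pivoted, so the columns form a two-level MultiIndex of (original column, bar value), and selecting the top-level label 'baz' recovers the first table.

```python
import pandas as pd

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['P', 'Q', 'R', 'P', 'Q', 'R'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

# No values=: every remaining column is pivoted
both = df.pivot(index='foo', columns='bar')
print(both.columns.tolist())
# [('baz', 'P'), ('baz', 'Q'), ('baz', 'R'), ('zoo', 'P'), ('zoo', 'Q'), ('zoo', 'R')]

# Selecting the top level 'baz' leaves just the bar columns P, Q, R
print(both['baz'])
```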

Example 2: ValueError caused by duplicate entries

In this example, we see how duplicate index/column pairs make pivot() raise an error. First, as a reminder, here is the pivot of the previous DataFrame, which has no duplicates.

In [25]:
df_piv
Out[25]:
bar  P  Q  R
foo
one  1  2  3
two  4  5  6

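The DataFrame below contains a duplicate: the pair (foo='one', bar='A') appears twice, with baz values 1 and 2. Since pivot() does not aggregate, it cannot decide which value to put in that cell, so calling it raises a ValueError.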

In [26]:
df1 = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
                    "bar": ['A', 'A', 'B', 'C'],
                    "baz": [1, 2, 3, 4]})
In [27]:
df1
Out[27]:
   foo  bar  baz
0  one    A    1
1  one    A    2
2  two    B    3
3  two    C    4
In [28]:
df1.pivot(index='foo', columns='bar', values='baz')
As expected, this raises a ValueError:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-28-9431535792c7> in <module>
----> 1 df1.pivot(index='foo', columns='bar', values='baz')

H:\Anaconda\lib\site-packages\pandas\core\frame.py in pivot(self, index, columns, values)
   5626     def pivot(self, index=None, columns=None, values=None):
   5627         from pandas.core.reshape.pivot import pivot
-> 5628         return pivot(self, index=index, columns=columns, values=values)
   5629 
   5630     _shared_docs['pivot_table'] = """

H:\Anaconda\lib\site-packages\pandas\core\reshape\pivot.py in pivot(data, index, columns, values)
    386             indexed = data._constructor_sliced(data[values].values,
    387                                                index=index)
--> 388     return indexed.unstack(columns)
    389 
    390 

H:\Anaconda\lib\site-packages\pandas\core\series.py in unstack(self, level, fill_value)
   3299         """
   3300         from pandas.core.reshape.reshape import unstack
-> 3301         return unstack(self, level, fill_value)
   3302 
   3303     # ----------------------------------------------------------------------

H:\Anaconda\lib\site-packages\pandas\core\reshape\reshape.py in unstack(obj, level, fill_value)
    394         unstacker = _Unstacker(obj.values, obj.index, level=level,
    395                                fill_value=fill_value,
--> 396                                constructor=obj._constructor_expanddim)
    397         return unstacker.get_result()
    398 

H:\Anaconda\lib\site-packages\pandas\core\reshape\reshape.py in __init__(self, values, index, level, value_columns, fill_value, constructor)
    126 
    127         self._make_sorted_values_labels()
--> 128         self._make_selectors()
    129 
    130     def _make_sorted_values_labels(self):

H:\Anaconda\lib\site-packages\pandas\core\reshape\reshape.py in _make_selectors(self)
    164 
    165         if mask.sum() < len(self.index):
--> 166             raise ValueError('Index contains duplicate entries, '
    167                              'cannot reshape')
    168 

ValueError: Index contains duplicate entries, cannot reshape
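When the data may contain duplicates, two things can help. The sketch below (the names dupes and table are illustrative) first uses DataFrame.duplicated() with keep=False to list every row whose (foo, bar) pair occurs more than once, then uses pivot_table(), which, unlike pivot(), combines duplicate entries with an aggregation function. Averaging with 'mean' is an arbitrary choice here; cells with no data become NaN.

```python
import pandas as pd

df1 = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
                    "bar": ['A', 'A', 'B', 'C'],
                    "baz": [1, 2, 3, 4]})

# Every row whose (foo, bar) pair is repeated; these are what break pivot()
dupes = df1[df1.duplicated(subset=['foo', 'bar'], keep=False)]
print(dupes)

# pivot_table() aggregates duplicates: ('one', 'A') becomes the mean of 1 and 2
table = df1.pivot_table(index='foo', columns='bar', values='baz', aggfunc='mean')
print(table)
```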

Conclusion

We have reached the end of this article. We learned about pandas functions that change how a DataFrame is shaped or viewed: assign() adds new columns, transpose() swaps rows and columns, and pivot() reshapes data using column values as the new index and columns. Each function was illustrated with examples.

Reference – https://pandas.pydata.org/docs/
