Introduction
When working with DataFrames, we often need to look at the same data in a different shape to extract more information from it. In this tutorial, we will learn about three pandas functions that help in reshaping a DataFrame: assign(), transpose(), and pivot(). Each function is explained with its syntax and worked examples.
Importing Pandas Library
First, we import the pandas library, along with NumPy.
import pandas as pd
import numpy as np
Pandas Assign : assign()
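The pandas assign() function adds new columns to a DataFrame. It returns a new DataFrame that contains the original columns plus the new ones; the original DataFrame is not modified.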
Syntax
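pandas.DataFrame.assign(**kwargs)
kwargs – The keywords are the new column names. Each value is either a callable, which is evaluated on the DataFrame, or a Series, array, or scalar, which is assigned directly.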
Example 1: Simple example of pandas assign()
Here we use assign() on a DataFrame to add a computed column. The example shows two ways of reaching the same result.
df = pd.DataFrame({'temp_c': [18.0, 36.0]},
                  index=['Bengaluru', 'Mumbai'])
In the first case, we pass a lambda function that computes the Fahrenheit temperature from the temp_c column.
df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
(index) | temp_c | temp_f |
---|---|---|
Bengaluru | 18.0 | 64.4 |
Mumbai | 36.0 | 96.8 |
In the second case, we compute the same column directly from the existing DataFrame, without a lambda.
df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
(index) | temp_c | temp_f |
---|---|---|
Bengaluru | 18.0 | 64.4 |
Mumbai | 36.0 | 96.8 |
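Both calls above return a new DataFrame rather than changing df in place. The following minimal sketch illustrates this; the variable name with_f is only chosen for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'temp_c': [18.0, 36.0]},
                  index=['Bengaluru', 'Mumbai'])

# assign() returns a new DataFrame; df itself keeps only its original column
with_f = df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32)

print(list(df.columns))      # columns of the original DataFrame
print(list(with_f.columns))  # columns of the returned DataFrame
```

To keep the new column, the result has to be stored, for example with df = df.assign(...).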
Example 2: Creating multiple columns with pandas assign function
In this example, two columns are created in a single assign() call. The second column, temp_k, is computed from temp_f, which is created earlier in the same call.
df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
          temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9)
(index) | temp_c | temp_f | temp_k |
---|---|---|---|
Bengaluru | 18.0 | 64.4 | 291.15 |
Mumbai | 36.0 | 96.8 | 309.15 |
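Because keyword arguments are evaluated in order, a later column can refer to one created earlier in the same call. The sketch below repeats the example and, as a sanity check that is not part of the original example, compares temp_k against the direct Celsius-to-Kelvin formula (temp_c + 273.15). The names result and expected_k are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'temp_c': [18.0, 36.0]},
                  index=['Bengaluru', 'Mumbai'])

# temp_k uses temp_f, which is created earlier in the same assign() call
result = df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
                   temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9)

# Cross-check against the direct conversion from Celsius
expected_k = df['temp_c'] + 273.15
print(np.allclose(result['temp_k'], expected_k))
```

Swapping the order of the two keyword arguments would fail, because temp_f would not exist yet when temp_k is computed.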
Pandas Transpose : transpose()
The pandas transpose() function swaps the rows and columns of a DataFrame, so the index becomes the columns and the columns become the index.
Syntax
pandas.DataFrame.transpose(*args, copy=False)
args : tuple, optional – Accepted for compatibility with NumPy.
copy : bool, default False – Whether to copy the data after transposing, even for DataFrames with a single dtype. A copy is always made for DataFrames with mixed dtypes.
Example 1: Applying pandas transpose function on square dataframe
In this example, pandas transpose function is applied to a square dataframe.
d1 = {'A': [9, 18], 'B': [27, 36]}
df1 = pd.DataFrame(data=d1)
df1
(index) | A | B |
---|---|---|
0 | 9 | 27 |
1 | 18 | 36 |
df1_transposed = df1.transpose()
df1_transposed
(index) | 0 | 1 |
---|---|---|
A | 9 | 18 |
B | 27 | 36 |
Example 2: Applying transpose function on Non-square dataframe
In this example, pandas transpose function is applied to a non-square dataframe.
d2 = {'name': ['Rohit', 'Virat'],
      'score': [7.5, 9.5],
      'employed': [False, True],
      'cars': [1, 2]}
df2 = pd.DataFrame(data=d2)
df2
(index) | name | score | employed | cars |
---|---|---|---|---|
0 | Rohit | 7.5 | False | 1 |
1 | Virat | 9.5 | True | 2 |
To transpose a DataFrame, we can either call the transpose() method or use the shorthand T attribute. The example below uses T.
df2_transposed = df2.T
df2_transposed
(index) | 0 | 1 |
---|---|---|
name | Rohit | Virat |
score | 7.5 | 9.5 |
employed | False | True |
cars | 1 | 2 |
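As a quick check of the example above, the following sketch confirms that T and transpose() give the same result and that the shape is swapped from 2 rows by 4 columns to 4 rows by 2 columns.

```python
import pandas as pd

df2 = pd.DataFrame({'name': ['Rohit', 'Virat'],
                    'score': [7.5, 9.5],
                    'employed': [False, True],
                    'cars': [1, 2]})

print(df2.shape)                       # rows x columns before transposing
print(df2.T.shape)                     # rows x columns after transposing
print(df2.T.equals(df2.transpose()))   # T is shorthand for transpose()

# The old column labels become the new index
print(list(df2.T.index) == list(df2.columns))
```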
Example 3: Understanding usage of transpose function on dtypes
In this example, we inspect the dtypes before and after calling transpose(). When all columns of the original DataFrame share the same dtype, the transposed DataFrame keeps that dtype.
df1.dtypes
A    int64
B    int64
dtype: object
df1_transposed.dtypes
0    int64
1    int64
dtype: object
When the DataFrame has mixed dtypes, we get a transposed DataFrame with the object dtype.
df2.dtypes
name         object
score       float64
employed       bool
cars          int64
dtype: object
df2_transposed.dtypes
0    object
1    object
dtype: object
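One consequence of this is that transposing a mixed-dtype DataFrame twice does not bring the original dtypes back: every column stays as object. The sketch below shows one possible way to recover them, using infer_objects(), which is not covered in this tutorial. The names round_trip and restored are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({'name': ['Rohit', 'Virat'],
                    'score': [7.5, 9.5],
                    'employed': [False, True],
                    'cars': [1, 2]})

# Transposing twice restores the layout, but every column is now object dtype
round_trip = df2.T.T
print(round_trip.dtypes)

# infer_objects() re-infers a more specific dtype for each object column
restored = round_trip.infer_objects()
print(restored.dtypes)
```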
Pandas Pivot : pivot()
The pandas pivot() function is used to return the reshaped dataframe organized by given index/column values.
Syntax
pandas.DataFrame.pivot(index=None, columns=None, values=None)
index : str or object, optional – The column used to make the new frame’s index.
columns : str or object – The column used to make the new frame’s columns.
values : str, object or a list of the previous, optional – The column(s) used to populate the new frame’s values.
Example 1: Simple example of pandas pivot()
In this example, a DataFrame is built and pivot() is applied to it in two different ways.
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['P', 'Q', 'R', 'P', 'Q', 'R'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
In this scenario, the values parameter selects the column that populates the new frame.
df_piv = df.pivot(index='foo', columns='bar', values='baz')
df_piv
bar | P | Q | R |
---|---|---|---|
foo | |||
one | 1 | 2 | 3 |
two | 4 | 5 | 6 |
In this scenario, the values parameter is omitted, so every remaining column is pivoted; selecting 'baz' from the result gives the same table.
df_piva = df.pivot(index='foo', columns='bar')['baz']
df_piva
bar | P | Q | R |
---|---|---|---|
foo | |||
one | 1 | 2 | 3 |
two | 4 | 5 | 6 |
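To see why selecting 'baz' works, it helps to look at the intermediate result. When values is omitted, the columns of the pivoted frame have two levels: the original column name (baz or zoo) and the bar value. A minimal sketch, where wide and narrow are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['P', 'Q', 'R', 'P', 'Q', 'R'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

# Without values, both remaining columns (baz and zoo) are pivoted
wide = df.pivot(index='foo', columns='bar')
print(wide.columns.nlevels)  # number of levels in the column index

# Selecting the top-level key 'baz' leaves a single-level table
narrow = df.pivot(index='foo', columns='bar', values='baz')
print((wide['baz'].to_numpy() == narrow.to_numpy()).all())

# Individual cells are addressed with a (column name, bar value) tuple
print(wide.loc['one', ('zoo', 'Q')])
```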
Example 2: Understanding value error generation
In this example, we will see how duplicate index/column pairs cause pivot() to raise an error. The DataFrame from Example 1 has no duplicates, so its pivot is displayed without any problem.
df_piv
bar | P | Q | R |
---|---|---|---|
foo | |||
one | 1 | 2 | 3 |
two | 4 | 5 | 6 |
The DataFrame below has two rows with the same foo/bar pair ('one', 'A'), so pivot() cannot decide which baz value belongs in that cell and raises a ValueError.
df1 = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
                    "bar": ['A', 'A', 'B', 'C'],
                    "baz": [1, 2, 3, 4]})
df1
(index) | foo | bar | baz |
---|---|---|---|
0 | one | A | 1 |
1 | one | A | 2 |
2 | two | B | 3 |
3 | two | C | 4 |
df1.pivot(index='foo', columns='bar', values='baz')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-28-9431535792c7> in <module>
----> 1 df1.pivot(index='foo', columns='bar', values='baz')

H:\Anaconda\lib\site-packages\pandas\core\frame.py in pivot(self, index, columns, values)
   5626     def pivot(self, index=None, columns=None, values=None):
   5627         from pandas.core.reshape.pivot import pivot
-> 5628         return pivot(self, index=index, columns=columns, values=values)
   5629
   5630     _shared_docs['pivot_table'] = """

H:\Anaconda\lib\site-packages\pandas\core\reshape\pivot.py in pivot(data, index, columns, values)
    386             indexed = data._constructor_sliced(data[values].values,
    387                                                index=index)
--> 388     return indexed.unstack(columns)
    389
    390

H:\Anaconda\lib\site-packages\pandas\core\series.py in unstack(self, level, fill_value)
   3299         """
   3300         from pandas.core.reshape.reshape import unstack
-> 3301         return unstack(self, level, fill_value)
   3302
   3303     # ----------------------------------------------------------------------

H:\Anaconda\lib\site-packages\pandas\core\reshape\reshape.py in unstack(obj, level, fill_value)
    394         unstacker = _Unstacker(obj.values, obj.index, level=level,
    395                                fill_value=fill_value,
--> 396                                constructor=obj._constructor_expanddim)
    397         return unstacker.get_result()
    398

H:\Anaconda\lib\site-packages\pandas\core\reshape\reshape.py in __init__(self, values, index, level, value_columns, fill_value, constructor)
    126
    127         self._make_sorted_values_labels()
--> 128         self._make_selectors()
    129
    130     def _make_sorted_values_labels(self):

H:\Anaconda\lib\site-packages\pandas\core\reshape\reshape.py in _make_selectors(self)
    164
    165         if mask.sum() < len(self.index):
--> 166             raise ValueError('Index contains duplicate entries, '
    167                              'cannot reshape')
    168

ValueError: Index contains duplicate entries, cannot reshape
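The error arises because pivot() only reshapes and never aggregates. One possible way to handle duplicates, sketched below, is pivot_table(), a related pandas function not covered in this tutorial that combines duplicate entries with an aggregation function. Choosing 'sum' here is an assumption made for illustration; the right aggregation depends on the data.

```python
import pandas as pd

df1 = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
                    "bar": ['A', 'A', 'B', 'C'],
                    "baz": [1, 2, 3, 4]})

# pivot() refuses to reshape when an index/column pair occurs more than once
try:
    df1.pivot(index='foo', columns='bar', values='baz')
except ValueError as err:
    print('pivot failed:', err)

# pivot_table() aggregates the duplicates instead (here: summing them)
table = df1.pivot_table(index='foo', columns='bar', values='baz',
                        aggfunc='sum')
print(table)
```

In the resulting table, the two baz values for ('one', 'A') are added together, and combinations that never occur in df1 are filled with NaN.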
Conclusion
We have reached the end of this article. We learned about three pandas functions that help in changing the shape of a DataFrame: assign(), transpose(), and pivot(), each illustrated with examples.
Reference – https://pandas.pydata.org/docs/