Pandas Tutorial – Index, Reindex and MultiIndex

Introduction 

In this article, we continue our exploration of indexing operations in pandas, which are useful when handling data in DataFrames. This tutorial covers the DataFrame.reindex() method and the Index and MultiIndex classes. We will look at the syntax of each and work through examples to understand their usage.

Importing Pandas Library

We start the tutorial by importing the pandas and NumPy libraries.

In [1]:
import pandas as pd
import numpy as np

Pandas Reindex : reindex()

The pandas reindex() function conforms a DataFrame to a new index, with optional filling logic.

Syntax

pandas.DataFrame.reindex(labels, index, columns, axis, method, copy, level, fill_value, limit, tolerance)

  • labels : array-like, optional – New labels / index to conform the axis specified by ‘axis’ to.
  • index, columns : array-like, optional – New labels / index to conform to; these should be specified using keywords.
  • axis : int or str, optional – The axis to target; can be either the axis name (‘index’, ‘columns’) or number (0, 1).
  • method : {None, ‘backfill’/‘bfill’, ‘pad’/‘ffill’, ‘nearest’} – The method used for filling holes in the reindexed DataFrame. It applies only when the index is monotonically increasing or decreasing.
  • copy : bool, default True – Return a new object, even if the passed indexes are the same.
  • level : int or name – Broadcast across a level, matching Index values on the passed MultiIndex level.
  • fill_value : scalar, default np.NaN – The value used for entries whose labels are missing from the original DataFrame.
  • limit : int, default None – The maximum number of consecutive elements to forward or backward fill.
  • tolerance : optional – The maximum distance between original and new labels for inexact matches.
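The method, limit, columns and fill_value parameters described above can be sketched as follows. This is a minimal illustration, not the only way to use them; the readings DataFrame and the "unit" column name are made up for the example.

```python
import pandas as pd

# Hypothetical readings recorded on days 1, 4 and 7 (illustrative data).
readings = pd.DataFrame({"value": [10.0, 20.0, 30.0]}, index=[1, 4, 7])

# Conform to days 1..8. method="ffill" copies the last known row forward,
# and limit=1 allows at most one consecutive new label to be filled.
filled = readings.reindex(list(range(1, 9)), method="ffill", limit=1)
print(filled)

# reindex can also conform columns; a column absent from the original
# ("unit" here, a made-up name) is created and filled with fill_value.
with_unit = readings.reindex(columns=["value", "unit"], fill_value="n/a")
print(with_unit)
```

Days 2, 5 and 8 take the value of the preceding recorded day, while days 3 and 6 stay NaN because the limit of one consecutive fill has been used up.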

Example 1: Simple example of pandas reindex()

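In this example, we call reindex() on a small DataFrame. Labels in the new index that do not exist in the original (Mini Cooper, Aston Martin) get NaN values, and labels left out of the new index (Jaguar, Ferrari) are dropped.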

In [2]:
index = ['Audi', 'BMW', 'Mercedes', 'Jaguar', 'Ferrari']
In [3]:
df = pd.DataFrame({'Speed': [200, 200, 404, 404, 301],
                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
                  index=index)
In [4]:
df
Out[4]:
Speed response_time
Audi 200 0.04
BMW 200 0.02
Mercedes 404 0.07
Jaguar 404 0.08
Ferrari 301 1.00
In [5]:
new_index = ['Audi', 'Mini Cooper', 'Aston Martin', 'Mercedes',
              'BMW']
In [6]:
df.reindex(new_index)
Out[6]:
Speed response_time
Audi 200.0 0.04
Mini Cooper NaN NaN
Aston Martin NaN NaN
Mercedes 404.0 0.07
BMW 200.0 0.02

Example 2: Using fill_value parameter

Here we use the fill_value parameter to fill the rows whose labels are missing from the original DataFrame.

With fill_value set to 0, the entries that would otherwise be NaN contain zero instead.

In [7]:
df.reindex(new_index, fill_value=0)
Out[7]:
Speed response_time
Audi 200 0.04
Mini Cooper 0 0.00
Aston Martin 0 0.00
Mercedes 404 0.07
BMW 200 0.02

We can fill the missing entries with any relevant value, including text, as shown below. Note that filling a numeric column with text changes that column to the object dtype.

In [8]:
df.reindex(new_index, fill_value='missing')
Out[8]:
Speed response_time
Audi 200 0.04
Mini Cooper missing missing
Aston Martin missing missing
Mercedes 404 0.07
BMW 200 0.02

Pandas Index : Index()

The pandas Index class creates an immutable sequence that behaves like an ordered, sliceable set. It is the basic object that stores the axis labels of pandas objects.

Example 1: Indexing numerical data

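Here an Index is created from a list of integers using pd.Index(). The output below comes from an older pandas release; in pandas 2.0 and later, Int64Index no longer exists and the same call displays Index([1, 2, 3], dtype='int64').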

In [9]:
pd.Index([1, 2, 3])
Out[9]:
Int64Index([1, 2, 3], dtype='int64')

Example 2: Indexing using list datatype

Here a list of characters is passed to pd.Index().

In [10]:
pd.Index(list('abc'))
Out[10]:
Index(['a', 'b', 'c'], dtype='object')
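The "immutable, ordered, sliceable set" behaviour of an Index can be sketched as below. This is a small illustration using only the pd.Index(list('abc')) object from the example above; the variable names are chosen for the sketch.

```python
import pandas as pd

idx = pd.Index(list("abc"))

# Ordered and sliceable: slicing returns a new Index,
# and get_loc returns the integer position of a label.
first_two = idx[:2]
position_of_b = idx.get_loc("b")

# Set-like: intersection returns a new Index of the shared labels.
common = idx.intersection(pd.Index(["b", "c", "d"]))

# Immutable: assigning to an element raises TypeError.
try:
    idx[0] = "z"
    was_mutated = True
except TypeError:
    was_mutated = False

print(first_two, position_of_b, common, was_mutated)
```

Because the labels cannot be changed in place, an Index can be shared safely between several DataFrames and Series.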

Pandas MultiIndex : MultiIndex()

The pandas MultiIndex class builds a multi-level (hierarchical) index object for pandas objects.

Syntax

pandas.MultiIndex(levels, codes, sortorder, names, copy, verify_integrity)

  • levels : sequence of arrays – The unique labels for each level.
  • codes : sequence of arrays – Integers for each level that designate which label from ‘levels’ appears at each location.
  • sortorder : optional int – The level of sortedness (the index must be lexicographically sorted by that level).
  • names : optional sequence of objects – Names for each of the index levels.
  • copy : bool, default False – Copy the metadata.
  • verify_integrity : bool, default True – Check that the levels/codes are consistent and valid.
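To make the relationship between levels and codes concrete, here is a minimal sketch that calls the constructor directly. The label values are illustrative; each integer in codes is a position in the corresponding levels list.

```python
import pandas as pd

# Unique labels per level.
levels = [[3, 6, 12], ["potato", "pumpkin", "spinach", "tomato"]]

# For each row, the position of its label within the matching level.
codes = [[0, 1, 1, 2], [0, 3, 2, 1]]

mi = pd.MultiIndex(levels=levels, codes=codes, names=["number", "vegetables"])

# Decoding the codes against the levels gives one tuple per row.
print(mi.tolist())
```

In practice the helper constructors such as MultiIndex.from_arrays() and MultiIndex.from_tuples() are usually more convenient, since they work out the levels and codes for you.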

Example 1: Creating a multi-index from arrays

Here a multi-index is built with MultiIndex.from_arrays(), which takes one array of labels per level.

In [11]:
arrays = [[3, 6, 6, 12], ['potato', 'tomato', 'spinach', 'pumpkin']]
In [12]:
pd.MultiIndex.from_arrays(arrays, names=('number', 'vegetables'))
Out[12]:
MultiIndex(levels=[[3, 6, 12], ['potato', 'pumpkin', 'spinach', 'tomato']],
           codes=[[0, 1, 1, 2], [0, 3, 2, 1]],
           names=['number', 'vegetables'])

Example 2: Creating multi-index using tuples

In this example, a list of tuples is passed to MultiIndex.from_tuples() to create a multi-index. Each tuple holds the labels of one row, one label per level.

In [13]:
arrays = [['red', 'red', 'blue', 'blue', 'orange', 'orange', 'green', 'green'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
In [14]:
tuples = list(zip(*arrays))
In [15]:
tuples
Out[15]:
[('red', 'one'),
 ('red', 'two'),
 ('blue', 'one'),
 ('blue', 'two'),
 ('orange', 'one'),
 ('orange', 'two'),
 ('green', 'one'),
 ('green', 'two')]
In [16]:
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
In [17]:
index
Out[17]:
MultiIndex(levels=[['blue', 'green', 'orange', 'red'], ['one', 'two']],
           codes=[[3, 3, 0, 0, 2, 2, 1, 1], [0, 1, 0, 1, 0, 1, 0, 1]],
           names=['first', 'second'])
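Once built, a multi-index is typically attached to a DataFrame and used for selection. The sketch below assumes a shortened version of the tuples above and a made-up "score" column; it sorts the index first, which keeps label lookups efficient.

```python
import numpy as np
import pandas as pd

tuples = [("red", "one"), ("red", "two"), ("blue", "one"), ("blue", "two")]
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])

# Hypothetical data: one score per (first, second) pair.
scores = pd.DataFrame({"score": np.arange(4)}, index=index).sort_index()

# All rows whose first level is 'red'; the result is indexed by 'second'.
red_rows = scores.loc["red"]

# A single cell, addressed by the full tuple of labels.
single = scores.loc[("blue", "two"), "score"]

# A cross-section on the inner level: every 'one' row, indexed by 'first'.
ones = scores.xs("one", level="second")

print(red_rows, single, ones, sep="\n")
```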

Conclusion

In this article, we learned about the pandas indexing tools reindex(), Index and MultiIndex. We looked at their syntax and worked through examples of each. These tools make it easier to label, align and manage data held in DataFrames.

Reference – https://pandas.pydata.org/docs/
