Introduction
In this article, we continue our exploration of indexing operations in pandas, which are useful when handling data in DataFrames. This tutorial covers the DataFrame.reindex() method and the pd.Index and pd.MultiIndex classes. They are especially helpful when managing large datasets stored as DataFrames. We will look at the syntax of each and work through examples to understand their usage.
Importing Pandas Library
We start the tutorial by importing the pandas library, along with NumPy.
import pandas as pd
import numpy as np
Pandas Reindex : reindex()
The pandas reindex() function conforms a DataFrame to a new index, with optional logic for filling the entries that have no match in the original index.
Syntax
pandas.DataFrame.reindex(labels, index, columns, axis, method, copy, level, fill_value, limit, tolerance)
- labels : array-like, optional – The new labels / index to conform the axis specified by 'axis' to.
- index, columns : array-like, optional – The new labels / index to conform to; these should be specified using keywords.
- axis : int or str – The axis to target, given either as a name ('index', 'columns') or a number (0, 1).
- method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'} – The method used for filling holes in the reindexed DataFrame. It only applies when the index is monotonically increasing or decreasing.
- copy : bool, default True – Return a new object, even if the passed indexes are the same.
- level : int or name – Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_value : scalar, default np.nan – The value used for the missing entries that reindexing introduces.
- limit : int, default None – The maximum number of consecutive elements to forward or backward fill.
- tolerance : optional – The maximum distance between original and new labels for inexact matches.
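The method, limit and tolerance parameters are easier to follow with a small example. The sketch below is only an illustration: the frame df_num and its values are made up for this purpose, and it uses a sorted integer index because the filling methods require a monotonic index.

```python
import pandas as pd

# Made-up data with a sorted (monotonic) integer index.
df_num = pd.DataFrame({"value": [10.0, 20.0, 30.0]}, index=[0, 5, 10])

# Forward fill: each new label takes the value of the closest earlier label.
filled = df_num.reindex(range(0, 11), method="ffill")

# Same fill, but at most 2 consecutive new labels are filled after each
# original label; the remaining gaps stay NaN.
limited = df_num.reindex(range(0, 11), method="ffill", limit=2)

# Nearest-label matching, accepted only if the nearest original label
# is at most 1 away from the requested label.
near = df_num.reindex([1, 3, 9], method="nearest", tolerance=1)

print(filled)
print(limited)
print(near)
```

In the last call, label 3 has no original label within a distance of 1, so its row is left as NaN.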
Example 1: Simple example of pandas reindex()
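In this example, we build a small DataFrame and then reindex it with a new list of row labels. Labels that are not present in the original index get NaN in every column, and the Speed column is shown as float because NaN cannot be stored in an integer column.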
index = ['Audi', 'BMW', 'Mercedes', 'Jaguar', 'Ferrari']
df = pd.DataFrame({'Speed': [200, 200, 404, 404, 301],
                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
                  index=index)
df
| | Speed | response_time |
|---|---|---|
| Audi | 200 | 0.04 |
| BMW | 200 | 0.02 |
| Mercedes | 404 | 0.07 |
| Jaguar | 404 | 0.08 |
| Ferrari | 301 | 1.00 |
new_index = ['Audi', 'Mini Cooper', 'Aston Martin', 'Mercedes',
             'BMW']
df.reindex(new_index)
| | Speed | response_time |
|---|---|---|
| Audi | 200.0 | 0.04 |
| Mini Cooper | NaN | NaN |
| Aston Martin | NaN | NaN |
| Mercedes | 404.0 | 0.07 |
| BMW | 200.0 | 0.02 |
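The index, columns and axis parameters from the syntax section mean that reindex() can also be applied to columns. A minimal sketch follows, reusing the car data from this example; the 'price' column is a hypothetical label added only to show what happens with a column that does not exist.

```python
import pandas as pd

index = ['Audi', 'BMW', 'Mercedes', 'Jaguar', 'Ferrari']
df = pd.DataFrame({'Speed': [200, 200, 404, 404, 301],
                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
                  index=index)

# Reindex the columns instead of the rows. 'price' is a hypothetical
# column that df does not have, so it comes back filled with NaN.
by_columns = df.reindex(columns=['response_time', 'Speed', 'price'])

# The same request written with the labels/axis form.
same = df.reindex(['response_time', 'Speed', 'price'], axis='columns')

print(by_columns)
```

Both calls reorder the existing columns and append the new one, leaving the row index untouched.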
Example 2: Using fill_value parameter
Here we look at the fill_value parameter, which lets reindex() fill the entries for labels that were missing from the original index.
In the call below, fill_value is set to 0, so the rows that would otherwise contain NaN contain zero instead.
df.reindex(new_index, fill_value=0)
| | Speed | response_time |
|---|---|---|
| Audi | 200 | 0.04 |
| Mini Cooper | 0 | 0.00 |
| Aston Martin | 0 | 0.00 |
| Mercedes | 404 | 0.07 |
| BMW | 200 | 0.02 |
The fill value does not have to be a number. If required, the missing entries can be filled with text as well, as shown below. Note that mixing text with numbers turns the affected columns into object columns.
df.reindex(new_index, fill_value='missing')
| | Speed | response_time |
|---|---|---|
| Audi | 200 | 0.04 |
| Mini Cooper | missing | missing |
| Aston Martin | missing | missing |
| Mercedes | 404 | 0.07 |
| BMW | 200 | 0.02 |
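The choice of fill value also affects the column types, which is why Speed is printed as 200.0 in Example 1 but as 200 here. The sketch below rebuilds the same data and simply inspects the resulting dtypes; the variable names plain, zeros and text are introduced only for this illustration.

```python
import pandas as pd

index = ['Audi', 'BMW', 'Mercedes', 'Jaguar', 'Ferrari']
df = pd.DataFrame({'Speed': [200, 200, 404, 404, 301],
                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
                  index=index)
new_index = ['Audi', 'Mini Cooper', 'Aston Martin', 'Mercedes', 'BMW']

plain = df.reindex(new_index)                        # missing rows become NaN
zeros = df.reindex(new_index, fill_value=0)          # missing rows become 0
text = df.reindex(new_index, fill_value='missing')   # missing rows become text

# NaN forces the integer column to float, 0 keeps it integer,
# and a string fill value turns the column into object.
print(plain['Speed'].dtype)
print(zeros['Speed'].dtype)
print(text['Speed'].dtype)
```

If later code needs to do arithmetic on these columns, a numeric fill value is usually the safer choice.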
Pandas Index : Index()
The pandas Index class, created with pd.Index(), is an immutable, ordered and sliceable sequence that stores the axis labels of pandas objects.
Example 1: Indexing numerical data
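Here an Index is built from a list of integers using pd.Index(). Older pandas versions print the result as Int64Index, as shown below; pandas 2.0 and later print it as Index([1, 2, 3], dtype='int64').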
pd.Index([1, 2, 3])
Int64Index([1, 2, 3], dtype='int64')
Example 2: Indexing using list datatype
Here a list of strings is passed as the argument to pd.Index().
pd.Index(list('abc'))
Index(['a', 'b', 'c'], dtype='object')
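To make the "immutable, ordered, sliceable" description more concrete, here is a small sketch using the same three-letter Index. The variable names are chosen only for this illustration.

```python
import pandas as pd

idx = pd.Index(['a', 'b', 'c'])

# Slicing works as it does for any ordered sequence.
first_two = idx[:2]

# Looking up a label returns its integer position.
pos = idx.get_loc('b')

# An Index cannot be modified in place; item assignment raises TypeError.
try:
    idx[0] = 'z'
    mutated = True
except TypeError:
    mutated = False

# Set-style operations return new Index objects.
common = idx.intersection(pd.Index(['b', 'c', 'd']))
combined = idx.union(pd.Index(['d']))

print(first_two, pos, mutated)
print(common)
print(combined)
```

Because an Index cannot be changed in place, operations such as union() and intersection() always hand back a new object.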
Pandas MultiIndex : MultiIndex()
The pandas MultiIndex class helps in building a multi-level (hierarchical) index for pandas objects.
Syntax
pandas.MultiIndex(levels, codes, sortorder, names, copy, verify_integrity)
- levels : sequence of arrays – This contains the unique labels for each level.
- codes : sequence of arrays – Integers for each level that designate which label is at each location.
- sortorder : optional int – The level of sortedness (the index must be lexicographically sorted by that level).
- names : optional sequence of objects – Names for each of the index levels.
- copy : bool, default False – Whether to copy the metadata.
- verify_integrity : bool, default True – Check that the levels/codes are consistent and valid.
Example 1: Creating multi-index using the pandas multi-index function
Here a multi-index is built from a list of arrays, one array per level, using MultiIndex.from_arrays().
arrays = [[3, 6, 6, 12], ['potato', 'tomato', 'spinach', 'pumpkin']]
pd.MultiIndex.from_arrays(arrays, names=('number', 'vegetables'))
MultiIndex(levels=[[3, 6, 12], ['potato', 'pumpkin', 'spinach', 'tomato']], codes=[[0, 1, 1, 2], [0, 3, 2, 1]], names=['number', 'vegetables'])
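The levels and codes shown in this output correspond to the constructor parameters in the syntax section. As a rough sketch, the same index can be built directly from them, and verify_integrity can be seen rejecting a code that points outside its level. The bad_codes list is a deliberately invalid input made up for this illustration.

```python
import pandas as pd

levels = [[3, 6, 12], ['potato', 'pumpkin', 'spinach', 'tomato']]
codes = [[0, 1, 1, 2], [0, 3, 2, 1]]

# Each code is a position in the matching level, e.g. code 3 in the
# second level stands for 'tomato'.
mi = pd.MultiIndex(levels=levels, codes=codes, names=['number', 'vegetables'])
print(list(mi))

# The result matches what from_arrays() builds from the raw labels.
arrays = [[3, 6, 6, 12], ['potato', 'tomato', 'spinach', 'pumpkin']]
from_arrays = pd.MultiIndex.from_arrays(arrays, names=('number', 'vegetables'))
print(mi.equals(from_arrays))

# Code 5 does not exist in a level with only three labels, so the
# integrity check raises ValueError.
bad_codes = [[0, 1, 1, 5], [0, 3, 2, 1]]
try:
    pd.MultiIndex(levels=levels, codes=bad_codes, verify_integrity=True)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

In everyday code the from_arrays() and from_tuples() helpers are usually more convenient than writing levels and codes by hand.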
Example 2: Creating multi-index using tuples
In this example, tuples are used for creating a multi-index.
arrays = [['red', 'red', 'blue', 'blue', 'orange', 'orange', 'green', 'green'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
tuples
[('red', 'one'), ('red', 'two'), ('blue', 'one'), ('blue', 'two'), ('orange', 'one'), ('orange', 'two'), ('green', 'one'), ('green', 'two')]
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index
MultiIndex(levels=[['blue', 'green', 'orange', 'red'], ['one', 'two']], codes=[[3, 3, 0, 0, 2, 2, 1, 1], [0, 1, 0, 1, 0, 1, 0, 1]], names=['first', 'second'])
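Once built, a multi-index is typically attached to a DataFrame so that rows can be selected by one or more levels. The sketch below assumes a single placeholder column named 'value' holding the integers 0 to 7; the data and the names df_colors, red_rows, blue_two and ones are made up for this illustration.

```python
import numpy as np
import pandas as pd

arrays = [['red', 'red', 'blue', 'blue', 'orange', 'orange', 'green', 'green'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['first', 'second'])

# Placeholder data: one integer per row, in row order.
df_colors = pd.DataFrame({'value': np.arange(8)}, index=index)

# Select by the first level only: all rows labelled 'red'.
red_rows = df_colors.loc['red']

# Select a single cell with a full (first, second) key.
blue_two = df_colors.loc[('blue', 'two'), 'value']

# Select by the second level only: every 'one' row across all colours.
ones = df_colors.xs('one', level='second')

print(red_rows)
print(blue_two)
print(ones)
```

Selecting on the outer level with loc and on an inner level with xs() are two common ways of working with a multi-indexed frame.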
Conclusion
In this article, we learned about three pandas indexing tools: reindex(), Index and MultiIndex. We looked at their syntax and worked through examples of each. Together they make it easier to handle and organize data in DataFrames.
Reference – https://pandas.pydata.org/docs/