Introduction
While working on machine learning or data science projects, you will often need to transform or reshape data. The pandas library for Python provides many functions that make data analysis and manipulation easier. In this tutorial, we will learn about five of them: to_frame(), to_list(), astype(), get_dummies() and map(). Each function is explained with its syntax and examples.
Importing Pandas Library
We start the tutorial by importing pandas, along with NumPy, which some of the examples use to create missing (NaN) values.
import pandas as pd
import numpy as np
Pandas To_Frame : to_frame()
This function converts a Series into a DataFrame with a single column.
Syntax
pandas.Series.to_frame(name)
name : object, optional – The name to use for the column. It substitutes for the series name, if the series has one.
Example 1: Simple example of pandas to_frame() function
In this example, we'll look at a simple use of the pandas to_frame() function to see how Series data is converted into a DataFrame.
s = pd.Series([9.5, 1.8, 92.28, 27.18, 19.2002])
s
0     9.5000
1     1.8000
2    92.2800
3    27.1800
4    19.2002
dtype: float64
s.to_frame()
| | 0 |
|---|---|
| 0 | 9.5000 |
| 1 | 1.8000 |
| 2 | 92.2800 |
| 3 | 27.1800 |
| 4 | 19.2002 |
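The syntax above lists a `name` parameter that Example 1 does not use. A minimal sketch of passing it, reusing the same values, so the column gets a label instead of the default `0` (the label `"value"` is just an illustrative choice):

```python
import pandas as pd

# Same values as in Example 1
s = pd.Series([9.5, 1.8, 92.28, 27.18, 19.2002])

# Without a name, the single column is labelled 0
df_default = s.to_frame()
print(df_default.columns.tolist())  # [0]

# `name` sets the column label of the resulting DataFrame
df_named = s.to_frame(name="value")
print(df_named.columns.tolist())  # ['value']
print(df_named.shape)  # (5, 1)
```

Naming the column up front is convenient when the DataFrame is later merged or concatenated with others, since a bare `0` label is easy to confuse.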
Example 2: Converting a series with a datetime index into a dataframe
Here a Series of city names, indexed by weekly timestamps, is converted into a DataFrame. The datetime index is carried over unchanged.
s = pd.Series(['Boston', 'California', 'Waterloo', 'Barcelona', 'Munich', 'London'])
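# Weekly timestamps (Sundays) starting from 2019-03-01 10:00 in the Asia/Calcutta time zone;
# pd.DatetimeIndex(start=...) no longer works in current pandas, so pd.date_range is used
di = pd.date_range(start='2019-03-01 10:00', freq='W',
                   periods=6, tz='Asia/Calcutta')
# Set the index of the series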
s.index = di
s
2019-03-03 10:00:00+05:30        Boston
2019-03-10 10:00:00+05:30    California
2019-03-17 10:00:00+05:30      Waterloo
2019-03-24 10:00:00+05:30     Barcelona
2019-03-31 10:00:00+05:30        Munich
2019-04-07 10:00:00+05:30        London
Freq: W-SUN, dtype: object
s.to_frame()
| | 0 |
|---|---|
| 2019-03-03 10:00:00+05:30 | Boston |
| 2019-03-10 10:00:00+05:30 | California |
| 2019-03-17 10:00:00+05:30 | Waterloo |
| 2019-03-24 10:00:00+05:30 | Barcelona |
| 2019-03-31 10:00:00+05:30 | Munich |
| 2019-04-07 10:00:00+05:30 | London |
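When the Series already has a name, to_frame() uses it as the column label, and the index is kept as it is. A small sketch that rebuilds the series from Example 2 with a name (`"city"` is an illustrative label, not part of the original example):

```python
import pandas as pd

# Rebuild the weekly index from Example 2
idx = pd.date_range(start="2019-03-01 10:00", freq="W", periods=6, tz="Asia/Calcutta")
cities = pd.Series(
    ["Boston", "California", "Waterloo", "Barcelona", "Munich", "London"],
    index=idx,
    name="city",  # a named Series: the name becomes the column label
)

df = cities.to_frame()
print(df.columns.tolist())            # ['city']
print(df.index.equals(cities.index))  # True: the DatetimeIndex is carried over unchanged
print(df.index[0])                    # 2019-03-03 10:00:00+05:30
```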
Pandas To_List : to_list()
This function returns the values of a Series as a Python list. tolist() is an alias of to_list(), so either name can be used.
Example 1: Simple example of the pandas to_list function
In this example, we load a players dataset from players.csv, drop the rows with missing values, and convert its "Salary" column to a list.
df = pd.read_csv('players.csv')
df.dropna(inplace = True)
Here we store the type of the "Salary" column before the conversion.
dtype_before = type(df["Salary"])
Next, we convert the "Salary" column to a list.
salary_list = df["Salary"].tolist()
Then we store the type of the result after the conversion.
dtype_after = type(salary_list)
As the output shows, the type has changed from Series to list.
dtype_before
pandas.core.series.Series
dtype_after
list
salary_list
[7730337.0, 6796117.0, 1148640.0, 1170960.0, 4236287.0, 2525160.0, 525093.0, . . . 1415520.0, 2854940.0, 2637720.0, 4775000.0, 2658240.0, 9463484.0, 12000000.0, 15409570.0, 1348440.0, 981348.0, 2239800.0, 2433333.0, 947276.0]
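Since players.csv is not included here, a self-contained sketch can use a small Series built from the first three values printed above. It also shows that to_list() returns plain Python scalars rather than NumPy scalars:

```python
import pandas as pd

# First three salary values from the output above; no CSV file needed
salary = pd.Series([7730337.0, 6796117.0, 1148640.0], name="Salary")

as_list = salary.to_list()
alias_list = salary.tolist()  # tolist() is an alias of to_list()

print(as_list == alias_list)               # True
print(type(as_list))                       # <class 'list'>
print(type(as_list[0]))                    # <class 'float'>: a plain Python float
print(type(salary.to_numpy()[0]).__name__) # float64: the NumPy scalar stored in the Series
```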
Pandas Astype : astype()
The pandas astype() function casts a pandas object to a specified dtype.
Syntax
pandas.DataFrame.astype(dtype, copy, errors)
- dtype : data type, or dict of column name -> data type – This is the data type to which the input data is converted.
- copy : bool, default True – When True, a copy of the data is returned.
- errors : {‘raise’, ‘ignore’}, default ‘raise’ – Controls whether exceptions are raised on invalid data for the provided dtype. With ‘raise’, an exception is raised; with ‘ignore’, the exception is suppressed and the original object is returned.
Example 1: Converting datatypes of columns
With the astype() function, we can convert the data types of the columns in a DataFrame.
d = {'col1': [9, 27], 'col2': [18, 45]}
df = pd.DataFrame(data=d)
df.dtypes
col1    int64
col2    int64
dtype: object
Passing a single dtype casts every column. Here both columns are changed to int32.
df.astype('int32').dtypes
col1    int32
col2    int32
dtype: object
To cast only specific columns, pass a dict that maps column names to data types, as shown below.
df.astype({'col1': 'int32'}).dtypes
col1    int32
col2    int64
dtype: object
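The errors parameter matters most when the data cannot always be cast. A minimal sketch with hypothetical string data: numbers stored as text convert cleanly, while a non-numeric value raises an exception under the default errors='raise' (per the parameter description above, errors='ignore' would instead return the original object):

```python
import pandas as pd

# Numbers stored as strings can be cast to integers
nums = pd.Series(["1", "2", "3"])
converted = nums.astype("int64")
print(converted.tolist())  # [1, 2, 3]

# With the default errors='raise', a value that cannot be cast raises ValueError
bad = pd.Series(["1", "2", "x"])
try:
    bad.astype("int64")
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```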
Example 2: Using series data with astype function
In this example, the astype() function is applied to a Series.
s = pd.Series([6,9,12,18,24,27,30,36,42], dtype='int32')
s
0     6
1     9
2    12
3    18
4    24
5    27
6    30
7    36
8    42
dtype: int32
Calling astype() on the Series changes its dtype, here from int32 to int64.
s.astype('int64')
0     6
1     9
2    12
3    18
4    24
5    27
6    30
7    36
8    42
dtype: int64
The astype() function can also convert a Series to the category dtype.
s.astype('category')
0     6
1     9
2    12
3    18
4    24
5    27
6    30
7    36
8    42
dtype: category
Categories (9, int64): [6, 9, 12, 18, ..., 27, 30, 36, 42]
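In the example above every value is distinct, so each one becomes its own category. The category dtype is typically more useful for repeated labels, where each row stores an integer code. A sketch with hypothetical size labels, also showing that astype() accepts a CategoricalDtype object to define an ordering:

```python
import pandas as pd

sizes = pd.Series(["low", "high", "low", "medium", "low"])

# Default conversion: categories are the sorted unique values
cat = sizes.astype("category")
print(cat.dtype)                    # category
print(cat.cat.categories.tolist())  # ['high', 'low', 'medium']
print(cat.cat.codes.tolist())       # [1, 0, 1, 2, 1]: integer code per row

# An explicit ordered dtype makes comparisons such as max() follow that order
ordered = sizes.astype(pd.CategoricalDtype(["low", "medium", "high"], ordered=True))
print(ordered.max())                # high
```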
Pandas Get Dummies : get_dummies()
The pandas get_dummies() function converts a categorical variable into dummy (indicator) variables: one column per category, marking which category each row belongs to.
Syntax
pandas.get_dummies(data, prefix, prefix_sep, dummy_na, columns, sparse, drop_first, dtype)
- data : array-like, Series, or DataFrame – This is the data whose dummy indicators are computed.
- prefix : str, list of str, or dict of str, default None – String placed before the new column names. When calling get_dummies on a DataFrame, pass a list with one entry per encoded column, or a dict mapping column names to prefixes.
- prefix_sep : str, default ‘_’ – If a prefix is added, the separator/delimiter placed between the prefix and the category value.
- dummy_na : bool, default False – If True, adds a column that flags NaN values; if False, NaN values are ignored.
- columns : list-like, default None – Column names in the DataFrame to be encoded. If None, all columns with object, string, or category dtype are encoded.
- sparse : bool, default False – Whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False).
- drop_first : bool, default False – Whether to get k-1 dummies out of k categorical levels by removing the first level.
- dtype : dtype, default np.uint8 – Data type for the new columns. Only a single dtype is allowed. Since pandas 2.0 the default is bool, so newer versions show True/False instead of 1/0 in the outputs below.
Example 1: Using series data with pandas get_dummies() function
Here get_dummies() is applied to a Series of letters; each distinct value becomes its own column.
s = pd.Series(list('abbac'))
pd.get_dummies(s)
| | a | b | c |
|---|---|---|---|
| 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | 0 |
| 2 | 0 | 1 | 0 |
| 3 | 1 | 0 | 0 |
| 4 | 0 | 0 | 1 |
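By default (dummy_na=False), missing values do not get a column of their own, so the row holding NaN is all zeros.
s1 = ['a', 'b', np.nan]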
pd.get_dummies(s1)
| | a | b |
|---|---|---|
| 0 | 1 | 0 |
| 1 | 0 | 1 |
| 2 | 0 | 0 |
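Setting dummy_na=True adds an extra column that flags missing values, so the NaN row is no longer all zeros. A short sketch using the same list; dtype=int is passed so the output shows 1/0 regardless of the pandas version:

```python
import numpy as np
import pandas as pd

s1 = ["a", "b", np.nan]

# dummy_na=True adds a column (labelled NaN) that marks missing values
dummies = pd.get_dummies(s1, dummy_na=True, dtype=int)
print(dummies.shape)             # (3, 3)
print(dummies.iloc[2].tolist())  # [0, 0, 1]: the NaN row is now flagged
print(dummies)
```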
Example 2: Using multiple columns with get_dummies function
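When a DataFrame is passed, get_dummies() encodes its object (and category) columns, here A and B, and leaves the numeric column C unchanged. The prefix list supplies one prefix per encoded column.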
df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'],
'C': [1, 2, 3]})
pd.get_dummies(df, prefix=['col1', 'col2'])
| | C | col1_a | col1_b | col2_a | col2_b | col2_c |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0 | 0 | 1 | 0 |
| 1 | 2 | 0 | 1 | 1 | 0 | 0 |
| 2 | 3 | 1 | 0 | 0 | 0 | 1 |
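The columns, prefix and prefix_sep parameters from the syntax list can be combined to encode only selected columns. A minimal sketch on the same DataFrame, encoding only column A with a dict prefix and a custom separator (the "-" separator is just an illustrative choice):

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "a"], "B": ["b", "a", "c"], "C": [1, 2, 3]})

# Encode only column A; columns B and C are passed through unchanged
out = pd.get_dummies(
    df,
    columns=["A"],
    prefix={"A": "col1"},  # dict form: column name -> prefix
    prefix_sep="-",        # separator between prefix and category value
    dtype=int,
)
print(out.columns.tolist())  # ['B', 'C', 'col1-a', 'col1-b']
print(out)
```

Listing columns explicitly is useful when a string column (such as an ID) should not be expanded into dummy variables.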
Example 3: Using the drop_first and dtype parameters
This example builds a Series from a list of characters and then demonstrates the drop_first and dtype parameters.
pd.get_dummies(pd.Series(list('abccba')))
| | a | b | c |
|---|---|---|---|
| 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | 0 |
| 2 | 0 | 0 | 1 |
| 3 | 0 | 0 | 1 |
| 4 | 0 | 1 | 0 |
| 5 | 1 | 0 | 0 |
Setting drop_first=True drops the first category level (a), leaving k-1 columns; a row of all zeros then stands for a.
pd.get_dummies(pd.Series(list('abccba')), drop_first=True)
| | b | c |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 1 | 0 |
| 2 | 0 | 1 |
| 3 | 0 | 1 |
| 4 | 1 | 0 |
| 5 | 0 | 0 |
The dtype parameter sets the data type of the new dummy columns.
pd.get_dummies(pd.Series(list('abccba')), dtype=float)
| | a | b | c |
|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 |
| 1 | 0.0 | 1.0 | 0.0 |
| 2 | 0.0 | 0.0 | 1.0 |
| 3 | 0.0 | 0.0 | 1.0 |
| 4 | 0.0 | 1.0 | 0.0 |
| 5 | 1.0 | 0.0 | 0.0 |
Pandas Map : map()
The pandas map() function substitutes each value in a Series with another value, derived from a function, a dict (or other Mapping), or a Series.
Syntax
pandas.Series.map(arg, na_action)
- arg : function, collections.abc.Mapping subclass or Series – The mapping correspondence.
- na_action : {None, ‘ignore’}, default None – If ‘ignore’, NaN values are propagated without being passed to the mapping correspondence.
Example 1: Using dictionary in arguments of pandas map()
In this example, a dictionary supplies the mapping.
s = pd.Series(['BMW', 'Mercedes', np.nan, 'Rolls Royce'])
s
0            BMW
1       Mercedes
2            NaN
3    Rolls Royce
dtype: object
map() accepts a dict or a Series. Values that are not found in the dict are converted to NaN, unless the dict has a default value (for example, a defaultdict). Since there is no mapping for Rolls Royce, its output is NaN, and the missing value in row 2 stays NaN.
s.map({'BMW': 'Audi', 'Mercedes': 'Mclaren'})
0       Audi
1    Mclaren
2        NaN
3        NaN
dtype: object
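The two alternatives mentioned above can be sketched as follows, using a copy of the car series without the NaN to keep the output simple. A Series works as the mapping through its index, and a dict subclass with a default value (here collections.defaultdict, with the placeholder "unknown" chosen for illustration) fills unmatched keys instead of producing NaN:

```python
from collections import defaultdict

import pandas as pd

cars = pd.Series(["BMW", "Mercedes", "Rolls Royce"])

# A Series as the mapping: its index holds the keys, its values the replacements
lookup = pd.Series({"BMW": "Audi", "Mercedes": "Mclaren"})
mapped = cars.map(lookup).tolist()
print(mapped)  # ['Audi', 'Mclaren', nan]

# A dict with a default value: unmatched keys get the default instead of NaN
with_default = defaultdict(lambda: "unknown", {"BMW": "Audi", "Mercedes": "Mclaren"})
filled = cars.map(with_default).tolist()
print(filled)  # ['Audi', 'Mclaren', 'unknown']
```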
Example 2: Using function in arguments of pandas map()
In this example, we pass a function as the argument to map(). The bound string method 'I am a {}'.format is called on each value, inserting it into the sentence.
s.map('I am a {}'.format)
0             I am a BMW
1        I am a Mercedes
2             I am a nan
3     I am a Rolls Royce
dtype: object
Passing na_action='ignore' skips missing values, so NaN is propagated instead of being formatted into the string.
s.map('I am a {}'.format, na_action='ignore')
0             I am a BMW
1        I am a Mercedes
2                    NaN
3     I am a Rolls Royce
dtype: object
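na_action='ignore' becomes essential when the function cannot handle NaN at all. In this sketch, str.upper (chosen for illustration) only accepts strings, so the float NaN must be skipped:

```python
import numpy as np
import pandas as pd

s = pd.Series(["BMW", "Mercedes", np.nan, "Rolls Royce"])

# NaN is propagated as-is instead of being passed to str.upper
upper = s.map(str.upper, na_action="ignore")
print(upper.tolist())  # ['BMW', 'MERCEDES', nan, 'ROLLS ROYCE']

# Without na_action='ignore', str.upper receives the NaN (a float) and raises TypeError
try:
    s.map(str.upper)
    failed = False
except TypeError:
    failed = True
print(failed)  # True
```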
Conclusion
We have reached the end of this article. In it, we learned about pandas functions for converting and transforming data: to_frame(), to_list(), astype(), get_dummies() and map(), each illustrated with examples.
Reference – https://pandas.pydata.org/docs/