Pandas Tutorial – to_frame(), to_list(), astype(), get_dummies() and map()

Introduction

While working on machine learning or data science projects, you will come across situations where you have to transform data or play around with it. The Pandas library of Python has many functions that make data analysis and manipulation quite easy. In this tutorial, we will learn about the pandas functions to_frame(), to_list(), astype(), get_dummies() and map(), with the help of their syntax and examples.

Importing Pandas Library

We are starting the tutorial by importing the Pandas library.

In [1]:
import pandas as pd
import numpy as np

Pandas To_Frame : to_frame()

This function is used for converting pandas series data into a dataframe.

Syntax

pandas.Series.to_frame(name=None)

name : str, optional – The name to use for the column of the resulting dataframe; it substitutes the series name if provided.
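
As a quick illustration of the name parameter (with a made-up temperature series, before the notebook examples below), a minimal sketch:

import pandas as pd

# hypothetical temperature readings; the name passed to to_frame()
# becomes the column label of the resulting DataFrame
s = pd.Series([21.5, 23.0, 19.8])
df = s.to_frame(name='temperature')
print(df.columns.tolist())  # ['temperature']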

Example 1: Simple example of pandas to_frame() function

In this example, we’ll look at a simple use of the pandas to_frame function and learn how series data is converted into a dataframe.

In [2]:
s = pd.Series([9.5, 1.8, 92.28, 27.18, 19.2002]) 
In [3]:
s
Out[3]:
0     9.5000
1     1.8000
2    92.2800
3    27.1800
4    19.2002
dtype: float64
In [4]:
s.to_frame()
Out[4]:
         0
0   9.5000
1   1.8000
2  92.2800
3  27.1800
4  19.2002

Example 2: Converting a series with a datetime index into a dataframe

Here a series whose index consists of datetime data (a DatetimeIndex) is converted to a dataframe.

In [5]:
s = pd.Series(['Boston', 'California', 'Waterloo', 'Barcelona', 'Munich', 'London']) 
In [6]:
di = pd.date_range(start='2019-03-01 10:00', freq='W',
                   periods=6, tz='Asia/Calcutta')
In [7]:
#setting the index
s.index = di
In [8]:
s
Out[8]:
2019-03-03 10:00:00+05:30        Boston
2019-03-10 10:00:00+05:30    California
2019-03-17 10:00:00+05:30      Waterloo
2019-03-24 10:00:00+05:30     Barcelona
2019-03-31 10:00:00+05:30        Munich
2019-04-07 10:00:00+05:30        London
Freq: W-SUN, dtype: object
In [9]:
s.to_frame()
Out[9]:
                                    0
2019-03-03 10:00:00+05:30      Boston
2019-03-10 10:00:00+05:30  California
2019-03-17 10:00:00+05:30    Waterloo
2019-03-24 10:00:00+05:30   Barcelona
2019-03-31 10:00:00+05:30      Munich
2019-04-07 10:00:00+05:30      London

Pandas To_List : to_list()

This function is used for converting the series data to list data type.
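
The notebook example below reads an external players.csv file; as a self-contained sketch with made-up numbers, the conversion looks like this:

import pandas as pd

s = pd.Series([10, 20, 30])

values = s.to_list()   # to_list() and tolist() are aliases
print(type(values))    # <class 'list'>
print(values)          # [10, 20, 30]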

Example 1: Simple example of pandas to_list() function

In this example, we use the pandas to_list function to convert series data (a dataframe column) into a Python list.

In [10]:
df = pd.read_csv('players.csv')
In [11]:
df.dropna(inplace = True)  

Here we have stored the datatype of the “Salary” column before converting it.

In [12]:
dtype_before = type(df["Salary"]) 

Converting the “Salary” column to a list (tolist() is an alias of to_list()).

In [13]:
salary_list = df["Salary"].tolist() 

Here we have stored the datatype of the converted “Salary” data, which is now a list.

In [14]:
dtype_after = type(salary_list) 

As can be seen, the datatype has changed from series to list.

In [15]:
dtype_before
Out[15]:
pandas.core.series.Series
In [16]:
dtype_after
Out[16]:
list
In [17]:
salary_list
Out[17]:
[7730337.0,
 6796117.0,
 1148640.0,
 1170960.0,
 4236287.0,
 2525160.0,
 525093.0,
.
.
.
 1415520.0,
 2854940.0,
 2637720.0,
 4775000.0,
 2658240.0,
 9463484.0,
 12000000.0,
 15409570.0,
 1348440.0,
 981348.0,
 2239800.0,
 2433333.0,
 947276.0]

Pandas Astype : astype()

The pandas astype() function is used for casting a pandas object to a specified dtype.

Syntax

pandas.DataFrame.astype(dtype, copy, errors)

  • dtype : data type, or dict of column name -> data type – This is the data type to which the input data is converted.
  • copy : bool, default True – Whether to return a copy; with copy=False, changes to the values may propagate to other pandas objects.
  • errors : {‘raise’, ‘ignore’}, default ‘raise’ – This controls the raising of exceptions on invalid data for the provided dtype; with ‘ignore’, the original object is returned on error.
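
The errors parameter is not covered in the examples below; here is a minimal sketch with a made-up series containing a non-numeric string (note that newer pandas releases may deprecate errors='ignore'):

import pandas as pd

s = pd.Series(['1', '2', 'three'])

# errors='raise' (the default) would raise a ValueError for 'three';
# errors='ignore' returns the original object unchanged instead
result = s.astype('int64', errors='ignore')
print(result.dtype)  # object - the cast was silently skipped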

Example 1: Converting datatypes of columns

Using astype function, we will be able to convert the datatypes of columns in a dataframe.

In [18]:
d = {'col1': [9, 27], 'col2': [18, 45]}
In [19]:
df = pd.DataFrame(data=d)
In [20]:
df.dtypes
Out[20]:
col1    int64
col2    int64
dtype: object

In the below instance, the data type for both the columns is changed.

In [21]:
df.astype('int32').dtypes
Out[21]:
col1    int32
col2    int32
dtype: object

If desired, astype can also alter the datatype of a specific column only, by passing a dict of column name to dtype, as shown in the below instance.

In [22]:
df.astype({'col1': 'int32'}).dtypes
Out[22]:
col1    int32
col2    int64
dtype: object

Example 2: Using series data with astype function

In this example, the pandas astype() function is applied to series data.

In [23]:
s = pd.Series([6,9,12,18,24,27,30,36,42], dtype='int32')
In [24]:
s
Out[24]:
0     6
1     9
2    12
3    18
4    24
5    27
6    30
7    36
8    42
dtype: int32

Here the target datatype is passed to the pandas astype function on the series, and this is how we change the datatype from int32 to int64.

In [25]:
s.astype('int64')
Out[25]:
0     6
1     9
2    12
3    18
4    24
5    27
6    30
7    36
8    42
dtype: int64

We can also use the astype function of pandas for converting the datatype to category type.

In [26]:
s.astype('category')
Out[26]:
0     6
1     9
2    12
3    18
4    24
5    27
6    30
7    36
8    42
dtype: category
Categories (9, int64): [6, 9, 12, 18, ..., 27, 30, 36, 42]
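
For finer control, astype() also accepts a CategoricalDtype, which lets us fix the set of categories and their order; a minimal sketch with made-up data:

import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series(['low', 'high', 'medium', 'low'])

# an ordered categorical with an explicit category order: low < medium < high
cat_type = CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
print(s.astype(cat_type))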

Pandas Get Dummies : get_dummies()

The pandas get_dummies function is useful for converting a categorical variable into dummy/indicator variables (one-hot encoding).

Syntax

pandas.get_dummies(data, prefix, prefix_sep, dummy_na, columns, sparse, drop_first, dtype)

  • data : array-like, Series, or DataFrame – This is the data whose dummy indicators are computed.
  • prefix : str, list of str, or dict of str, default None – String to prepend to the generated column names.
  • prefix_sep : str, default ‘_’ – If appending prefix, separator/delimiter to use.
  • dummy_na : bool, default False – This decides whether to add a column to indicate NaN values; if False, NaNs are ignored.
  • columns : list-like, default None – Column names in the DataFrame to be encoded (see the sketch after this list).
  • sparse : bool, default False – This decides whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False).
  • drop_first : bool, default False – This decides whether to get k-1 dummies out of k categorical levels by removing the first level.
  • dtype : dtype, default np.uint8 – Data type for the new columns. Only a single dtype is allowed.
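
As mentioned above, the columns parameter restricts encoding to selected dataframe columns; a small sketch with made-up data:

import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'red'],
                   'size': ['S', 'M', 'L'],
                   'price': [10, 20, 30]})

# only the 'color' column is dummy-encoded; 'size' and 'price' stay as they are
print(pd.get_dummies(df, columns=['color']))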

Example 1: Using series data with pandas get_dummies() function

Here we will use series data for understanding the get_dummies function.

In [27]:
s = pd.Series(list('abbac'))
In [28]:
pd.get_dummies(s)
Out[28]:
   a  b  c
0  1  0  0
1  0  1  0
2  0  1  0
3  1  0  0
4  0  0  1
In [29]:
s1 = ['a', 'b', np.nan]
In [30]:
pd.get_dummies(s1)
Out[30]:
   a  b
0  1  0
1  0  1
2  0  0
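
The NaN entry above was simply dropped; with the dummy_na parameter set to True, an extra indicator column for missing values would be added. A quick sketch (the exact dtype of the indicators depends on the pandas version):

import numpy as np
import pandas as pd

s1 = ['a', 'b', np.nan]

# dummy_na=True adds an extra indicator column flagging the missing value,
# so the NaN row is no longer all zeros
print(pd.get_dummies(s1, dummy_na=True))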

Example 2: Using multiple columns with get_dummies function

Here in this example, multiple columns are used along with the get_dummies function.

In [31]:
df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'],
                    'C': [1, 2, 3]})
In [32]:
pd.get_dummies(df, prefix=['col1', 'col2'])
Out[32]:
   C  col1_a  col1_b  col2_a  col2_b  col2_c
0  1       1       0       0       1       0
1  2       0       1       1       0       0
2  3       1       0       0       0       1

Example 3: Using list along with series data

Here a list is converted to series data and encoded; we will also understand the drop_first and dtype parameters.

In [33]:
pd.get_dummies(pd.Series(list('abccba')))
Out[33]:
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
3  0  0  1
4  0  1  0
5  1  0  0

Here, using the drop_first parameter, the dummy column for the first categorical level (‘a’) is dropped, leaving k-1 columns.

In [34]:
pd.get_dummies(pd.Series(list('abccba')), drop_first=True)
Out[34]:
   b  c
0  0  0
1  1  0
2  0  1
3  0  1
4  1  0
5  0  0

With the help of the dtype parameter, we can specify the datatype of the dummy values produced by the get_dummies function.

In [35]:
pd.get_dummies(pd.Series(list('abccba')), dtype=float)
Out[35]:
     a    b    c
0  1.0  0.0  0.0
1  0.0  1.0  0.0
2  0.0  0.0  1.0
3  0.0  0.0  1.0
4  0.0  1.0  0.0
5  1.0  0.0  0.0

Pandas Map : map()

The pandas map function maps each value of a series according to the input provided (a dict, a Series, or a function).

Syntax

pandas.Series.map(arg, na_action=None)

  • arg : function, collections.abc.Mapping subclass or Series – The mapping correspondence applied to each value (a Series argument is shown in the sketch after this list).
  • na_action : {None, ‘ignore’}, default None – If ‘ignore’, NaN values are propagated as-is, without being passed to the mapping correspondence.
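
Besides a dict or a function, map() also accepts another Series, whose index is matched against the values being mapped; a minimal sketch with made-up data:

import pandas as pd

s = pd.Series(['cat', 'dog', 'bird'])

# the index of the lookup series acts like the keys of a dict;
# 'bird' has no entry, so it becomes NaN
lookup = pd.Series({'cat': 'feline', 'dog': 'canine'})
print(s.map(lookup))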

Example 1: Using dictionary in arguments of pandas map()

In this example, dictionary data type is used for providing the data.

In [36]:
s = pd.Series(['BMW', 'Mercedes', np.nan, 'Rolls Royce'])
In [37]:
s
Out[37]:
0            BMW
1       Mercedes
2            NaN
3    Rolls Royce
dtype: object

Map accepts a dict or a Series. Values that are not found in the dict are converted to NaN. Since there was no mapping value for Rolls Royce, we got NaN in the output.

In [38]:
s.map({'BMW': 'Audi', 'Mercedes': 'Mclaren'})
Out[38]:
0       Audi
1    Mclaren
2        NaN
3        NaN
dtype: object

Example 2: Using function in arguments of pandas map()

In this example, we will understand how functions can be passed as arguments to pandas map(). Here the str.format method is applied to each value of the series to generate the output.

In [39]:
s.map('I am a {}'.format)
Out[39]:
0            I am a BMW
1       I am a Mercedes
2            I am a nan
3    I am a Rolls Royce
dtype: object
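
Any callable works here, not only str.format; for instance a lambda (a small illustrative sketch, reusing the same car series):

import numpy as np
import pandas as pd

s = pd.Series(['BMW', 'Mercedes', np.nan, 'Rolls Royce'])

# the lambda is applied element-wise; the NaN (a float) reaches the
# function too unless na_action='ignore' is passed
print(s.map(lambda x: len(str(x))))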

With the help of the na_action parameter, we can skip the NaN values instead of passing them to the function, as shown in the below example.

In [40]:
s.map('I am a {}'.format, na_action='ignore')
Out[40]:
0            I am a BMW
1       I am a Mercedes
2                   NaN
3    I am a Rolls Royce
dtype: object

Conclusion

We have reached the end of this article, in which we learned about pandas functions that help in transforming and converting data. The functions covered were to_frame(), to_list(), astype(), get_dummies() and map(), each explained with the help of syntax and examples.

Reference – https://pandas.pydata.org/docs/

 
