Pandas DataFrame Tutorial – Selecting Rows by Value, Iterrows and DataReader

Selecting Rows by Value, Iterrows and DataReader

Introduction

In this article, we will learn about pandas function of iterrows(), datareader() and operation for selecting rows by value. These are very useful for accessing data inside the dataframes while working with your machine learning and data science projects.

Importing Pandas Library

Let’s start the article by importing the Pandas library.

In [1]:
import pandas as pd

Pandas Iterrows : iterrows()

This function helps in iterating over Dataframe rows as (Index,Series) pairs. This is used for accessing the data contained in dataframes.

Syntax

DataFrame.iterrows()

The function returns index of the row. The data of the rows is obtained in the form of series. Lastly, we get a generator that iterates over the frame.

Example 1: Simple example of iterrows()

In this example of iterrows, we can access the data using this function.

In [2]:
df = pd.DataFrame([[7, 8.5]], columns=['float', 'int'])

Since iterrows() returns iterator, we can use next function to see the content of the iterator. We can see that iterrows returns a tuple with row index and row data as a Series object

In [3]:
row = next(df.iterrows())[1]
In [4]:
row
Out[4]:
float    7.0
int      8.5
Name: 0, dtype: float64

Example 2: Using iterrows with for loop

In this example, iterrows is used with “for” loop.

In [5]:
fruits = pd.DataFrame({'seed_count': [8, 25, 50, 1],
                   'quantity': [10, 23, 32, 45],
                   'water_content': [5, 20, 25, 30]},
                  index=['orange', 'watermelon', 'strawberry', 'mango'])
In [6]:
fruits
Out[6]:
seed_count quantity water_content
orange 8 10 5
watermelon 25 23 20
strawberry 50 32 25
mango 1 45 30
In [7]:
for x,y in fruits.iterrows():
    if y['quantity'] in [10,45]:
        y['quantity'] = 50
        print(x,y)
Output
orange seed_count        8
quantity         50
water_content     5
Name: orange, dtype: int64
mango seed_count        1
quantity         50
water_content    30
Name: mango, dtype: int64
[adrotate banner=”3″]

Pandas Datareader : Datareader()

Datareader library of pandas is useful to those people who are looking to analyse stocks data of different countries. For some stocks, there is a requirement of an API key.

Example 1: Using datareader library to load stocks of Fred

The datareader library helps in fetching the data of different stocks of various countries.

In [8]:
import pandas_datareader.data as web
import datetime

After loading the library, we are specifying the dates for fetching desired stocks from the time range.

In [9]:
start = datetime.datetime(2010, 1, 1)
In [10]:
end = datetime.datetime(2013, 1, 27)
In [11]:
gdp = web.DataReader('GDP', 'fred', start, end)
In [12]:
gdp.loc['2013-01-01']
Out[12]:
GDP    16569.591
Name: 2013-01-01 00:00:00, dtype: float64

By using the datareader library, we can fetch the values of stocks of different variants.

In [13]:
inflation = web.DataReader(['CPIAUCSL', 'CPILFESL'], 'fred', start, end)
In [14]:
inflation.head()
Out[14]:
CPIAUCSL CPILFESL
DATE
2010-01-01 217.488 220.633
2010-02-01 217.281 220.731
2010-03-01 217.353 220.783
2010-04-01 217.403 220.822
2010-05-01 217.290 220.962

Example 2: Fetching World Bank’s data

By using pandas datareader, we can also fetch the World Bank’s data. DataReader can also get the values of other stocks using API’s.

In [15]:
from pandas_datareader import wb
In [16]:
matches = wb.search('gdp.*capita.*const')
In [17]:
dat = wb.download(indicator='NY.GDP.PCAP.KD', country=['US', 'CA', 'MX'], start=2012, end=2018)
In [18]:
print(dat)
Output
                    NY.GDP.PCAP.KD
country       year                
Canada        2018    51391.708417
              2017    51150.754618
              2016    50263.834327
              2015    50279.585839
              2014    50309.159896
              2013    49400.458302
              2012    48788.333032
Mexico        2018    10403.540397
              2017    10301.357885
              2016    10205.795753
              2015    10037.201490
              2014     9839.050191
              2013     9693.722969
              2012     9690.869065
United States 2018    54579.016837
              2017    53356.236236
              2016    52534.365284
              2015    52099.269760
              2014    51015.135477
              2013    50161.075821
              2012    49596.421917

Pandas Selecting rows by value

The pandas library gives us the ability to select rows from a dataframe based on the values present in it. For fetching these values, we can use different conditions.

Example 1: Selecting rows by value

This pandas operation helps us in selecting rows by filtering it through a condition of columns.

In [19]:
# importing pandas
import pandas as pd 

record = { 'E_Name': ['Ankit', 'Amit', 'Aishwarya', 'Priyanka', 'Priya', 'Shaurya' ], 
'Age': [21, 19, 20, 18, 17, 21], 
'Section': ['Management', 'Marketing', 'Finance', 'Sales', 'Technical', 'HR'], 
'Salary': [88, 92, 95, 70, 65, 78] } 
In [20]:
# create a dataframe
dataframe = pd.DataFrame(record, columns = ['E_Name', 'Age', 'Section', 'Salary']) 

print(dataframe) 
Output
      E_Name  Age     Section  Salary
0      Ankit   21  Management      88
1       Amit   19   Marketing      92
2  Aishwarya   20     Finance      95
3   Priyanka   18       Sales      70
4      Priya   17   Technical      65
5    Shaurya   21          HR      78

Here employees with salary greater than 80 are only selected.

In [21]:
# selecting rows based on condition
result = dataframe[dataframe['Salary'] > 80] 

print(result) 
Output
      E_Name  Age     Section  Salary
0      Ankit   21  Management      88
1       Amit   19   Marketing      92
2  Aishwarya   20     Finance      95

Example 2: Selecting specific rows using pandas

In this example, employee/employees who do not have salary equal to 95 are selected.

In [22]:
# selecting rows based on condition
result = dataframe.loc[dataframe['Salary'] != 95] 
  
print(result) 
Output
     E_Name  Age     Section  Salary
0     Ankit   21  Management      88
1      Amit   19   Marketing      92
3  Priyanka   18       Sales      70
4     Priya   17   Technical      65
5   Shaurya   21          HR      78

Conclusion

In this tutorial, we looked at pandas functions and libraries used for fetching data from different sources. The pandas function which we covered were namely pandas iterrows(), pandas in-built library datareader() and pandas operation of selection of rows using values. Through this tutorial, we looked at different examples of these functions and also tried to understand the details of the syntax.

Reference – https://pandas.pydata.org/docs/

Follow Us

Leave a Reply

Your email address will not be published. Required fields are marked *