Introduction
In this article, we will learn about pandas function of iterrows(), datareader() and operation for selecting rows by value. These are very useful for accessing data inside the dataframes while working with your machine learning and data science projects.
Importing Pandas Library
Let’s start the article by importing the Pandas library.
import pandas as pd
Pandas Iterrows : iterrows()
This function helps in iterating over Dataframe rows as (Index,Series) pairs. This is used for accessing the data contained in dataframes.
Syntax
DataFrame.iterrows()
The function returns index of the row. The data of the rows is obtained in the form of series. Lastly, we get a generator that iterates over the frame.
Example 1: Simple example of iterrows()
In this example of iterrows, we can access the data using this function.
df = pd.DataFrame([[7, 8.5]], columns=['float', 'int'])
Since iterrows() returns iterator, we can use next function to see the content of the iterator. We can see that iterrows returns a tuple with row index and row data as a Series object
row = next(df.iterrows())[1]
row
float 7.0 int 8.5 Name: 0, dtype: float64
Example 2: Using iterrows with for loop
In this example, iterrows is used with “for” loop.
fruits = pd.DataFrame({'seed_count': [8, 25, 50, 1],
'quantity': [10, 23, 32, 45],
'water_content': [5, 20, 25, 30]},
index=['orange', 'watermelon', 'strawberry', 'mango'])
fruits
seed_count | quantity | water_content | |
---|---|---|---|
orange | 8 | 10 | 5 |
watermelon | 25 | 23 | 20 |
strawberry | 50 | 32 | 25 |
mango | 1 | 45 | 30 |
for x,y in fruits.iterrows():
if y['quantity'] in [10,45]:
y['quantity'] = 50
print(x,y)
orange seed_count 8 quantity 50 water_content 5 Name: orange, dtype: int64 mango seed_count 1 quantity 50 water_content 30 Name: mango, dtype: int64
Pandas Datareader : Datareader()
Datareader library of pandas is useful to those people who are looking to analyse stocks data of different countries. For some stocks, there is a requirement of an API key.
Example 1: Using datareader library to load stocks of Fred
The datareader library helps in fetching the data of different stocks of various countries.
import pandas_datareader.data as web
import datetime
After loading the library, we are specifying the dates for fetching desired stocks from the time range.
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2013, 1, 27)
gdp = web.DataReader('GDP', 'fred', start, end)
gdp.loc['2013-01-01']
GDP 16569.591 Name: 2013-01-01 00:00:00, dtype: float64
By using the datareader library, we can fetch the values of stocks of different variants.
inflation = web.DataReader(['CPIAUCSL', 'CPILFESL'], 'fred', start, end)
inflation.head()
CPIAUCSL | CPILFESL | |
---|---|---|
DATE | ||
2010-01-01 | 217.488 | 220.633 |
2010-02-01 | 217.281 | 220.731 |
2010-03-01 | 217.353 | 220.783 |
2010-04-01 | 217.403 | 220.822 |
2010-05-01 | 217.290 | 220.962 |
Example 2: Fetching World Bank’s data
By using pandas datareader, we can also fetch the World Bank’s data. DataReader can also get the values of other stocks using API’s.
from pandas_datareader import wb
matches = wb.search('gdp.*capita.*const')
dat = wb.download(indicator='NY.GDP.PCAP.KD', country=['US', 'CA', 'MX'], start=2012, end=2018)
print(dat)
NY.GDP.PCAP.KD country year Canada 2018 51391.708417 2017 51150.754618 2016 50263.834327 2015 50279.585839 2014 50309.159896 2013 49400.458302 2012 48788.333032 Mexico 2018 10403.540397 2017 10301.357885 2016 10205.795753 2015 10037.201490 2014 9839.050191 2013 9693.722969 2012 9690.869065 United States 2018 54579.016837 2017 53356.236236 2016 52534.365284 2015 52099.269760 2014 51015.135477 2013 50161.075821 2012 49596.421917
Pandas Selecting rows by value
The pandas library gives us the ability to select rows from a dataframe based on the values present in it. For fetching these values, we can use different conditions.
Example 1: Selecting rows by value
This pandas operation helps us in selecting rows by filtering it through a condition of columns.
# importing pandas
import pandas as pd
record = { 'E_Name': ['Ankit', 'Amit', 'Aishwarya', 'Priyanka', 'Priya', 'Shaurya' ],
'Age': [21, 19, 20, 18, 17, 21],
'Section': ['Management', 'Marketing', 'Finance', 'Sales', 'Technical', 'HR'],
'Salary': [88, 92, 95, 70, 65, 78] }
# create a dataframe
dataframe = pd.DataFrame(record, columns = ['E_Name', 'Age', 'Section', 'Salary'])
print(dataframe)
E_Name Age Section Salary 0 Ankit 21 Management 88 1 Amit 19 Marketing 92 2 Aishwarya 20 Finance 95 3 Priyanka 18 Sales 70 4 Priya 17 Technical 65 5 Shaurya 21 HR 78
Here employees with salary greater than 80 are only selected.
# selecting rows based on condition
result = dataframe[dataframe['Salary'] > 80]
print(result)
E_Name Age Section Salary 0 Ankit 21 Management 88 1 Amit 19 Marketing 92 2 Aishwarya 20 Finance 95
Example 2: Selecting specific rows using pandas
In this example, employee/employees who do not have salary equal to 95 are selected.
# selecting rows based on condition
result = dataframe.loc[dataframe['Salary'] != 95]
print(result)
E_Name Age Section Salary 0 Ankit 21 Management 88 1 Amit 19 Marketing 92 3 Priyanka 18 Sales 70 4 Priya 17 Technical 65 5 Shaurya 21 HR 78
Conclusion
In this tutorial, we looked at pandas functions and libraries used for fetching data from different sources. The pandas function which we covered were namely pandas iterrows(), pandas in-built library datareader() and pandas operation of selection of rows using values. Through this tutorial, we looked at different examples of these functions and also tried to understand the details of the syntax.
- Also Read – Tutorial – Pandas Drop, Pandas Dropna, Pandas Drop Duplicate
- Also Read – Pandas Visualization Tutorial – Bar Plot, Histogram, Scatter Plot, Pie Chart
- Also Read – Tutorial – Pandas Concat, Pandas Append, Pandas Merge, Pandas Join
Reference – https://pandas.pydata.org/docs/