Introduction
When we deal with data science and machine learning projects, often we encounter different kinds of files containing data. In this article, we will learn pandas read and write operations with various types of files like CSV (Comma Separated Values) file, JSON (Javascript Object Notation) files, and Excel files. Both the reading and writing operations are crucial for such file types because this will give us the ability to extract data and also store data as per our requirement. So let’s start this tutorial and learn about reading and writing different types of files using pandas.
Importing Pandas Library
We will start the article by importing pandas library
import pandas as pd
So the very first type of file which we will learn to read and write is csv file. Let’s look how csv files are read using pandas
Reading CSV file in Pandas : read_csv()
For reading CSV file, we use pandas read_csv function. This function basically helps in fetching the contents of CSV file into a dataframe. Now, let us look at the syntax of this pandas function.
Syntax
pandas.read_csv(filepath_or_buffer)
filepath_or_bufferstr : path object or file-like object – This is the parameter that takes string path for fetching the desired CSV file. The string can be a URL hosted on a server or a local file hosted on the user’s computer.
This function returns a two-dimensional data structure with labeled axes.
Example 1 : Reading CSV file with read_csv() in Pandas
Here a dataframe df is used to store the content of the CSV file read. Here simply with the help of read_csv(), we were able to fetch data from CSV file.
NOTE – Always remember to provide the path to the CSV file or any file inside inverted commas.
df = pd.read_csv('test.csv')
Here we get an unwanted column title i.e. “Unnamed:0”, this is because we have not specified the name of the first column. In the following example, we’ll see how to print dataframe without this column title.
df
Unnamed: 0 | Quantity Grown (in Kg) | State in which grown | |
---|---|---|---|
0 | Apple | 150 | Jammu & Kashmir |
1 | Mango | 225 | Madhya Pradesh |
2 | Banana | 450 | Karnataka |
3 | Watermelon | 75 | Maharashtra |
4 | Pineapple | 175 | Uttar Pradesh |
To remove the unwanted column title, we must use index_col parameter with [0] assigned as its value. This will make sure there is no such title.
df = pd.read_csv('test.csv',index_col=[0])
df
Quantity Grown (in Kg) | State in which grown | |
---|---|---|
Apple | 150 | Jammu & Kashmir |
Mango | 225 | Madhya Pradesh |
Banana | 450 | Karnataka |
Watermelon | 75 | Maharashtra |
Pineapple | 175 | Uttar Pradesh |
Example 2: Applying conditions while reading CSV file in Pandas
Here in this example, we have applied a condition to skip those rows which are not divisible by 3. So we only get one row i.e. the 3rd row which is actually divisible by 3 and other rows are skipped.
df_con = pd.read_csv('test.csv', skiprows=lambda x: x % 3 != 0,index_col=[0])
df_con
Quantity Grown (in Kg) | State in which grown | |
---|---|---|
Banana | 450 | Karnataka |
Example 3: Knowing the data types of the content
Using dtypes we can find the data types of the values present in CSV file.
df_dtype = pd.read_csv('test.csv',index_col=[0]).dtypes
df_dtype
Quantity Grown (in Kg) int64 State in which grown object dtype: object
Example 4: Removing column headers of CSV file with Pandas
Here when index_col and header parameter is specified as None then the columns of CSV file are removed and considered as rows. After this, columns are represented with the help of numbers as it is visible.
df_remov = pd.read_csv('test.csv',index_col=None, header=None)
df_remov
0 | 1 | 2 | |
---|---|---|---|
0 | NaN | Quantity Grown (in Kg) | State in which grown |
1 | Apple | 150 | Jammu & Kashmir |
2 | Mango | 225 | Madhya Pradesh |
3 | Banana | 450 | Karnataka |
4 | Watermelon | 75 | Maharashtra |
5 | Pineapple | 175 | Uttar Pradesh |
Now with the help of pandas, we will learn how to write the data to csv file.
Writing CSV file in Pandas : to_csv()
Using pandas to_csv function we can store the data in CSV file format. Now we will learn about its syntax and also look at the examples.
Syntax
pandas.to_csv(filepath_or_buffer)
filepath_or_bufferstr : path object or file-like object – This parameter is used for getting the file path or object, if None is provided the result is returned as a string.
If the user doesn’t provide a path or buffer for storing the contents, then it is considered None, and function returns the resulting csv format as a string. Otherwise returns None.
Example 1: Writing in CSV file using to_csv() in Pandas
Here in this example, a dataframe is created and then by using to_csv function, the output of the dataframe is stored in a CSV file.
df = pd.DataFrame({'fruits': ['Mango', 'Guava','Pineapple','Apple','Watermelon','Strawberry','Orange'],
'states_grown': ['Madhya Pradesh', 'Maharashtra', 'Karnataka', 'Jammu & Kashmir', 'Gujarat', 'Himachal Pradesh', 'Kerala'],
'quantity (in Kgs)': ['150', '200', '390','545','273','250','175']})
df
fruits | states_grown | quantity (in Kgs) | |
---|---|---|---|
0 | Mango | Madhya Pradesh | 150 |
1 | Guava | Maharashtra | 200 |
2 | Pineapple | Karnataka | 390 |
3 | Apple | Jammu & Kashmir | 545 |
4 | Watermelon | Gujarat | 273 |
5 | Strawberry | Himachal Pradesh | 250 |
6 | Orange | Kerala | 175 |
By using to_csv(), the name of the file is provided for storing it in the same folder where jupyter notebook is located.
df_csv = df.to_csv('store.csv')
Example 2: Displaying to_csv() data on console
Using the below code we can get the output of the dataframe in the output console. Though this may not have much practical value, this example shows that it is important to provide the path along with the filename for storing the CSV file generated else it will just display the output on the console.
Along with this, index value usage is also explained. Clearly, if the index is specified by using true, then we will get the index for the values in the output console. If not specified, then there will be no index.
df.to_csv(index=False)
'fruits,taste,season_eaten\r\nMango,sweet and sour,summer\r\nGuava,sweet and bitter,winter\r\n'
df.to_csv(index=True)
',fruits,taste,season_eaten\r\n0,Mango,sweet and sour,summer\r\n1,Guava,sweet and bitter,winter\r\n'
[adrotate banner=”3″]
Next in the list is the JSON file. We will first read the data from JSON file, so let’s look at the syntax and examples of it.
Reading JSON file in Pandas : read_json()
With the help of read_json function, we can convert JSON string to pandas object.
Syntax
pandas.read_json(path_or_buf=None,orient=None)
path_or_buf : a valid JSON str, path object or file-like object – Any valid string path is acceptable. We can either provide URLs hosted over server or local file storage in the user’s computer.
orient : str – This provides the format in which JSON file is returned.
This read_json function returns either a series or dataframe.
Example 1: Reading JSON file with read_json() in Pandas
Here we get the JSON output using orient parameter set to index.
df = pd.read_json('Playlists.json',orient='index')
df
0 | 1 | 2 | |
---|---|---|---|
Number_of_People_Listening | 550000 | 454000 | 653000 |
Playlist_Name | Rock | Jazz | Hip-Hop |
Example 2: Reading JSON file orient value as records
In this example, the JSON file was picked from local storage and then it was stored in a dataframe. We can provide different values for orient parameter.
df_play = pd.read_json('Playlists.json',orient='records')
df_play
Number_of_People_Listening | Playlist_Name | |
---|---|---|
0 | 550000 | Rock |
1 | 454000 | Jazz |
2 | 653000 | Hip-Hop |
After having learned how to read json file, let’s find how to write the data to json file.
Writing JSON file in Pandas : to_json()
This pandas function i.e. to_json() helps to convert the dataframe or pandas object to JSON string.
Syntax
pandas.to_json(path_buf=None,orient=None)
path_or_buf : a valid JSON str, path object or file-like object – An acceptable string is used for providing the location of the file to be stored. If not specified, the result is returned in the form of string.
orient – str – With the help of this parameter, we will be able to provide the expected format of JSON file.
Example 1: Writing JSON file using to_json() in Pandas
With the help of to_json function, we will build a JSON file that will store the contents of the pandas dataframe. The JSON files created using to_json() will be stored in the same location as of jupyter notebook’s.
Here the results are stored in the form of records.
df = pd.DataFrame([['56', '79'], ['48', '63']],
index=['Anil', 'Boman'],
columns=['Marks in English', 'Marks in Mathematics'])
df
Marks in English | Marks in Mathematics | |
---|---|---|
Anil | 56 | 79 |
Boman | 48 | 63 |
df.to_json('test.json',orient='records')
By using orient as records, we can see how the output looks like in json file.
[{'Marks in English': '56', 'Marks in Mathematics': '79'}, {'Marks in English': '48', 'Marks in Mathematics': '63'}]
Example 2: Storing the results in form of tables using to_json()
With the orientation as table, the JSON file will store the result in the form of a table. This orient parameter of the to_json() function helps in producing JSON files in different ways.
df.to_json('test_table.json',orient='table')
When we use orient as table then the JSON file stores the schema used for storing the data and then the data is also mentioned in it.
As you can see in the output, first the schema attribute is specified and then we can also specify the primary key in this. Lastly the data is stored.
{'schema': {'fields': [{'name': 'index', 'type': 'string'}, {'name': 'Marks in English', 'type': 'string'}, {'name': 'Marks in Mathematics', 'type': 'string'}], 'primaryKey': ['index'], 'pandas_version': '0.20.0'}, 'data': [{'index': 'Anil', 'Marks in English': '56', 'Marks in Mathematics': '79'}, {'index': 'Boman', 'Marks in English': '48', 'Marks in Mathematics': '63'}]}
Example 3: Displaying results in output console with to_json()
By using split value, we are able to view the results in the form of string as mentioned above. Here a JSON file is not created but results are displayed in the form of JSON content since the location of storing JSON file is not provided.
df.to_json(orient='split')
'{"columns":["Marks in English","Marks in Mathematics"],"index":["Anil","Boman"],"data":[["56","79"],["48","63"]]}'
Lastly, we will have a look at the ways in which we can read the data from excel file.
Reading Excel file in Pandas : read_excel()
By using the pandas read_excel() function, we can fetch the excel file into pandas dataframe. read_excel function gives the liberty to fetch data from a single sheet or multiple excel sheets.
Syntax
pandas.read_excel(io,sheet_name=0,kwds)
io : str, bytes, ExcelFile, xlrd.Book, path object, or file-like object – This paramter takes a valid string, on this location the excel file is stored.
sheet_name : str , int, list, or None, default 0 – Here we can provide the names or numbers of sheets which are to be fetched for data.
kwds – Other keyword arguments
The output of this function is a dataframe or dictionary of dataframes.
Example 1: Reading excel file with read_excel()
By using read_excel function of pandas, we can fetch the data in dataframes
df = pd.read_excel('test.xlsx', index_col=0)
df
Subjects Liked | Number of Hours studied | |
---|---|---|
Student Name | ||
Rohit | Mathematics | 12 |
Virat | English | 15 |
Shikhar | Hindi | 3 |
Shreyas | Physics | 9 |
Manish | Chemistry | 7 |
Example 2: Reading data from specific sheets of excel file in Pandas
read_excel gives the option of obtaining information from specific sheets of excel file. Let’s see how it can be done.
The open method is used for opening the excel file and rb parameter specifies that user is looking to fetch the contents of the file.
df_check = pd.read_excel(open('test.xlsx', 'rb'),
sheet_name='Sheet2')
df_check
Student Name | Marks Obtained | |
---|---|---|
0 | Rohit | 56 |
1 | Virat | 97 |
2 | Shikhar | 35 |
3 | Shreyas | 77 |
4 | Manish | 89 |
Example 3: Removing the column names and headers from Excel
With the help of read_excel(), we can make slight changes to the structure of the excel file. The headers can be removed by specifying None to the header parameter. In the output, we can see how the column headers are considered as rows entries.
df_test = pd.read_excel('test.xlsx', index_col=None, header=None)
df_test
0 | 1 | 2 | |
---|---|---|---|
0 | Student Name | Subjects Liked | Number of Hours studied |
1 | Rohit | Mathematics | 12 |
2 | Virat | English | 15 |
3 | Shikhar | Hindi | 3 |
4 | Shreyas | Physics | 9 |
5 | Manish | Chemistry | 7 |
Along with the reading of data, now it’s time to learn about writing the data to excel file.
Writing to Excel file in Pandas: to_excel()
For writing the object to excel file, to_excel() function of pandas is used. Let’s look at the syntax of this function.
Syntax
dataframe.to_excel(excel_writer,sheet_name=’Sheet1′)
excel_writer : str or ExcelWriter object – This is the path specified for storing the file or for excel writer
sheet_name : str, default ‘Sheet1’ – Name of sheet which will store the content of dataframe. If not specified then, sheet1 is used.
Example 1: Writing into excel with to_excel() in Pandas
Pandas to_excel function is helpful in creating excel files using our dataframe content.
df = pd.DataFrame([['56', '79'], ['48', '63']],
index=['Anil', 'Boman'],
columns=['Marks in English', 'Marks in Mathematics'])
df.to_excel("output.xlsx")
Example 2: Storing data to a specific excel sheet in Pandas
Similar to the read_excel function, we have the liberty to store the data to the desired excel sheet.
df.to_excel("output.xlsx",
sheet_name='Sheet2')
Example 3: Storing data to multiple excel sheets in Pandas
For storing dataframe contents to multiple sheets, we need an excel writer. With this excel writer we can store the information using to_excel function. First we have created another dataframe using copy function of pandas.
df1 = pd.DataFrame([['75', '97'], ['25', '99']],
index=['Rakesh', 'Suresh'],
columns=['Marks in English', 'Marks in Mathematics'])
df1
Marks in English | Marks in Mathematics | |
---|---|---|
Rakesh | 75 | 97 |
Suresh | 25 | 99 |
Now with the help of these dataframes and excel writer, we are able to store the contents in multiple excel sheets.
with pd.ExcelWriter('output_mul.xlsx') as writer:
df.to_excel(writer, sheet_name='Sheet_name_1')
df1.to_excel(writer, sheet_name='Sheet_name_2')
Conclusion
Concluding this article, we looked at ways of handling different kinds of files using pandas. The focus of this tutorial was to learn how to read and write csv files, json files and excel files using pandas library in-built functions. Furthermore, some examples were covered to deeply understand pandas read and write functions and how they can be used.
- Also Read – Tutorial – Pandas Drop, Pandas Dropna, Pandas Drop Duplicate
- Also Read – Pandas Visualization Tutorial – Bar Plot, Histogram, Scatter Plot, Pie Chart
- Also Read – Tutorial – Pandas Concat, Pandas Append, Pandas Merge, Pandas Join
Reference – https://pandas.pydata.org/docs/