Pandas Read and Write operations with CSV , JSON and Excel Files

Pandas Write and Read CSV , JSON and Excel File

Introduction

When we deal with data science and machine learning projects, often we encounter different kinds of files containing data. In this article, we will learn pandas read and write operations with various types of files like CSV (Comma Separated Values) file, JSON (Javascript Object Notation) files, and Excel files. Both the reading and writing operations are crucial for such file types because this will give us the ability to extract data and also store data as per our requirement. So let’s start this tutorial and learn about reading and writing different types of files using pandas.

Importing Pandas Library

We will start the article by importing pandas library

In [1]:
import pandas as pd

So the very first type of file which we will learn to read and write is csv file. Let’s look how csv files are read using pandas

Reading CSV file in Pandas : read_csv()

For reading CSV file, we use pandas read_csv function. This function basically helps in fetching the contents of CSV file into a dataframe. Now, let us look at the syntax of this pandas function.

Syntax

pandas.read_csv(filepath_or_buffer)

filepath_or_bufferstr : path object or file-like object – This is the parameter that takes string path for fetching the desired CSV file. The string can be a URL hosted on a server or a local file hosted on the user’s computer.

This function returns a two-dimensional data structure with labeled axes.

Example 1 : Reading CSV file with read_csv() in Pandas

Here a dataframe df is used to store the content of the CSV file read. Here simply with the help of read_csv(), we were able to fetch data from CSV file.

NOTE – Always remember to provide the path to the CSV file or any file inside inverted commas.

In [2]:
df = pd.read_csv('test.csv')

Here we get an unwanted column title i.e. “Unnamed:0”, this is because we have not specified the name of the first column. In the following example, we’ll see how to print dataframe without this column title.

In [3]:
df
Out[3]:
Unnamed: 0 Quantity Grown (in Kg) State in which grown
0 Apple 150 Jammu & Kashmir
1 Mango 225 Madhya Pradesh
2 Banana 450 Karnataka
3 Watermelon 75 Maharashtra
4 Pineapple 175 Uttar Pradesh

To remove the unwanted column title, we must use index_col parameter with [0] assigned as its value. This will make sure there is no such title.

In [4]:
df = pd.read_csv('test.csv',index_col=[0])
In [5]:
df
Out[5]:
Quantity Grown (in Kg) State in which grown
Apple 150 Jammu & Kashmir
Mango 225 Madhya Pradesh
Banana 450 Karnataka
Watermelon 75 Maharashtra
Pineapple 175 Uttar Pradesh

Example 2: Applying conditions while reading CSV file in Pandas

Here in this example, we have applied a condition to skip those rows which are not divisible by 3. So we only get one row i.e. the 3rd row which is actually divisible by 3 and other rows are skipped.

In [5]:
df_con = pd.read_csv('test.csv', skiprows=lambda x: x % 3 != 0,index_col=[0])
In [6]:
df_con
Out[7]:
Quantity Grown (in Kg) State in which grown
Banana 450 Karnataka

Example 3: Knowing the data types of the content

Using dtypes we can find the data types of the values present in CSV file.

In [7]:
df_dtype = pd.read_csv('test.csv',index_col=[0]).dtypes
In [8]:
df_dtype
Out[8]:
Quantity Grown (in Kg)     int64
State in which grown      object
dtype: object

Example 4: Removing column headers of CSV file with Pandas

Here when index_col and header parameter is specified as None then the columns of CSV file are removed and considered as rows. After this, columns are represented with the help of numbers as it is visible.

In [9]:
df_remov = pd.read_csv('test.csv',index_col=None, header=None)
In [9]:
df_remov
Out[10]:
0 1 2
0 NaN Quantity Grown (in Kg) State in which grown
1 Apple 150 Jammu & Kashmir
2 Mango 225 Madhya Pradesh
3 Banana 450 Karnataka
4 Watermelon 75 Maharashtra
5 Pineapple 175 Uttar Pradesh

Now with the help of pandas, we will learn how to write the data to csv file.

Writing CSV file in Pandas : to_csv()

Using pandas to_csv function we can store the data in CSV file format. Now we will learn about its syntax and also look at the examples.

Syntax

pandas.to_csv(filepath_or_buffer)

filepath_or_bufferstr : path object or file-like object – This parameter is used for getting the file path or object, if None is provided the result is returned as a string.

If the user doesn’t provide a path or buffer for storing the contents, then it is considered None, and function returns the resulting csv format as a string. Otherwise returns None.

Example 1: Writing in CSV file using to_csv() in Pandas

Here in this example, a dataframe is created and then by using to_csv function, the output of the dataframe is stored in a CSV file.

In [11]:
df = pd.DataFrame({'fruits': ['Mango', 'Guava','Pineapple','Apple','Watermelon','Strawberry','Orange'],

                   'states_grown': ['Madhya Pradesh', 'Maharashtra', 'Karnataka', 'Jammu & Kashmir', 'Gujarat', 'Himachal Pradesh', 'Kerala'],

                   'quantity (in Kgs)': ['150', '200', '390','545','273','250','175']})
In [12]:
df
Out[12]:
fruits states_grown quantity (in Kgs)
0 Mango Madhya Pradesh 150
1 Guava Maharashtra 200
2 Pineapple Karnataka 390
3 Apple Jammu & Kashmir 545
4 Watermelon Gujarat 273
5 Strawberry Himachal Pradesh 250
6 Orange Kerala 175

By using to_csv(), the name of the file is provided for storing it in the same folder where jupyter notebook is located.

In [13]:
df_csv = df.to_csv('store.csv')

Example 2: Displaying to_csv() data on console

Using the below code we can get the output of the dataframe in the output console. Though this may not have much practical value, this example shows that it is important to provide the path along with the filename for storing the CSV file generated else it will just display the output on the console.

Along with this, index value usage is also explained. Clearly, if the index is specified by using true, then we will get the index for the values in the output console. If not specified, then there will be no index.

In [14]:
df.to_csv(index=False)
Out[14]:
'fruits,taste,season_eaten\r\nMango,sweet and sour,summer\r\nGuava,sweet and bitter,winter\r\n'
In [15]:
df.to_csv(index=True)
Out[15]:
',fruits,taste,season_eaten\r\n0,Mango,sweet and sour,summer\r\n1,Guava,sweet and bitter,winter\r\n'

[adrotate banner=”3″]

Next in the list is the JSON file. We will first read the data from JSON file, so let’s look at the syntax and examples of it.

Reading JSON file in Pandas : read_json()

With the help of read_json function, we can convert JSON string to pandas object.

Syntax

pandas.read_json(path_or_buf=None,orient=None)

path_or_buf : a valid JSON str, path object or file-like object – Any valid string path is acceptable. We can either provide URLs hosted over server or local file storage in the user’s computer.

orient : str – This provides the format in which JSON file is returned.

This read_json function returns either a series or dataframe.

Example 1: Reading JSON file with read_json() in Pandas

Here we get the JSON output using orient parameter set to index.

In [16]:
df = pd.read_json('Playlists.json',orient='index')
In [17]:
df
Out[17]:
0 1 2
Number_of_People_Listening 550000 454000 653000
Playlist_Name Rock Jazz Hip-Hop

Example 2: Reading JSON file orient value as records

In this example, the JSON file was picked from local storage and then it was stored in a dataframe. We can provide different values for orient parameter.

In [18]:
df_play = pd.read_json('Playlists.json',orient='records')
In [19]:
df_play
Out[20]:
Number_of_People_Listening Playlist_Name
0 550000 Rock
1 454000 Jazz
2 653000 Hip-Hop

After having learned how to read json file, let’s find how to write the data to json file.

Writing JSON file in Pandas : to_json()

This pandas function i.e. to_json() helps to convert the dataframe or pandas object to JSON string.

Syntax

pandas.to_json(path_buf=None,orient=None)

path_or_buf : a valid JSON str, path object or file-like object – An acceptable string is used for providing the location of the file to be stored. If not specified, the result is returned in the form of string.

orient – str – With the help of this parameter, we will be able to provide the expected format of JSON file.

Example 1: Writing JSON file using to_json() in Pandas

With the help of to_json function, we will build a JSON file that will store the contents of the pandas dataframe. The JSON files created using to_json() will be stored in the same location as of jupyter notebook’s.

Here the results are stored in the form of records.

In [21]:
df = pd.DataFrame([['56', '79'], ['48', '63']],

                  index=['Anil', 'Boman'],

                  columns=['Marks in English', 'Marks in Mathematics'])
In [22]:
df
Out[22]:
Marks in English Marks in Mathematics
Anil 56 79
Boman 48 63
In [23]:
df.to_json('test.json',orient='records')
Output of JSON file

By using orient as records, we can see how the output looks like in json file.

[{'Marks in English': '56', 'Marks in Mathematics': '79'},
 {'Marks in English': '48', 'Marks in Mathematics': '63'}]

Example 2: Storing the results in form of tables using to_json()

With the orientation as table, the JSON file will store the result in the form of a table. This orient parameter of the to_json() function helps in producing JSON files in different ways.

In [24]:
df.to_json('test_table.json',orient='table')
Output of JSON file

When we use orient as table then the JSON file stores the schema used for storing the data and then the data is also mentioned in it.

As you can see in the output, first the schema attribute is specified and then we can also specify the primary key in this. Lastly the data is stored.

{'schema': {'fields': [{'name': 'index', 'type': 'string'},
   {'name': 'Marks in English', 'type': 'string'},
   {'name': 'Marks in Mathematics', 'type': 'string'}],
  'primaryKey': ['index'],
  'pandas_version': '0.20.0'},
 'data': [{'index': 'Anil',
   'Marks in English': '56',
   'Marks in Mathematics': '79'},
  {'index': 'Boman', 'Marks in English': '48', 'Marks in Mathematics': '63'}]}

Example 3: Displaying results in output console with to_json()

By using split value, we are able to view the results in the form of string as mentioned above. Here a JSON file is not created but results are displayed in the form of JSON content since the location of storing JSON file is not provided.

In [25]:
df.to_json(orient='split')
Out[25]:
'{"columns":["Marks in English","Marks in Mathematics"],"index":["Anil","Boman"],"data":[["56","79"],["48","63"]]}'

Lastly, we will have a look at the ways in which we can read the data from excel file.

Reading Excel file in Pandas : read_excel()

By using the pandas read_excel() function, we can fetch the excel file into pandas dataframe. read_excel function gives the liberty to fetch data from a single sheet or multiple excel sheets.

Syntax

pandas.read_excel(io,sheet_name=0,kwds)

io : str, bytes, ExcelFile, xlrd.Book, path object, or file-like object – This paramter takes a valid string, on this location the excel file is stored.

sheet_name : str , int, list, or None, default 0 – Here we can provide the names or numbers of sheets which are to be fetched for data.

kwds – Other keyword arguments

The output of this function is a dataframe or dictionary of dataframes.

Example 1: Reading excel file with read_excel()

By using read_excel function of pandas, we can fetch the data in dataframes

In [26]:
df = pd.read_excel('test.xlsx', index_col=0)  
In [27]:
df
Out[27]:
Subjects Liked Number of Hours studied
Student Name
Rohit Mathematics 12
Virat English 15
Shikhar Hindi 3
Shreyas Physics 9
Manish Chemistry 7

Example 2: Reading data from specific sheets of excel file in Pandas

read_excel gives the option of obtaining information from specific sheets of excel file. Let’s see how it can be done.

The open method is used for opening the excel file and rb parameter specifies that user is looking to fetch the contents of the file.

In [28]:
df_check = pd.read_excel(open('test.xlsx', 'rb'),

              sheet_name='Sheet2')  
In [29]:
df_check
Out[29]:
Student Name Marks Obtained
0 Rohit 56
1 Virat 97
2 Shikhar 35
3 Shreyas 77
4 Manish 89

Example 3: Removing the column names and headers from Excel

With the help of read_excel(), we can make slight changes to the structure of the excel file. The headers can be removed by specifying None to the header parameter. In the output, we can see how the column headers are considered as rows entries.

In [30]:
df_test = pd.read_excel('test.xlsx', index_col=None, header=None)  
In [31]:
df_test
Out[31]:
0 1 2
0 Student Name Subjects Liked Number of Hours studied
1 Rohit Mathematics 12
2 Virat English 15
3 Shikhar Hindi 3
4 Shreyas Physics 9
5 Manish Chemistry 7

Along with the reading of data, now it’s time to learn about writing the data to excel file.

Writing to Excel file in Pandas: to_excel()

For writing the object to excel file, to_excel() function of pandas is used. Let’s look at the syntax of this function.

Syntax

dataframe.to_excel(excel_writer,sheet_name=’Sheet1′)

excel_writer : str or ExcelWriter object – This is the path specified for storing the file or for excel writer

sheet_name : str, default ‘Sheet1’ – Name of sheet which will store the content of dataframe. If not specified then, sheet1 is used.

Example 1: Writing into excel with to_excel() in Pandas

Pandas to_excel function is helpful in creating excel files using our dataframe content.

In [32]:
df = pd.DataFrame([['56', '79'], ['48', '63']],

                  index=['Anil', 'Boman'],

                  columns=['Marks in English', 'Marks in Mathematics'])
In [33]:
df.to_excel("output.xlsx")  

Example 2: Storing data to a specific excel sheet in Pandas

Similar to the read_excel function, we have the liberty to store the data to the desired excel sheet.

In [34]:
df.to_excel("output.xlsx",

             sheet_name='Sheet2')  

Example 3: Storing data to multiple excel sheets in Pandas

For storing dataframe contents to multiple sheets, we need an excel writer. With this excel writer we can store the information using to_excel function. First we have created another dataframe using copy function of pandas.

In [35]:
df1 = pd.DataFrame([['75', '97'], ['25', '99']],

                  index=['Rakesh', 'Suresh'],

                  columns=['Marks in English', 'Marks in Mathematics'])
In [36]:
df1
Out[36]:
Marks in English Marks in Mathematics
Rakesh 75 97
Suresh 25 99

Now with the help of these dataframes and excel writer, we are able to store the contents in multiple excel sheets.

In [37]:
with pd.ExcelWriter('output_mul.xlsx') as writer:  

    df.to_excel(writer, sheet_name='Sheet_name_1')

    df1.to_excel(writer, sheet_name='Sheet_name_2')

Conclusion

Concluding this article, we looked at ways of handling different kinds of files using pandas. The focus of this tutorial was to learn how to read and write csv files, json files and excel files using pandas library in-built functions. Furthermore, some examples were covered to deeply understand pandas read and write functions and how they can be used.

Reference – https://pandas.pydata.org/docs/

Follow Us

Leave a Reply

Your email address will not be published. Required fields are marked *