Table of Contents

Introduction

In this article, we will learn how to scrape data from Spotify which is a popular music streaming and podcast platform. This scraping will be done by using a Web API of Spotify, known as Spotipy. Our aim through this hands-on experience of web scraping is to fetch the information of all the tracks in Spotify playlists. We can obtain the information of tracks of any playlist, we only need to have URI (Uniform Resource Identifier) of a playlist.

For using Spotipy API, the user requires two credential keys, which are credential_id and credential_secret. These two keys are unique for each user and help Spotify in identifying the users of their Web API. In the following part of this article, I will walk through the steps on how to create these two unique keys through Spotify developers account with the help of screenshots.

Generating Authorizing Keys for Spotipy

Step 1: Creating Spotify Developers Account

For creating an account on Spotify developers website, you can either use your existing Spotify account which you use for listening to music, otherwise, you can sign up using your Facebook account or through your mail-id.

Step 2: Creating a New App

After creating the account and logging in, you will find a screen as shown below.

The next thing which we have to do is to create a new app by either clicking on create client id button or by clicking on create new app button.

Creating New App in Spotify Developer Dashboard

After this, Spotify asks some basic questions regarding our new app. After this, you have to tell Spotify whether you will use the app for any monetary advantages i.e. whether the app is commercial or not. It is advised to choose the non-commercial option. Lastly, we need to agree to some permissions and agreements

Step 3: Obtaining Client Id and Client Secret Keys

Once the app is created, we will be able to see a dashboard along with the name of our app and description, below the description we will find 32 characters long alphanumeric Client Id and below this will get the 32 characters long alphanumeric Client Secret. So this was the aim of the walkthrough, we have obtained the two keys required for authorizing the usage of Spotify Web API.

NOTE – Remember not to use the keys visible in the above image, as mentioned earlier these are unique keys and thus you’ll be required to create them for your app, otherwise you may encounter an error.

Now it’s time to start our hands-on practical example where we will fetch the playlists data and track information using spotipy.

Importing Spotipy library and authorization credentials

Initially, we need to load the necessary libraries and credential files.

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import json

For better access to credential keys, We will load these authorization keys in a .json file, simplifying the process of fetching.

The below code shows how you can add the two keys in the authorization.json file.

Remember the below keys are for representing purpose, you are required to create your own client id and client secret on the Spotify developers account.

{"client_id": "5e9a80618b284145b54bb1f7df94bb6c",
"client_secret": "0cdef7160e4143118e48abdd939668e8"}

In another playlists_like_dislike.json file, We have mentioned the URI (Uniform Resource Identifier) of the playlists. This file will help in managing the URI of multiple playlists. As informed earlier, URI helps in identifying different playlists on Spotify.

One of the features of playlists_like_dislike.json file is the like attribute which takes a boolean value. Using this boolean value, we can tell which playlists songs a user likes or dislikes. If true, the user likes all the songs of the playlist and if false, the user dislikes all the songs of the playlist.

We can see below how the playlists URI should be mentioned along with the like parameter.

NOTE – You can fetch data of up to 99 songs in a single connection session. Now, these 99 songs can be divided into different playlists or can be present in a single playlist. If you will try to fetch information of more than 99 songs in a single connection session, then you will encounter an error and you will not be able to fetch the data, as per Spotify’s policy.

[
{“uri”: “spotify:user:Test_1:playlist:27rVIOLKlIKAg63whXrAzz”,
“like”: true},
{“uri”: “spotify:user:Test_2:playlist:6WjtPvXBC2iSO24VsfBpnc”,
“like”: false}
]

How to get the URI from Spotify Playlist

Fetch Playlist URI from Spotify App — Fetching Playlist URI from Spotify App

In the above image, you can see the location from where you can obtain the URI of a particular playlist, in the case of the Spotify desktop app. You have to click at these three dots and then you can copy the URI of the desired playlist. Here playlist link and Spotify URI contain the same link. So you can copy any of these. The URI is 20 characters long alphanumeric code which is present at the end of the link.

The below image is an example of the web app of Spotify, so if you are using the web app, then you will see something as shown below.

Fetching Playlist URI from Spotify Web App

This URI will help while communicating with Spotify API and also in fetching the correct information of the songs present in the playlists.

Along with this, there can be multiple playlists in a single .json file, so to access each playlist, indexing is used and here, 0th index playlist is accessed using below code.

credentials = json.load(open('authorization.json'))
client_id = credentials['client_id']
client_secret = credentials['client_secret']

playlist_index = 0

playlists = json.load(open('playlists_like_dislike.json'))
playlist_uri = playlists[playlist_index]['uri']
like = playlists[playlist_index]['like']

client_credentials_manager = SpotifyClientCredentials(client_id=client_id,client_secret=client_secret)

sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

Getting Information of tracks in Spotify playlist

For obtaining the username and playlist_id, we are using the : as an identifier. These unique id’s help in getting the tracks of each playlist.

uri = playlist_uri    # the URI is split by ':' to get the username and playlist ID
username = uri.split(':')[2]
playlist_id = uri.split(':')[4]

results = sp.user_playlist(username, playlist_id, 'tracks')

Fetching details of the track like ID’s, Titles and Artists

Here data for each playlist is collected in the form of a dictionary. The keys of this dictionary contain track information. We will use list datatype for adding data like IDs, Titles and Artists.

Artists mentioned are the primary artists in each track.

In the below mentioned for loop, we are going over each track of a playlist and adding the track Id, name and artist information to the dataframe of tracks.

playlist_tracks_data = results['tracks']
playlist_tracks_id = []
playlist_tracks_titles = []
playlist_tracks_artists = []
playlist_tracks_first_artists = []

for track in playlist_tracks_data['items']:
    playlist_tracks_id.append(track['track']['id'])
    playlist_tracks_titles.append(track['track']['name'])
    # adds a list of all artists involved in the song to the list of artists for the playlist
    artist_list = []
    for artist in track['track']['artists']:
        artist_list.append(artist['name'])
    playlist_tracks_artists.append(artist_list)
    playlist_tracks_first_artists.append(artist_list[0])

Extracting Audio Features of each track

Spotify has a unique attribute of providing information about the features of song available on their platform. There are various features like danceability, energy, tempo, and many more. You can have a look at them here. After fetching these audio features, they will be stored in a dataframe.

features = sp.audio_features(playlist_tracks_id)

import numpy as np
import pandas as pd

features_df = pd.DataFrame(data=features, columns=features[0].keys())

Now the dataframe of audio features must be merged with title and artist information. We will also reorder the dataframe for better accessibility.

features_df['title'] = playlist_tracks_titles
features_df['first_artist'] = playlist_tracks_first_artists
features_df['all_artists'] = playlist_tracks_artists
#features_df = features_df.set_index('id')
features_df = features_df[['id', 'title', 'first_artist', 'all_artists',
                           'danceability', 'energy', 'key', 'loudness',
                           'mode', 'acousticness', 'instrumentalness',
                           'liveness', 'valence', 'tempo',
                           'duration_ms', 'time_signature']]
features_df.tail()

Data Exploration

To understand the scraped data in a better way, we will perform data exploration with the help of visualization.

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

The following count plot will tell which artists have most number of songs in the playlist.

plt.figure(figsize=(20,30))
sns.countplot(features_df['first_artist'])
plt.xticks(rotation=90)

(array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
        34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
        51, 52, 53, 54, 55, 56, 57, 58]),
 <a list of 59 Text xticklabel objects>)

#features_df = features_df.drop(['first_artist', 'all_artists'], axis=1)

Spotify Audio Analysis

Spotify provides the feature where a song can be analyzed on the basis of bars, beats, sections, tatum and segments in the song. You can learn more about these attributes from here.Since beats and tatums represent the same information in scaled form depicted by bars, we will only consider bars, sections and segments.

NOTE – These attributes of each audio track provide in-depth technical information and thus it takes time in processing and fetching. So some time will be spent here in fetching all the information. The output retrying … is the default message displayed by Spotipy for informing the user.

num_bars = []
num_sections = []
num_segments = []

for i in range(0,len(features_df['id'])):
    analysis = sp.audio_analysis(features_df.iloc[i]['id'])
    num_bars.append(len(analysis['bars'])) # beats/time_signature
    num_sections.append(len(analysis['sections']))
    num_segments.append(len(analysis['segments']))

retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...3secs
retrying ...4secs
retrying ...5secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...3secs
retrying ...4secs
retrying ...5secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...2secs

Visualizing the audio options to learn more about them.

plt.figure(figsize=(16,4))
plt.subplot(1,3,1)
plt.hist(num_bars, bins=20)
plt.xlabel('num_bars')
plt.subplot(1,3,2)
plt.hist(num_sections, bins=20)
plt.xlabel('num_sections')
plt.subplot(1,3,3)
plt.hist(num_segments, bins=20)
plt.xlabel('num_segments')

Text(0.5, 0, 'num_segments')

Now we will add these audio features to our existing dataframe which will enhance our dataset.

features_df['num_bars'] = num_bars
features_df['num_sections'] = num_sections
features_df['num_segments'] = num_segments
features_df.head()

The following code helps in generating .csv file for a playlist. This .csv file which contains the above dataframe information will be stored in the same folder where this jupyter notebook is stored. Remember to add the .csv extension to each file.

Creating Large Dataset

The below csv file will consist of the information of only a single playlist whose index was provided initially, to execute it for multiple playlists, you can use a for loop and can run over each playlist and thus creating a large dataset.

So for all the URI’s provided in the playlists.like_dislike.json file, you will be having a new .csv file with the information of the particular playlist in it. In this way you can build a large dataset of .csv files.

features_df.to_csv("playlist_" + str(playlist_index) + ".csv", encoding='utf-8',index="false")

So we have reached the endpoint of this interactive article on Web Scraping. In this article, we learned how to scrape playlist information of different users with the help of Spotify Web API, known as Spotipy. We fetched each and every audio feature available for the tracks. This article also covered how we can create a dataset of playlists and its tracks information. For exploring more and understanding the deeper details of Spotipy, you can refer to the below-mentioned links.

Web API Spotipy

Spotipy Official Documentation

Also Read – 6 NLP Datasets Beginners should use for their NLP Projects
Also Read – 3 Time Series Data Set with Project Ideas for Machine Learning Beginners

	id	title	first_artist	all_artists	danceability	energy	key	loudness	mode	acousticness	instrumentalness	liveness	valence	tempo	duration_ms	time_signature
94	5IF6IBPqbclVR7SQKmCIyA	Just Like You	Louis Tomlinson	[Louis Tomlinson]	0.703	0.628	0	-5.914	1	0.36400	0.000000	0.384	0.471	138.032	205217	4
95	2UCI5rt3PkM9pXtARihaaQ	Bedroom Floor	Liam Payne	[Liam Payne]	0.625	0.684	1	-7.147	1	0.34200	0.000064	0.107	0.217	119.932	188234	4
96	2kqAtjOtQPAR0OiYUJR43k	Can We Dance	The Vamps	[The Vamps]	0.640	0.820	1	-4.729	0	0.00312	0.000000	0.189	0.583	130.108	192711	4
97	3wGCtxNjZe3GrZfkojZ1FB	I’m a Mess	Jasmine	[Jasmine]	0.636	0.487	11	-7.123	0	0.09460	0.000000	0.111	0.647	97.022	190500	4
98	3jnQF0OxLiAEFFcMdBgJ9s	Oh Cecilia (Breaking My Heart)	The Vamps	[The Vamps]	0.746	0.844	11	-5.506	1	0.03150	0.000000	0.318	0.662	100.027	196684	4

	id	title	first_artist	all_artists	danceability	energy	key	loudness	mode	acousticness	instrumentalness	liveness	valence	tempo	duration_ms	time_signature	num_bars	num_sections	num_segments
0	3h4T9Bg8OVSUYa6danHeH5	Animals	Maroon 5	[Maroon 5]	0.279	0.742	4	-6.460	0	0.000185	0.000000	0.5930	0.328	189.868	231013	4	162	11	921
1	4pbJqGIASGPr0ZpGpnWkDn	We Will Rock You – Remastered	Queen	[Queen]	0.692	0.497	2	-7.316	1	0.676000	0.000000	0.2590	0.475	81.308	122067	4	42	6	353
2	6b3b7lILUJqXcp6w9wNQSm	Cheap Thrills	Sia	[Sia, Sean Paul]	0.592	0.800	6	-4.931	0	0.056100	0.000002	0.0775	0.728	89.972	224813	4	83	8	914
3	2tpWsVSb9UEmDRxAl1zhX1	Counting Stars	OneRepublic	[OneRepublic]	0.664	0.705	1	-4.972	0	0.065400	0.000000	0.1180	0.477	122.016	257267	4	129	8	1001
4	1zB4vmk8tFRmM9UULNzbLB	Thunder	Imagine Dragons	[Imagine Dragons]	0.605	0.822	0	-4.833	1	0.006710	0.134000	0.1470	0.288	167.997	187147	4	128	10	614

Tutorial – How to use Spotipy API to scrape Spotify Data

Introduction

Generating Authorizing Keys for Spotipy

Step 1: Creating Spotify Developers Account

Step 2: Creating a New App

Step 3: Obtaining Client Id and Client Secret Keys

Importing Spotipy library and authorization credentials

Contents of Authorization.json

Contents of Playlists_like_dislike.json

How to get the URI from Spotify Playlist

Getting Information of tracks in Spotify playlist

Fetching details of the track like ID’s, Titles and Artists

Extracting Audio Features of each track

Merging Dataframes for getting audio features and track information

Data Exploration

Spotify Audio Analysis

Creating Large Dataset

Conclusion

Leave a Reply Cancel reply

Related Posts

3 Examples of Monte Carlo Simulation in Python

Decision Tree Regression in Python Sklearn with Example

Random Forest Regression in Python Sklearn with Example

Agglomerative Hierarchical Clustering in Python Sklearn & Scipy

Tutorial for K Means Clustering in Python Sklearn

Sklearn Feature Scaling with StandardScaler, MinMaxScaler, RobustScaler and MaxAbsScaler

Follow US