[Mini ML Project] Predicting Song Likeness from Spotify Playlist

Predicting Likeness for Spotify Songs

Introduction

In this article, we will build a small machine learning project that predicts whether a user will like a song, based on the songs in their Spotify playlists. The dataset can be created by following the steps in our previous article on scraping Spotify data. It is stored in CSV format and contains the audio features and technical analysis information for the tracks of a single playlist.

Also Read – Tutorial – How to use Spotipy API to scrape Spotify Data

Let’s get started with the tutorial.

Importing Initial Libraries

We start by loading the libraries needed for the operations that follow.

In [1]:
import json
import numpy as np
import pandas as pd

The code below loads a JSON file that contains the URIs (Uniform Resource Identifiers) of the Spotify playlists. Information about each playlist is fetched using its URI. The like attribute denotes whether the user likes the playlist or not. These details were covered in the previous article, where we created the dataset.

In [2]:
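# playlist_index selects which entry of the JSON list to read (0 is assumed here)
playlist_index = 0
with open('playlists_like_dislike.json') as f:
    playlists = json.load(f)
playlist_uri = playlists[playlist_index]['uri']
like = playlists[playlist_index]['like']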
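
The cell above implies that the JSON file holds a list of entries, each with a uri and a like field. The sketch below shows one layout that would fit that access pattern. The file contents and the placeholder URIs are assumptions made for illustration, not the author's actual file. It also loops over every entry instead of reading a single index.

```python
import json

# Hypothetical file contents: a list of playlists, each with a URI and a like flag.
# The URIs below are placeholders, not real Spotify playlists.
sample_json = """
[
  {"uri": "spotify:playlist:PLACEHOLDER_LIKED", "like": true},
  {"uri": "spotify:playlist:PLACEHOLDER_DISLIKED", "like": false}
]
"""

playlists = json.loads(sample_json)

# Walk over every playlist instead of picking one index by hand
labels = {}
for playlist_index, playlist in enumerate(playlists):
    labels[playlist["uri"]] = int(playlist["like"])  # True -> 1, False -> 0

print(labels)
```

Converting the boolean flag to 1 or 0 at this point matches the labels used for the songs later in the article.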

Loading dataset

As already mentioned, you can build as large a dataset as you want. In this article, I will use two CSV files: one contains the tracks that I like and the other contains the songs that I dislike. We now load these CSV files into our Jupyter notebook for further processing.

In [3]:
like_songs = pd.read_csv("playlist_0.csv",index_col=[0])

like_songs.head()
Out[3]:
id title first_artist all_artists danceability energy key loudness mode acousticness instrumentalness liveness valence tempo duration_ms time_signature num_bars num_sections num_segments
0 3h4T9Bg8OVSUYa6danHeH5 Animals Maroon 5 [‘Maroon 5’] 0.279 0.742 4 -6.460 0 0.000185 0.000000 0.5930 0.328 189.868 231013 4 162 11 921
1 4pbJqGIASGPr0ZpGpnWkDn We Will Rock You – Remastered Queen [‘Queen’] 0.692 0.497 2 -7.316 1 0.676000 0.000000 0.2590 0.475 81.308 122067 4 42 6 353
2 6b3b7lILUJqXcp6w9wNQSm Cheap Thrills Sia [‘Sia’, ‘Sean Paul’] 0.592 0.800 6 -4.931 0 0.056100 0.000002 0.0775 0.728 89.972 224813 4 83 8 914
3 2tpWsVSb9UEmDRxAl1zhX1 Counting Stars OneRepublic [‘OneRepublic’] 0.664 0.705 1 -4.972 0 0.065400 0.000000 0.1180 0.477 122.016 257267 4 129 8 1001
4 1zB4vmk8tFRmM9UULNzbLB Thunder Imagine Dragons [‘Imagine Dragons’] 0.605 0.822 0 -4.833 1 0.006710 0.134000 0.1470 0.288 167.997 187147 4 128 10 614
In [4]:
dislike_songs = pd.read_csv("playlist_1.csv",index_col=[0])

dislike_songs.head()
Out[4]:
id title first_artist all_artists danceability energy key loudness mode acousticness instrumentalness liveness valence tempo duration_ms time_signature num_bars num_sections num_segments
0 7oRA6vzbUl5brLK7GDcKOJ Fikar Not (From “Chhichhore”) Nakash Aziz [‘Nakash Aziz’, ‘Dev Negi’, ‘Amit Mishra’, ‘Am… 0.608 0.848 5 -6.826 0 0.3750 0.001800 0.0580 0.869 185.884 189073 3 191 10 903
1 5cgKosPPj5Cs9a2JQufUc1 Ilahi Arijit Singh [‘Arijit Singh’] 0.594 0.967 9 -5.767 1 0.1660 0.000025 0.1050 0.452 132.009 228982 4 124 11 899
2 5fXslGZPI5Cco6PKHzlSL3 Illegal Weapon 2.0 (From “Street Dancer 3D”) Jasmine Sandlas [‘Jasmine Sandlas’, ‘Garry Sandhu’, ‘Tanishk B… 0.805 0.919 1 -1.294 1 0.1010 0.003430 0.0598 0.494 94.993 188606 4 72 9 895
3 06wTXKpDMrSp5OfB7MErpx Befikra Meet Bros. [‘Meet Bros.’, ‘Aditi Singh Sharma’] 0.600 0.979 10 -3.513 1 0.1380 0.000000 0.1000 0.453 137.064 351579 4 200 20 1443
4 6gbZvxPMHrpIA8RAscDO9D Jigra Shashwat Sachdev [‘Shashwat Sachdev’, ‘Siddharth Basrur’] 0.712 0.655 8 -7.813 1 0.0694 0.000000 0.1030 0.145 99.978 240000 4 98 10 1017

Classification of tracks for identifying songs likeness

Since we want to classify songs according to the user’s liking, each song needs its own label for training.

If a playlist has its like attribute set to true in the playlists_like_dislike.json file, all of its songs are labelled as liked; if the attribute is false, all of its songs are labelled as disliked.

In the code below, a song the user likes is denoted by ‘1’ and a song the user dislikes by ‘0’. This song_like attribute is added as a column to each dataframe.

The values of the song_like column are the targets used to train the machine learning model, which will subsequently predict whether a user will like the songs of a playlist.

In [5]:
like_songs['song_like'] = np.ones((len(like_songs), 1), dtype=int)

like_songs.head()
Out[5]:
id title first_artist all_artists danceability energy key loudness mode acousticness instrumentalness liveness valence tempo duration_ms time_signature num_bars num_sections num_segments song_like
0 3h4T9Bg8OVSUYa6danHeH5 Animals Maroon 5 [‘Maroon 5’] 0.279 0.742 4 -6.460 0 0.000185 0.000000 0.5930 0.328 189.868 231013 4 162 11 921 1
1 4pbJqGIASGPr0ZpGpnWkDn We Will Rock You – Remastered Queen [‘Queen’] 0.692 0.497 2 -7.316 1 0.676000 0.000000 0.2590 0.475 81.308 122067 4 42 6 353 1
2 6b3b7lILUJqXcp6w9wNQSm Cheap Thrills Sia [‘Sia’, ‘Sean Paul’] 0.592 0.800 6 -4.931 0 0.056100 0.000002 0.0775 0.728 89.972 224813 4 83 8 914 1
3 2tpWsVSb9UEmDRxAl1zhX1 Counting Stars OneRepublic [‘OneRepublic’] 0.664 0.705 1 -4.972 0 0.065400 0.000000 0.1180 0.477 122.016 257267 4 129 8 1001 1
4 1zB4vmk8tFRmM9UULNzbLB Thunder Imagine Dragons [‘Imagine Dragons’] 0.605 0.822 0 -4.833 1 0.006710 0.134000 0.1470 0.288 167.997 187147 4 128 10 614 1
In [6]:
dislike_songs['song_like'] = np.zeros((len(dislike_songs), 1), dtype=int)

dislike_songs.head()
Out[6]:
id title first_artist all_artists danceability energy key loudness mode acousticness instrumentalness liveness valence tempo duration_ms time_signature num_bars num_sections num_segments song_like
0 7oRA6vzbUl5brLK7GDcKOJ Fikar Not (From “Chhichhore”) Nakash Aziz [‘Nakash Aziz’, ‘Dev Negi’, ‘Amit Mishra’, ‘Am… 0.608 0.848 5 -6.826 0 0.3750 0.001800 0.0580 0.869 185.884 189073 3 191 10 903 0
1 5cgKosPPj5Cs9a2JQufUc1 Ilahi Arijit Singh [‘Arijit Singh’] 0.594 0.967 9 -5.767 1 0.1660 0.000025 0.1050 0.452 132.009 228982 4 124 11 899 0
2 5fXslGZPI5Cco6PKHzlSL3 Illegal Weapon 2.0 (From “Street Dancer 3D”) Jasmine Sandlas [‘Jasmine Sandlas’, ‘Garry Sandhu’, ‘Tanishk B… 0.805 0.919 1 -1.294 1 0.1010 0.003430 0.0598 0.494 94.993 188606 4 72 9 895 0
3 06wTXKpDMrSp5OfB7MErpx Befikra Meet Bros. [‘Meet Bros.’, ‘Aditi Singh Sharma’] 0.600 0.979 10 -3.513 1 0.1380 0.000000 0.1000 0.453 137.064 351579 4 200 20 1443 0
4 6gbZvxPMHrpIA8RAscDO9D Jigra Shashwat Sachdev [‘Shashwat Sachdev’, ‘Siddharth Basrur’] 0.712 0.655 8 -7.813 1 0.0694 0.000000 0.1030 0.145 99.978 240000 4 98 10 1017 0

If you scroll sideways in both dataframes shown above, you will find the song_like column.

We can also check the number of rows and columns present in each dataframe. Since we have to operate on both dataframes, it is easier to concatenate them into a single dataframe. I have used the pandas concat function for this, because DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0.

In [7]:
like_songs.shape
Out[7]:
(99, 20)
In [8]:
dislike_songs.shape
Out[8]:
(75, 20)
In [9]:
# pd.concat stacks the rows; ignore_index=False keeps each file's original row labels
songs = pd.concat([like_songs, dislike_songs], ignore_index=False)

songs.tail()
Out[9]:
id title first_artist all_artists danceability energy key loudness mode acousticness instrumentalness liveness valence tempo duration_ms time_signature num_bars num_sections num_segments song_like
70 4eu27jAU2bbnyHUC3G75U8 Badtameez Dil Benny Dayal [‘Benny Dayal’, ‘Shefali Alvares’] 0.805 0.932 2 -2.228 0 0.24000 0.00000 0.2160 0.792 106.019 252760 4 109 10 1133 0
71 7oDykOViQGiV9M3enF7u4Y La La La Neha Kakkar [‘Neha Kakkar’, ‘Arjun Kanungo’] 0.774 0.781 10 -4.426 1 0.34100 0.00000 0.1110 0.469 100.017 192000 4 76 7 745 0
72 79JMjG6tj2zvCDEukCSgcx Laung Gawacha Nucleya [‘Nucleya’, ‘Avneet Khurmi’] 0.571 0.900 0 -4.855 0 0.00502 0.00916 0.0952 0.414 91.985 213913 4 81 8 1039 0
73 5T3rp70MEW4XnWv82BDVey Nikle Currant Jassie Gill [‘Jassie Gill’, ‘Neha Kakkar’, ‘Sukh-E Muzical… 0.811 0.921 5 -3.152 0 0.18200 0.00000 0.8230 0.756 94.980 212925 4 84 13 980 0
74 01X09TTUbyJWQPlv28gUss Prada Jass Manak [‘Jass Manak’] 0.721 0.650 9 -5.426 0 0.03610 0.00000 0.3580 0.502 156.008 182115 4 114 8 715 0

As shown below, the songs dataframe has 174 rows, i.e. the rows of the first and second CSV files (99 + 75) have been combined successfully.

In [10]:
songs.shape
Out[10]:
(174, 20)
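
Because the original row labels are kept when the two dataframes are combined, the labels 0 to 74 occur twice in songs, which is why the tail shown earlier ends at 74 rather than 173. The sketch below uses two tiny made-up dataframes (not the article's data) to show labelling with a scalar and the effect of the ignore_index parameter of pd.concat.

```python
import pandas as pd

# Tiny stand-ins for the two playlists (values are made up for illustration)
like_songs = pd.DataFrame({"title": ["A", "B", "C"], "energy": [0.7, 0.5, 0.8]})
dislike_songs = pd.DataFrame({"title": ["X", "Y"], "energy": [0.9, 0.6]})

# A scalar is broadcast to every row, so no NumPy array is needed for the label
like_songs["song_like"] = 1
dislike_songs["song_like"] = 0

# ignore_index=False keeps the original labels, so 0 and 1 appear twice
kept = pd.concat([like_songs, dislike_songs], ignore_index=False)
print(kept.index.tolist())        # [0, 1, 2, 0, 1]

# ignore_index=True renumbers the rows 0..n-1
renumbered = pd.concat([like_songs, dislike_songs], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]
```

Unique row labels are not required for the steps in this article, but they make label-based lookups such as songs.loc[0] return a single row.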

A dataframe can sometimes contain duplicate rows. To remove them, we use the drop_duplicates function.

As the output below shows, the row count is unchanged, so there are no duplicate rows.

In [11]:
songs = songs.drop_duplicates()

songs.shape
Out[11]:
(174, 20)
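
By default, drop_duplicates compares all columns, so two rows count as duplicates only if every value matches. If the same track can appear with small differences between rows, the subset parameter restricts the comparison to chosen columns. The sketch below uses made-up rows; deduplicating on the id column is an assumption about what would be useful, not a step from the original notebook.

```python
import pandas as pd

# Made-up rows: the same track id saved twice with a slightly different title
songs_demo = pd.DataFrame({
    "id": ["t1", "t2", "t1"],
    "title": ["Song One", "Song Two", "Song One - Remastered"],
    "tempo": [120.0, 95.0, 120.0],
})

# Whole-row comparison: nothing is dropped because the titles differ
print(songs_demo.drop_duplicates().shape)               # (3, 3)

# Comparing on the id column only keeps the first row per track
print(songs_demo.drop_duplicates(subset=["id"]).shape)  # (2, 3)
```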

Since we are predicting whether the user will like a song, we do not need the information in the id, first_artist and all_artists columns, so we drop them from the dataframe.

In [12]:
songs = songs.drop(['id','first_artist','all_artists'], axis = 1)
In [13]:
songs.head()
Out[13]:
title danceability energy key loudness mode acousticness instrumentalness liveness valence tempo duration_ms time_signature num_bars num_sections num_segments song_like
0 Animals 0.279 0.742 4 -6.460 0 0.000185 0.000000 0.5930 0.328 189.868 231013 4 162 11 921 1
1 We Will Rock You – Remastered 0.692 0.497 2 -7.316 1 0.676000 0.000000 0.2590 0.475 81.308 122067 4 42 6 353 1
2 Cheap Thrills 0.592 0.800 6 -4.931 0 0.056100 0.000002 0.0775 0.728 89.972 224813 4 83 8 914 1
3 Counting Stars 0.664 0.705 1 -4.972 0 0.065400 0.000000 0.1180 0.477 122.016 257267 4 129 8 1001 1
4 Thunder 0.605 0.822 0 -4.833 1 0.006710 0.134000 0.1470 0.288 167.997 187147 4 128 10 614 1

Next we create another dataframe in which the title column is dropped as well. The classifiers used below expect numeric input, so the string-valued title column cannot be passed to them.

We did not drop the title column from the songs dataframe because we need it later to present the results.

In [14]:
prediction = songs.drop(['title'], axis = 1)
In [15]:
prediction.head()
Out[15]:
danceability energy key loudness mode acousticness instrumentalness liveness valence tempo duration_ms time_signature num_bars num_sections num_segments song_like pred_class
0 0.279 0.742 4 -6.460 0 0.000185 0.000000 0.5930 0.328 189.868 231013 4 162 11 921 1 1
1 0.692 0.497 2 -7.316 1 0.676000 0.000000 0.2590 0.475 81.308 122067 4 42 6 353 1 1
2 0.592 0.800 6 -4.931 0 0.056100 0.000002 0.0775 0.728 89.972 224813 4 83 8 914 1 0
3 0.664 0.705 1 -4.972 0 0.065400 0.000000 0.1180 0.477 122.016 257267 4 129 8 1001 1 0
4 0.605 0.822 0 -4.833 1 0.006710 0.134000 0.1470 0.288 167.997 187147 4 128 10 614 1 1
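
As an alternative to naming each text column by hand, pandas can select columns by data type. The following is a minimal sketch on a made-up dataframe; it is not part of the original notebook.

```python
import pandas as pd

songs_demo = pd.DataFrame({
    "title": ["Song One", "Song Two"],
    "danceability": [0.61, 0.72],
    "key": [4, 9],
    "song_like": [1, 0],
})

# Keep only numeric columns; the string-valued title column is left out
numeric_only = songs_demo.select_dtypes(include="number")
print(numeric_only.columns.tolist())  # ['danceability', 'key', 'song_like']
```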

Exploring data through visualization

As part of the model-building process, we must get familiar with our data. To do so, we explore the data by visualizing the attributes present in the dataset.

In [16]:
import matplotlib.pyplot as plt
import seaborn as sns

Distribution Plots

With the help of distribution plots, we compare the songs liked and disliked by the user. Four attributes are shown here: danceability, energy, key and loudness. The plots show that danceability and energy are distributed similarly for both sets of songs. The key attribute is spread over many values, and liked songs tend to be louder than disliked songs.

In [17]:
plt.figure(figsize=(16,16))
plt.subplot(4,4,1)
sns.distplot(songs[songs['song_like']==1]['danceability'], color='red', bins=40)
sns.distplot(songs[songs['song_like']==0]['danceability'], color='blue', bins=40)
plt.subplot(4,4,2)
sns.distplot(songs[songs['song_like']==1]['energy'], color='red', bins=40)
sns.distplot(songs[songs['song_like']==0]['energy'], color='blue', bins=40)
plt.subplot(4,4,3)
sns.distplot(songs[songs['song_like']==1]['key'], color='red', bins=40)
sns.distplot(songs[songs['song_like']==0]['key'], color='blue', bins=40)
plt.subplot(4,4,4)
sns.distplot(songs[songs['song_like']==1]['loudness'], color='red', bins=40)
sns.distplot(songs[songs['song_like']==0]['loudness'], color='blue', bins=40)
plt.legend((1,0))
Out[17]:
<matplotlib.legend.Legend at 0x1c407e71fd0>
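
Visual impressions such as "liked songs are louder" can be cross-checked with per-class summary statistics. The sketch below groups a small made-up dataframe by song_like; the numbers are illustrative only and are not taken from the playlists.

```python
import pandas as pd

# Made-up values, two liked and two disliked songs
songs_demo = pd.DataFrame({
    "loudness": [-5.0, -4.0, -7.0, -8.0],
    "energy": [0.8, 0.7, 0.9, 0.6],
    "song_like": [1, 1, 0, 0],
})

# Mean and median of each feature, computed separately for each class
summary = songs_demo.groupby("song_like")[["loudness", "energy"]].agg(["mean", "median"])
print(summary)
```

Applied to the real songs dataframe, the same call gives one row per class, which makes it easy to see which features differ between liked and disliked songs.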

Similarly, other attributes such as mode, acousticness, instrumentalness, liveness, valence, tempo, duration and time signature, along with technical audio analysis features such as bars, sections and segments, are visualized.

In [18]:
plt.figure(figsize=(16,16))
plt.subplot(4,4,1)
sns.distplot(songs[songs['song_like']==1]['mode'], color='red', bins=40)
sns.distplot(songs[songs['song_like']==0]['mode'], color='blue', bins=40)
plt.subplot(4,4,2)
sns.distplot(songs[songs['song_like']==1]['acousticness'], color='red', bins=40)
sns.distplot(songs[songs['song_like']==0]['acousticness'], color='blue', bins=40)
plt.subplot(4,4,3)
sns.distplot(songs[songs['song_like']==1]['instrumentalness'], color='red', bins=40)
sns.distplot(songs[songs['song_like']==0]['instrumentalness'], color='blue', bins=40)
plt.subplot(4,4,4)
sns.distplot(songs[songs['song_like']==1]['liveness'], color='red', bins=40)
sns.distplot(songs[songs['song_like']==0]['liveness'], color='blue', bins=40)
plt.legend((1,0))
Out[18]:
<matplotlib.legend.Legend at 0x1c40831a4a8>
In [19]:
plt.figure(figsize=(16,16))
plt.subplot(4,4,1)
sns.distplot(songs[songs['song_like']==1]['valence'], color='red', bins=40)
sns.distplot(songs[songs['song_like']==0]['valence'], color='blue', bins=40)
plt.subplot(4,4,2)
sns.distplot(songs[songs['song_like']==1]['tempo'], color='red', bins=40)
sns.distplot(songs[songs['song_like']==0]['tempo'], color='blue', bins=40)
plt.subplot(4,4,3)
sns.distplot(songs[songs['song_like']==1]['duration_ms'], color='red', bins=40)
sns.distplot(songs[songs['song_like']==0]['duration_ms'], color='blue', bins=40)
plt.subplot(4,4,4)
sns.distplot(songs[songs['song_like']==1]['time_signature'], color='red', bins=40)
sns.distplot(songs[songs['song_like']==0]['time_signature'], color='blue', bins=40)
plt.legend((1,0))
Out[19]:
<matplotlib.legend.Legend at 0x1c40865d438>
In [20]:
plt.figure(figsize=(16,16))
plt.subplot(4,4,1)
sns.distplot(songs[songs['song_like']==1]['num_bars'], color='red', bins=40)
sns.distplot(songs[songs['song_like']==0]['num_bars'], color='blue', bins=40)
plt.subplot(4,4,2)
sns.distplot(songs[songs['song_like']==1]['num_sections'], color='red', bins=40)
sns.distplot(songs[songs['song_like']==0]['num_sections'], color='blue', bins=40)
plt.subplot(4,4,3)
sns.distplot(songs[songs['song_like']==1]['num_segments'], color='red', bins=40)
sns.distplot(songs[songs['song_like']==0]['num_segments'], color='blue', bins=40)
plt.legend((1,0))
Out[20]:
<matplotlib.legend.Legend at 0x1c40819f240>
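
The four plotting cells above repeat the same pattern for each attribute, and plt.legend((1,0)) only labels the last subplot of each figure. In addition, sns.distplot is deprecated in recent seaborn releases. The sketch below is one possible loop-based variant that uses plain matplotlib histograms instead of seaborn. It runs on random stand-in data, so the shapes it draws say nothing about the real playlists.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the songs dataframe (random values, not real tracks)
rng = np.random.default_rng(0)
songs_demo = pd.DataFrame({
    "danceability": rng.uniform(0, 1, 60),
    "energy": rng.uniform(0, 1, 60),
    "loudness": rng.normal(-6, 2, 60),
    "song_like": rng.integers(0, 2, 60),
})

features = ["danceability", "energy", "loudness"]
fig, axes = plt.subplots(1, len(features), figsize=(15, 4))

# One subplot per feature, with liked and disliked songs overlaid
for ax, feature in zip(axes, features):
    liked = songs_demo.loc[songs_demo["song_like"] == 1, feature]
    disliked = songs_demo.loc[songs_demo["song_like"] == 0, feature]
    ax.hist(liked, bins=20, density=True, alpha=0.5, color="red", label="liked (1)")
    ax.hist(disliked, bins=20, density=True, alpha=0.5, color="blue", label="disliked (0)")
    ax.set_title(feature)
    ax.legend()  # every subplot gets its own legend

fig.tight_layout()
```

To cover all attributes of the real data, the features list would simply be extended with the remaining column names.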

Building Machine Learning Model

Here we explore two machine learning classifiers; the one that performs better will be used for prediction. The first classifier is Logistic Regression.

Logistic Regression

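Four objects are created below for building the classifier: two feature dataframes and two label series. Note that X_test and y_test contain the same rows as X_train and y_train, so the scores reported later are measured on the data the model was trained on, not on unseen songs.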
In [21]:
X_train = prediction.drop('song_like', axis=1)
X_test = songs.drop(['song_like','title'], axis=1)
y_train = prediction['song_like']
y_test = songs['song_like']

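We can inspect these four objects and the information stored in them. The pred_class column visible in the outputs is not created by any cell shown in this article; it appears to be left over from an earlier run of the notebook.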

In [21]:
X_train.head()
Out[21]:
danceability energy key loudness mode acousticness instrumentalness liveness valence tempo duration_ms time_signature num_bars num_sections num_segments pred_class
0 0.279 0.742 4 -6.460 0 0.000185 0.000000 0.5930 0.328 189.868 231013 4 162 11 921 1
1 0.692 0.497 2 -7.316 1 0.676000 0.000000 0.2590 0.475 81.308 122067 4 42 6 353 1
2 0.592 0.800 6 -4.931 0 0.056100 0.000002 0.0775 0.728 89.972 224813 4 83 8 914 0
3 0.664 0.705 1 -4.972 0 0.065400 0.000000 0.1180 0.477 122.016 257267 4 129 8 1001 0
4 0.605 0.822 0 -4.833 1 0.006710 0.134000 0.1470 0.288 167.997 187147 4 128 10 614 1
In [22]:
X_test.head()
Out[22]:
danceability energy key loudness mode acousticness instrumentalness liveness valence tempo duration_ms time_signature num_bars num_sections num_segments pred_class
0 0.279 0.742 4 -6.460 0 0.000185 0.000000 0.5930 0.328 189.868 231013 4 162 11 921 1
1 0.692 0.497 2 -7.316 1 0.676000 0.000000 0.2590 0.475 81.308 122067 4 42 6 353 1
2 0.592 0.800 6 -4.931 0 0.056100 0.000002 0.0775 0.728 89.972 224813 4 83 8 914 0
3 0.664 0.705 1 -4.972 0 0.065400 0.000000 0.1180 0.477 122.016 257267 4 129 8 1001 0
4 0.605 0.822 0 -4.833 1 0.006710 0.134000 0.1470 0.288 167.997 187147 4 128 10 614 1
In [23]:
y_train.tail()
Out[23]:
70    0
71    0
72    0
73    0
74    0
Name: song_like, dtype: int32
In [24]:
y_test.head()
Out[24]:
0    1
1    1
2    1
3    1
4    1
Name: song_like, dtype: int32
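
The cells above build the test set from the same rows as the training set. To estimate how a model behaves on songs it has not seen, a held-out split is the usual approach, and train_test_split (imported in the next cell) provides it. The sketch below uses synthetic data with 174 rows; the feature values, the 25% test size and the random_state are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 174 rows, two features, a noisy binary label
rng = np.random.default_rng(42)
n = 174
X = pd.DataFrame({
    "energy": rng.uniform(0, 1, n),
    "loudness": rng.normal(-6, 2, n),
})
y = pd.Series((X["loudness"] + rng.normal(0, 1, n) > -6).astype(int), name="song_like")

# Hold out 25% of the rows; stratify keeps the like/dislike ratio similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = LogisticRegression(solver="liblinear")
model.fit(X_train, y_train)

# Accuracy on rows the model never saw during fitting
print("held-out accuracy:", model.score(X_test, y_test))
```

Scores obtained this way are typically lower than scores measured on the training rows, but they are a more honest estimate of how the model would do on a new playlist.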

Next, the scikit-learn library is loaded and LogisticRegression is imported to build the logistic regression classifier.

In [25]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.linear_model import LogisticRegression

Using the code shown below, the model is created and fitted to the data.

In [26]:
lr_model = LogisticRegression()
lr_model.fit(X_train, y_train)
H:\Anaconda\lib\site-packages\sklearn\linear_model\logistic.py:433: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
  FutureWarning)
Out[26]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=0, warm_start=False)
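
The FutureWarning above asks for an explicit solver. The sketch below shows two ways this could be done on synthetic data: naming the liblinear solver directly, and combining the lbfgs solver with feature standardisation in a pipeline. The data and the choice of StandardScaler are assumptions for illustration; the original notebook uses the default settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 120
# Two synthetic features on very different scales, similar to valence vs. duration_ms
X = np.column_stack([rng.uniform(0, 1, n), rng.normal(220_000, 40_000, n)])
y = (X[:, 0] > 0.5).astype(int)

# Naming the solver explicitly avoids the warning about the changing default
lr_liblinear = LogisticRegression(solver="liblinear")
lr_liblinear.fit(X, y)

# With lbfgs, standardising the features first usually helps convergence
lr_scaled = make_pipeline(StandardScaler(), LogisticRegression(solver="lbfgs"))
lr_scaled.fit(X, y)

print(lr_scaled.predict(X[:5]))
```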

Finally, the model's performance is analyzed with the help of a confusion matrix and a classification report. Together they summarize how often the model's predictions match the true labels.

Here, Precision is the ratio of correctly predicted positive observations to the total predicted positive observations, and Recall is the ratio of correctly predicted positive observations to all actual positive observations. The F1 score is the harmonic mean of Precision and Recall.

In [27]:
lr_pred = lr_model.predict(X_test)
print(confusion_matrix(y_test, lr_pred))
print('\n')
print(classification_report(y_test, lr_pred))
[[46 29]
 [15 84]]


              precision    recall  f1-score   support

           0       0.75      0.61      0.68        75
           1       0.74      0.85      0.79        99

   micro avg       0.75      0.75      0.75       174
   macro avg       0.75      0.73      0.73       174
weighted avg       0.75      0.75      0.74       174

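Accuracy is the ratio of correctly predicted observations to the total observations. From the confusion matrix it is (46 + 84) / 174 ≈ 0.75, which is the value shown in the micro avg row. The weighted avg f1-score of 0.74 is a related but different summary: it averages the per-class F1 scores weighted by support. The logistic regression model therefore has an accuracy of about 75% and a weighted F1 score of 0.74.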
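
The figures in the classification report can be reproduced by hand from the confusion matrix printed above. The sketch below does this for class 1 (liked songs), using scikit-learn's convention that rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix printed above: rows are true classes, columns are predicted classes
cm = np.array([[46, 29],
               [15, 84]])

tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)                          # 84 / 113
recall = tp / (tp + fn)                             # 84 / 99
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
accuracy = (tp + tn) / cm.sum()                     # 130 / 174

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
# precision=0.74 recall=0.85 f1=0.79 accuracy=0.75
```

The first three values match the row for class 1 in the report, and the accuracy matches the micro avg row.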

K-Nearest Neighbor Machine Learning Model

K-Nearest Neighbors classifier is imported from scikit-learn. After this, the classifier is trained by fitting the model over the data.

In [28]:
from sklearn.neighbors import KNeighborsClassifier 
In [29]:
knn_model = KNeighborsClassifier() 
  
knn_model.fit(X_train, y_train) 
Out[29]:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=None, n_neighbors=5, p=2,
           weights='uniform')
In [29]:
knn_pred = knn_model.predict(X_test)
print(confusion_matrix(y_test, knn_pred))
print('\n')
print(classification_report(y_test, knn_pred))
[[41 34]
 [15 84]]


              precision    recall  f1-score   support

           0       0.73      0.55      0.63        75
           1       0.71      0.85      0.77        99

   micro avg       0.72      0.72      0.72       174
   macro avg       0.72      0.70      0.70       174
weighted avg       0.72      0.72      0.71       174

Here the accuracy is (41 + 84) / 174 ≈ 0.72 (the micro avg row), and the weighted avg f1-score is 0.71.

Both values are lower than those of the logistic regression model, so we will use the logistic regression model for prediction.

NOTE – You can try other machine learning models, both to aim for higher accuracy and for learning purposes.
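
When trying several models, cross-validation gives a fairer comparison than a single score on the training rows. The sketch below compares the two classifiers from this article with 5-fold cross-validation on synthetic data. The scaling step is an assumption added here because K-Nearest Neighbors is distance-based and large-valued columns such as duration_ms would otherwise dominate the distances.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data, not the real playlists
rng = np.random.default_rng(1)
n = 174
X = np.column_stack([
    rng.uniform(0, 1, n),            # a 0..1 feature, like energy
    rng.normal(220_000, 40_000, n),  # a large-valued feature, like duration_ms
])
y = (X[:, 0] + rng.normal(0, 0.2, n) > 0.5).astype(int)

candidates = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(solver="lbfgs")),
    "k-nearest neighbours": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Each model is fitted and scored on 5 different train/validation splits
results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = scores
    print(f"{name}: mean accuracy {scores.mean():.2f} (+/- {scores.std():.2f})")
```

Putting the scaler inside the pipeline means it is refitted on each training fold, so no information from the validation fold leaks into the preprocessing.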

In [30]:
songs['prediction'] = lr_pred

The Logistic Regression model's predictions of the user's likes and dislikes are now stored in the prediction column.

In [31]:
songs.sort_values('title').head()
Out[31]:
title danceability energy key loudness mode acousticness instrumentalness liveness valence tempo duration_ms time_signature num_bars num_sections num_segments song_like pred_class prediction
81 22 0.661 0.729 7 -6.561 1 0.002150 0.00130 0.0477 0.668 103.987 232120 4 98 9 837 1 1 1
42 All We Know 0.662 0.586 0 -8.821 1 0.097000 0.00272 0.1150 0.296 90.000 194080 4 72 6 673 1 1 1
69 All of Me 0.422 0.264 8 -7.064 1 0.922000 0.00000 0.1320 0.331 119.930 269560 4 135 13 743 1 1 1
9 Alone 0.631 0.953 2 -3.739 1 0.024100 0.01550 0.1080 0.422 141.990 273803 4 161 16 1038 1 0 0
0 Animals 0.279 0.742 4 -6.460 0 0.000185 0.00000 0.5930 0.328 189.868 231013 4 162 11 921 1 1 1
In [32]:
final_prediction = songs[['title','song_like','prediction']]

Here we can see the predictions made by the Logistic Regression model next to the actual labels. With this, we have achieved the aim of this project.

In [33]:
final_prediction
Out[33]:
title song_like prediction
0 Animals 1 1
1 We Will Rock You – Remastered 1 1
2 Cheap Thrills 1 0
3 Counting Stars 1 0
4 Thunder 1 1
5 Believer 1 1
6 Titanium 1 1
7 Shot Me Down (Bang Bang My Baby Remix) 1 0
8 Nashe Si Chadh Gayi 1 0
9 Alone 1 0
10 Señorita 1 1
11 Bilionera – Radio Edit 1 1
12 Perfect 1 1
13 Memories 1 1
14 Wolves 1 1
15 Attention 1 1
16 Havana 1 1
17 Despacito – Remix 1 0
18 Mi Gente 1 0
19 Gasolina 1 0
20 In the End – Live at Live 8, Benjamin Franklin… 1 1
21 Leave Out All The Rest 1 1
22 Cold Water (feat. Justin Bieber & MØ) 1 1
23 Let Me Love You 1 1
24 Don’t Wanna Know 1 0
25 Drag Me Down 1 1
26 Steal My Girl 1 1
27 Love Me Like You Do – From “Fifty Shades Of Grey” 1 1
28 Finesse – James Hype Remix; feat. Cardi B 1 0
29 Cheap Thrills 1 1
45 Makhna 0 1
46 Chal Ghar Chalen (From “Malang – Unleash The M… 0 0
47 The Wakhra Song 0 0
48 Don’t Be Shy Again (From “Bala”) 0 1
49 Slow Motion (From “Bharat”) 0 0
50 Duniyaa (From “Luka Chuppi”) 0 1
51 Naah Goriye (From “Bala”) 0 0
52 Kaise Hua (From “Kabir Singh”) 0 1
53 Dil Chori (From “Sonu Ke Titu Ki Sweety”) 0 0
54 Dheeme Dheeme (From “Pati Patni Aur Woh”) 0 0
55 Bekhayali (From “Kabir Singh”) 0 0
56 Kamariya 0 1
57 Prada 0 0
58 Lamberghini 0 0
59 O Saki Saki (From “Batla House”) 0 1
60 Chogada (From “Loveyatri”) 0 0
61 Teri Mitti 0 1
62 Naah 0 0
63 Filhall 0 1
64 Urvashi 0 1
65 Chak De India 0 0
66 Kaun Tujhe (From “M.S.Dhoni – The Untold Story”) 0 1
67 Bhaag Milkha Bhaag – Rock Version 0 0
68 Dheeme Dheeme 0 1
69 Cutiepie 0 0
70 Badtameez Dil 0 0
71 La La La 0 1
72 Laung Gawacha 0 0
73 Nikle Currant 0 0
74 Prada 0 1

174 rows × 3 columns
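
To see where the model went wrong, the rows whose prediction differs from song_like can be filtered out. The sketch below rebuilds five rows from the output above by hand and applies that filter; on the full final_prediction dataframe the same two expressions would work unchanged.

```python
import pandas as pd

# A few rows copied from the output above
final_prediction = pd.DataFrame({
    "title": ["Animals", "Cheap Thrills", "Counting Stars", "Makhna", "The Wakhra Song"],
    "song_like": [1, 1, 1, 0, 0],
    "prediction": [1, 0, 0, 1, 0],
})

# Rows where the predicted label differs from the actual label
misclassified = final_prediction[final_prediction["song_like"] != final_prediction["prediction"]]
print(misclassified["title"].tolist())  # ['Cheap Thrills', 'Counting Stars', 'Makhna']

# Share of correct predictions in this small sample
sample_accuracy = (final_prediction["song_like"] == final_prediction["prediction"]).mean()
print(sample_accuracy)  # 0.4
```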

Conclusion

To conclude, we learned how to predict which songs a user will like, based on their Spotify playlists, with the help of a machine learning classifier. The dataset came from the Spotify Web API, accessed through the Spotipy library as described in the previous article. We also covered data visualization. Lastly, we saw how to evaluate the performance of a machine learning classifier with the help of a confusion matrix and a classification report.

 
