Exploring Music Through Logistic Regression

Objective

The objective of this project is to expand my music selection using logistic regression. I typically explore music similar to what I already listen to, but I want to see if machine learning can build a playlist of songs outside my normal listening sphere that I still have a chance of enjoying.

Metrics for Success: Although model accuracy matters, I will judge the success of this project on two things: exposure and likability. For exposure, I ideally want a playlist made up entirely of artists who are new to me, in genres I would not normally listen to. For likability, at least 50% of the playlist should appeal to me, since my intention is to fill it with songs I may or may not like.

The difficult part will be categorizing songs as Liked, Might Like, or Don't Like: I can tell which songs I like, but which songs I dislike is less clear-cut. To define these categories, I will first need to do some data exploration.

About The Data

The data I am using consists of two separate CSV files. I created the first by calling the Spotify API to gather my top 50 tracks from the past 6 months; documentation on this API endpoint can be read here. The second comes from a Kaggle dataset a user uploaded, which contains about 233,000 tracks.
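As a rough sketch of how that first file could be gathered, the spotipy client exposes the top-tracks endpoint directly (`time_range='medium_term'` corresponds to roughly the last 6 months). The credentials setup and file name here are assumptions, not the exact script used for this project:

```python
import pandas as pd

def tracks_to_rows(top_tracks_response):
    """Flatten a /me/top/tracks response into (artist, name, id) rows."""
    return [
        {"artist": item["artists"][0]["name"],
         "name": item["name"],
         "id": item["id"]}
        for item in top_tracks_response["items"]
    ]

def fetch_top_tracks_csv(path="top_tracks.csv"):
    # Imports kept local so the helper above works without spotipy installed.
    import spotipy
    from spotipy.oauth2 import SpotifyOAuth

    # Assumes SPOTIPY_* credentials are configured in the environment.
    sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope="user-top-read"))
    top = sp.current_user_top_tracks(limit=50, time_range="medium_term")
    rows = tracks_to_rows(top)
    # One audio-features dict per track id, joined on the shared "id" column.
    feats = sp.audio_features([r["id"] for r in rows])
    pd.DataFrame(rows).merge(pd.DataFrame(feats), on="id").to_csv(path, index=False)
```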

All of the code for this project can be found at this github repository. Check out some of my other work here.

Exploratory Data Analysis

Begin by importing all necessary modules and getting a look at the data.

In [2]:
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import scipy.stats as stat
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import plot_confusion_matrix  # removed in scikit-learn 1.2; use ConfusionMatrixDisplay.from_estimator in newer versions
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score

%matplotlib inline
Out[5]:
| | artist | name | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Stylistics | You Make Me Feel Brand New | 0.451 | 0.359 | 7 | -8.567 | 1 | 0.0247 | 0.84700 | 0.000030 | 0.1370 | 0.178 | 76.015 | audio_features | 00CCi9yHJh7NjXcNT565Ja | spotify:track:00CCi9yHJh7NjXcNT565Ja | https://api.spotify.com/v1/tracks/00CCi9yHJh7N... | https://api.spotify.com/v1/audio-analysis/00CC... | 288440 | 4 |
| 1 | Al Green | Let's Stay Together | 0.756 | 0.397 | 7 | -10.701 | 0 | 0.0522 | 0.57000 | 0.011500 | 0.0542 | 0.507 | 101.985 | audio_features | 63xdwScd1Ai1GigAwQxE8y | spotify:track:63xdwScd1Ai1GigAwQxE8y | https://api.spotify.com/v1/tracks/63xdwScd1Ai1... | https://api.spotify.com/v1/audio-analysis/63xd... | 199396 | 4 |
| 2 | The Delfonics | I Gave to You | 0.517 | 0.372 | 5 | -8.936 | 1 | 0.0326 | 0.77500 | 0.000466 | 0.1520 | 0.479 | 121.097 | audio_features | 7ni5SnrJnT1QwZe1rcPqhC | spotify:track:7ni5SnrJnT1QwZe1rcPqhC | https://api.spotify.com/v1/tracks/7ni5SnrJnT1Q... | https://api.spotify.com/v1/audio-analysis/7ni5... | 354240 | 3 |
| 3 | Khruangbin | Maria También | 0.748 | 0.630 | 6 | -7.318 | 0 | 0.0460 | 0.00261 | 0.859000 | 0.5320 | 0.846 | 103.597 | audio_features | 66MpvsWO5O7gy9LjXGFw6I | spotify:track:66MpvsWO5O7gy9LjXGFw6I | https://api.spotify.com/v1/tracks/66MpvsWO5O7g... | https://api.spotify.com/v1/audio-analysis/66Mp... | 190999 | 4 |
| 4 | The Isley Brothers | Voyage to Atlantis | 0.555 | 0.369 | 11 | -11.677 | 0 | 0.0267 | 0.10100 | 0.001530 | 0.1010 | 0.472 | 74.495 | audio_features | 7r1gEyIhacXnvLriLjmp3c | spotify:track:7r1gEyIhacXnvLriLjmp3c | https://api.spotify.com/v1/tracks/7r1gEyIhacXn... | https://api.spotify.com/v1/audio-analysis/7r1g... | 272533 | 4 |
Data Types
Out[6]:
artist               object
name                 object
danceability        float64
energy              float64
key                   int64
loudness            float64
mode                  int64
speechiness         float64
acousticness        float64
instrumentalness    float64
liveness            float64
valence             float64
tempo               float64
type                 object
id                   object
uri                  object
track_href           object
analysis_url         object
duration_ms           int64
time_signature        int64
dtype: object
Null Values
Out[7]:
artist              0
name                0
danceability        0
energy              0
key                 0
loudness            0
mode                0
speechiness         0
acousticness        0
instrumentalness    0
liveness            0
valence             0
tempo               0
type                0
id                  0
uri                 0
track_href          0
analysis_url        0
duration_ms         0
time_signature      0
dtype: int64
Out[9]:
| | danceability | energy | loudness | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 50.000000 | 50.000000 | 50.00000 | 50.000000 | 50.000000 | 50.000000 | 50.000000 | 50.000000 | 50.000000 | 50.000000 | 50.000000 |
| mean | 0.608100 | 0.482280 | -10.61920 | 0.054476 | 0.349103 | 0.300275 | 0.224698 | 0.629740 | 111.928560 | 257191.020000 | 3.820000 |
| std | 0.142698 | 0.137448 | 3.55613 | 0.035916 | 0.302311 | 0.385895 | 0.186462 | 0.208503 | 26.317053 | 78188.585512 | 0.388088 |
| min | 0.309000 | 0.207000 | -21.28500 | 0.024300 | 0.000323 | 0.000000 | 0.036100 | 0.166000 | 74.495000 | 156653.000000 | 3.000000 |
| 25% | 0.492500 | 0.380750 | -12.22625 | 0.030925 | 0.052675 | 0.000090 | 0.091925 | 0.495750 | 87.533750 | 199039.000000 | 4.000000 |
| 50% | 0.616500 | 0.467500 | -10.01050 | 0.040300 | 0.323000 | 0.006070 | 0.139500 | 0.607000 | 111.332500 | 237333.500000 | 4.000000 |
| 75% | 0.705000 | 0.588500 | -7.98525 | 0.061950 | 0.585750 | 0.805000 | 0.329000 | 0.794250 | 128.745750 | 288079.500000 | 4.000000 |
| max | 0.956000 | 0.792000 | -5.41000 | 0.190000 | 0.883000 | 0.920000 | 0.777000 | 0.967000 | 191.468000 | 454253.000000 | 4.000000 |

Simply observing the descriptive statistics, the first feature that caught my eye is instrumentalness, due to the apparently large gap between the 50th and 75th percentiles.

[Figure: Distribution of Instrumentalness]

Visualizing the distribution, it appears I largely favor songs on the low end of instrumentalness. This will be helpful in defining songs that I like.

[Figure: Distribution of Speechiness]

This distribution doesn't give much insight on its own: Spotify's documentation states that "values below 0.33 most likely represent music and other non-speech-like tracks," and my entire distribution falls below that value. Still, I will use this measurement to help define songs I will like.

[Figure: Distribution of Danceability]

Spotify documentation defines danceability as a measure of how suitable a track is for dancing, with 0.0 being least danceable and 1.0 being most danceable. The measurement is based on features such as tempo, regularity, rhythm stability, and beat strength; however, it is not clear how each of these affects the final danceability score.

It appears my personal preference for danceability follows a relatively normal distribution, so I think standard deviations will be especially useful in defining songs I like, might like, and don't like.

[Figure: Distribution of Tempo]

The histogram and quantile lines show that I favor songs on the lower end of the tempo range with 75% of the songs being between 75 BPM and just above 125 BPM.

From the visualization, I think the percentile values will be good measurements for defining songs I like.

One other interesting observation: despite tempo being one of the factors in danceability, the two distributions differ. This may indicate tempo actually plays a small role in the calculation of danceability.

[Figure: Distribution of Loudness]

The distribution for loudness is negatively skewed, so I think the first and third quartile values will be most useful in accurately defining my preferences.

According to Spotify, loudness values typically range between -60 and 0 decibels, so my preference for loudness is on the higher end of that range.

[Figure: Distribution of Energy]

Spotify defines energy as a "measure of intensity and activity", with typical energetic tracks being fast, loud, and noisy.

As seen from the positively skewed distribution, I tend to prefer songs which are on the lower end of the energy spectrum with 75% of these songs being less than 0.6 on a scale from 0.0 to 1.0.

Despite my preference for relatively loud songs, it makes sense that I tend toward lower-energy tracks: energy also accounts for how fast a song is, and the tempo distribution showed I prefer slower songs.

[Figure: Distribution of Valence]

Valence is described as a measurement of how positive a song sounds ranging from 0.0 to 1.0, where values closer to 0 represent songs which convey sadness and values closer to 1 represent songs which sound more cheerful. However, Spotify documentation does not give any more details on the actual calculation of this metric.

For my personal preference, it looks like I prefer songs that convey more positivity, with 75% of the songs greater than 0.5 and a mean value a little above 0.6.

Spotify gives duration in milliseconds, but I converted it to minutes:seconds and grouped the songs for better interpretability. At first I noticed a large portion of songs in the 3- to 5-minute range, but I was skeptical about attributing duration as a determinant of my liking a song.

To test this skepticism, I observed the distribution of a much larger sample of songs; in terms of duration, my liked songs are distributed similarly to the larger sample.
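The conversion and grouping can be sketched with pandas; the bin edges below are illustrative assumptions, not necessarily the exact groups used:

```python
import pandas as pd

def ms_to_minsec(ms):
    """Format a duration in milliseconds as 'M:SS'."""
    total_sec = int(ms) // 1000
    return f"{total_sec // 60}:{total_sec % 60:02d}"

# Hypothetical durations (ms); the real column is df["duration_ms"].
durations = pd.Series([156653, 199396, 272533, 288440, 454253],
                      name="duration_ms")
labels = durations.apply(ms_to_minsec)          # e.g. "3:19"
# Group into minute-based bins for plotting side-by-side distributions.
bins = pd.cut(durations / 60000,
              bins=[0, 2, 3, 4, 5, 6, 8],
              labels=["<2", "2-3", "3-4", "4-5", "5-6", "6-8"])
```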

In addition to observing the distributions, I conducted a Chi-Squared test at the 95% confidence level with the null hypothesis:

H0 : Duration is a determinant of my liking a song; the duration distribution of my liked songs differs from that of the larger sample

and the alternative hypothesis:

H1 : Duration is not a determinant; my liked songs follow the same duration distribution as the larger sample

The Chi-Square Test Statistic is: 13.14
The p-value is: 0.0221

Since the p-value is less than .05, I reject the null hypothesis and will leave duration out as a determinant of my liking a song.
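The test itself can be sketched with scipy. The bin counts below are made-up stand-ins, not the project's real data; the real counts would come from the binned durations above:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = (my liked songs, larger sample),
# columns = duration bins ("<2", "2-3", "3-4", "4-5", "5-6", "6-8").
observed = np.array([
    [4, 12, 18, 10, 4, 2],                        # my top 50 (made up)
    [30000, 60000, 70000, 45000, 18000, 10000],   # Kaggle sample (made up)
])
stat, p_value, dof, expected = chi2_contingency(observed)
print(f"The Chi-Square Test Statistic is: {stat:.2f}")
print(f"The p-value is: {p_value:.4f}")
```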

Performing a similar test for the key and mode features:

The Chi-Square Test Statistic for Key is: 18.75
The p-value for Key is: 0.0658
The Chi-Square Test Statistic for Mode is: 5.78
The p-value for Mode is: 0.0162 

The p-value is greater than .05 for key and less than .05 for mode, so I will take my preference in key into account but not mode.

Check for Collinearity

Some of the higher correlations among the data are with valence so I will look into those relationships first.

Although the correlation coefficients show moderate correlation, there is still a good amount of visible variability, indicating these features do not have a linear relationship strong enough to significantly affect the regression model.

Looking at loudness versus energy, there is a linear relationship, but again it is not strong enough to be considered collinear.

Another interesting observation is the weak tempo-danceability relationship, which confirms my earlier suspicion that tempo plays a small role in danceability.
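A collinearity check like this typically comes from the pairwise correlation matrix. A sketch on synthetic stand-in features (the loudness-energy link is built in here on purpose; the real check runs on the actual dataframe):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the audio-feature columns.
rng = np.random.default_rng(1)
n = 50
energy = rng.uniform(0.2, 0.8, n)
df = pd.DataFrame({
    "energy": energy,
    # Loosely linear in energy, mimicking the relationship noted above.
    "loudness": -20 + 15 * energy + rng.normal(0, 1.5, n),
    "valence": rng.uniform(0.1, 1.0, n),
    "danceability": rng.uniform(0.3, 1.0, n),
    "tempo": rng.uniform(74, 192, n),
})

corr = df.corr()
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Feature Correlation Matrix")
plt.savefig("correlations.png")
```

A common rule of thumb is to worry about collinearity only when |r| approaches 0.8-0.9; moderate correlations like loudness-energy here can stay in the model.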

Categorizing the Data

Now I will have to categorize the tracks based on what I found from the data.

0 will represent a song I don't like, 1 a song I might like, and 2 a song I like.

The criteria for a song I like is as follows:

  • Instrumentalness less than 0.25
  • Speechiness less than 0.33
  • Danceability between 0.492 and 0.705
  • Tempo between 87.5 and 128.75
  • Loudness between -12.22 and -7.98
  • Energy between 0.344 and 0.619
  • Valence between 0.495 and 0.794

The criteria for a song I might like is as follows:

  • Instrumentalness between 0.25 and 0.805
  • Speechiness between 0.33 and 0.66
  • Danceability between 0.35 and 0.492 or 0.705 and 0.847
  • Tempo between 74 and 87.5 or 128.75 and 138.25
  • Loudness between -15.782 and -12.22 or -7.98 and -6.45
  • Energy between 0.207 and 0.344 or 0.619 and 0.792
  • Valence between 0.3 and 0.495 or 0.794 and 0.96
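The criteria above can be turned into a labeling function. One detail the lists leave open is how the ranges combine; the sketch below assumes the Like criteria must all hold, while any one Might Like range is enough, which is my assumption rather than the project's stated rule:

```python
def label_track(row):
    """Return 2 (like), 1 (might like) or 0 (don't like) for one track.

    Thresholds copied from the criteria lists; the AND/OR combination
    for the might-like class is an assumption.
    """
    like = (row["instrumentalness"] < 0.25
            and row["speechiness"] < 0.33
            and 0.492 <= row["danceability"] <= 0.705
            and 87.5 <= row["tempo"] <= 128.75
            and -12.22 <= row["loudness"] <= -7.98
            and 0.344 <= row["energy"] <= 0.619
            and 0.495 <= row["valence"] <= 0.794)
    if like:
        return 2
    might = (0.25 <= row["instrumentalness"] <= 0.805
             or 0.33 <= row["speechiness"] <= 0.66
             or 0.35 <= row["danceability"] <= 0.492
             or 0.705 <= row["danceability"] <= 0.847
             or 74 <= row["tempo"] <= 87.5
             or 128.75 <= row["tempo"] <= 138.25
             or -15.782 <= row["loudness"] <= -12.22
             or -7.98 <= row["loudness"] <= -6.45
             or 0.207 <= row["energy"] <= 0.344
             or 0.619 <= row["energy"] <= 0.792
             or 0.3 <= row["valence"] <= 0.495
             or 0.794 <= row["valence"] <= 0.96)
    return 1 if might else 0

# Applied row-wise: df["category"] = df.apply(label_track, axis=1)
```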

Modeling

After some preprocessing, the data was ready for training and testing. The initial logistic regression model gave poor results, so I adjusted the number of training iterations and the class weights to arrive at the (temporary) final model.
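Those two tweaks correspond to `max_iter` and `class_weight` in scikit-learn's `LogisticRegression`. A runnable sketch on synthetic stand-in features (the real pipeline trains on the labeled Kaggle tracks, and the exact hyperparameter values here are assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the labeled audio features (3 classes: 0, 1, 2).
rng = np.random.default_rng(42)
X = rng.normal(size=(600, 7))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int) + (X[:, 2] > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# More iterations plus balanced class weights, as described above.
model = LogisticRegression(max_iter=1000, class_weight="balanced")
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
precision = precision_score(y_test, y_pred, average="macro", zero_division=0)
recall = recall_score(y_test, y_pred, average="macro", zero_division=0)
```

`class_weight="balanced"` reweights each class inversely to its frequency, which matters here because far fewer tracks land in the Might Like class than in Don't Like.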

[Figure: Confusion Matrix]
Precision for this model is: 0.5917
Recall for this model is: 0.4549

From the confusion matrix and the precision and recall scores, this model didn't perform too well. The matrix and the low recall show the model produces a lot of false negatives, especially for the Might Like class, which is the main target of this project.

Still, I don't want to get too caught up in these metrics compared to the results of the actual playlist.

Listening to The Music

After training the model and using it to predict on a separate dataset, I extracted 50 songs from the Might Like category and created a playlist through Spotify's API.
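Creating the playlist can be sketched with spotipy's playlist endpoints; the function and playlist names here are illustrative, and the batching exists because Spotify's add-items endpoint caps each request at 100 tracks:

```python
def chunked(items, size=100):
    """Split a list into batches; Spotify accepts at most 100 tracks per call."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def create_playlist(sp, user_id, name, track_ids):
    # `sp` is assumed to be an authenticated spotipy.Spotify client with the
    # 'playlist-modify-private' scope.
    playlist = sp.user_playlist_create(user_id, name, public=False)
    for batch in chunked(track_ids):
        sp.playlist_add_items(playlist["id"], batch)
    return playlist["id"]
```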

Looking through the 50 tracks, there were 9 artists I already knew of and 1 track I would consider part of my active listening.

Another interesting thing I noted is that 6 of the tracks in the playlist are actually segments from stand-up comedy specials, which could indicate the upper bound I set for speechiness was too high.

After subtracting the 6 stand-up segments (which I did enjoy) and the 1 song already in my active listening, 43 tracks remained to evaluate. Of those 43, I ended up liking 19, just a little less than half.

Conclusion

I ended up falling short of the metrics I set, but I still discovered some good music outside of my normal listening.

Some possible causes of the shortfall are the small sample size used to define the classes, insufficient hyperparameter tuning, and the inherent difficulty of classifying something as subjective as taste.

In the future it could be worth exploring other classification algorithms, such as decision trees, or even digging into clustering for this use case.