When Will Igorrs Other Playlists Be on Spotify Again

Introduction

My music choices and preferences have always been a direct indicator of my current activity, mood, and emotional state. I listen to upbeat hip-hop songs while working out, soft pop music while I'm feeling moody or down, or something in between while working on my data science projects. This project is an attempt to use Machine Learning to identify those moods and build respective playlists for each.

If you're interested in my code, you can find it here!

Project Outline

This project has four distinct parts:

  1. Obtain Spotify Data: Use the Spotify API to obtain all my music listening data from the past year.
  2. Analyze Music Taste: Take a deep dive into my song, artist, and album preferences and how these preferences have fluctuated over the year.
  3. Predict Different Moods: Analyze audio features such as the acousticness, tempo, and instrumentalness of my song preferences and use the K-Means clustering algorithm to stratify the music into different, defined moods.
  4. Create Playlists: Develop custom mood-specific Spotify playlists based on my music preferences and mood clusters from the last two parts.

Obtaining Spotify Data

The first step is to obtain the data that will drive the rest of this project. This can be done with the following steps:

  1. Visit https://www.spotify.com/us/account/privacy/ -- log into your Spotify account, scroll to the bottom, and request your data.
  2. Receive a downloadable zip file with listening data from Spotify's team in around 1-3 days.
  3. Use Python's Spotipy package to access the Spotify API and load the data into a Pandas DataFrame.
  4. Retrieve Spotify's audio feature data for all my songs:
    # Create a DataFrame with unique song ID and corresponding song features
    my_feature = pd.DataFrame(columns=["song_id", "energy", "tempo", "speechiness",
                                       "acousticness", "instrumentalness",
                                       "danceability", "loudness", "valence"])

    # For each song ID in my Spotify-provided listening history,
    # import Spotify's audio features into the DataFrame
    for song in songid:
        features = sp.audio_features(tracks=[song])[0]
        if features is not None:
            my_feature = my_feature.append({"song_id": song,
                                            "energy": features['energy'],
                                            "tempo": features['tempo'],
                                            "speechiness": features['speechiness'],
                                            "acousticness": features['acousticness'],
                                            "instrumentalness": features['instrumentalness'],
                                            "danceability": features['danceability'],
                                            "loudness": features['loudness'],
                                            "valence": features['valence']},
                                           ignore_index=True)

The Spotify API audio features are fundamental to my analysis and mood clustering of my music preferences. Attributes like danceability and energy capture differences between faster and slower paced music, while speechiness quantifies each song's focus on words. A high valence indicates positive/happy/euphoric music, while a low valence indicates dark/angry/sad music. A complete list of attributes and corresponding definitions can be found here.

My Music Taste Analysis

To best display and understand my data, I utilized the Plotly package for interactive plotting. Let's take a look at some of my music from the past year.

Top Songs

After Hours, by The Weeknd, tops the chart of my most listened-to songs of the past year! Even though the song didn't come out until February, I clearly had it on repeat essentially through April. An interesting note about the song is that it is a six-plus-minute song, much longer than the average song I listen to (3-ish min.), and the measurement variable of "sum minutes listened" rather than "count of times listened" probably worked in its favor.

Top Artists

Post Malone, whom my dad lovingly refers to as "Post Office Malone", came out as my top artist of the past year. There was a pretty constant stream of various Post songs over my last 12 months. In fact, all three of his albums show up on my top twenty albums list of the past year!

Top Albums

The Weeknd's After Hours album tops the chart of top albums of the past year, which makes a lot of sense thinking back to all the days of cycling through the album on repeat while messing around in my Jupyter Notebooks. Recently, you can see that Polo G's THE GOAT has been occupying much of the Jupyter Notebook-ing time.

Using K-Means clustering to predict my different music-listening moods

To create distinct classes to group my musical moods, I use the K-Means unsupervised learning algorithm.

The K-Means clustering algorithm is one of the most popular (and perhaps most intuitive) machine learning algorithms for classifying unlabeled data. It works in the following steps:

  1. Choose the number of centroids.
  2. Initialize the chosen number of centroids at random, or at specified data points.
  3. Calculate the Euclidean distance between each point and each centroid, and label each point by its closest centroid.
  4. For each labeled group, the average point is calculated. This average point becomes the new centroid for the group.
  5. Steps 3-4 repeat iteratively until the dataset converges (minimal points switching classes during step 3).

Repeating steps 3-5 until convergence on a 3-cluster, 2D dataset:
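The five steps above can be sketched as a toy NumPy implementation (purely illustrative; the actual clustering later in this post uses sklearn's KMeans):

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Toy K-Means following the five steps above."""
    rng = np.random.default_rng(seed)
    # Steps 1-2: initialize k centroids at randomly chosen data points
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Step 3: Euclidean distance from every point to every centroid,
        # labeling each point by its closest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: each centroid moves to the mean of its labeled group
        new_centroids = np.array([points[labels == j].mean(axis=0)
                                  for j in range(k)])
        # Step 5: iterate until the centroids stop moving
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two obvious 2D blobs separate into two clusters
pts = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])
labels, centroids = kmeans(pts, k=2)
```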

Choose the number of clusters

This algorithm works very well, provided you choose a number of centroids that represents the data. There are a few methods for choosing the number of clusters; in this project I use the elbow method, choosing the number of clusters at the elbow of the graph of clusters vs. inertia (sum of squared distances).

I run the sklearn K-Means algorithm on my data set and ultimately choose 4 clusters based on the graph.

Clusters vs. Inertia
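For illustration, here is what that elbow sweep looks like on synthetic data (four well-separated blobs standing in for my real 8-feature matrix, which is built in the next section):

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the 8-feature audio matrix, with 4 true blobs
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 8)) for c in (0, 2, 4, 6)])

# Fit K-Means for a range of k and record the inertia (sum of squared distances)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, 9)]

# Inertia shrinks as k grows; the "elbow" (here around k=4) marks the point
# where extra clusters stop paying for themselves
```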

Preprocess the Information

Now that I have an algorithm and process down, I begin the preprocessing of my data! To capture my true preferences, as opposed to songs I listened to for a minute and never again, I filter my data down to songs that I have listened to for more than 15 minutes in the past year.
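That filter can be sketched as follows. The column names mirror Spotify's StreamingHistory export (trackName, msPlayed), but the rows here are made up for illustration:

```python
import pandas as pd

# Toy stand-in for Spotify's StreamingHistory export
history = pd.DataFrame({
    "trackName": ["After Hours", "After Hours", "One Dance", "Skip Me"],
    "msPlayed":  [360_000, 720_000, 900_001, 30_000],
})

# Total listening time per song, keeping only songs played for > 15 minutes
minutes = history.groupby("trackName")["msPlayed"].sum() / 60_000
favorites = minutes[minutes > 15]
```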

Dataframe Snapshot
| Song | Artist | Album | Energy | Tempo | Speechiness | Acousticness | Instrumentalness | Danceability | Loudness | Valence |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| No Option | Post Malone | Stoney (Deluxe) | 0.734 | 0.238 | 0.057 | 0.075 | 0.000 | 0.575 | 0.805 | 0.494 |
| My Bad | Khalid | Free Spirit | 0.568 | 0.244 | 0.082 | 0.543 | 0.266 | 0.645 | 0.620 | 0.391 |
| The Show Goes On | Lupe Fiasco | Lasers | 0.889 | 0.606 | 0.115 | 0.018 | 0.000 | 0.591 | 0.855 | 0.650 |
| Escape | Kehlani | SweetSexySavage | 0.688 | 0.213 | 0.079 | 0.039 | 0.000 | 0.562 | 0.787 | 0.707 |
Data Distributions

The first thing I noticed about each feature is that the distributions are all a bit different. Let's visualize this:

A few impressions of the information:

  • Instrumentalness is heavily skewed, with almost exclusively low values (more lyrical music and electronic beats, less jazz, rock, or heavy metal)
  • Speechiness is also skewed toward lower values. More music; fewer podcasts, speeches, and other spoken word.
  • Energy, danceability, and loudness features all have inherently similar, slightly skewed distributions
  • Valence and tempo features seem to be the most evenly distributed from 0 to 1.
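These skews can also be quantified numerically with pandas' skew() method; the synthetic values below merely mimic the shapes described above, not my actual data:

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins: instrumentalness piles up near 0, valence is roughly uniform
rng = np.random.default_rng(1)
features = pd.DataFrame({
    "instrumentalness": rng.beta(0.5, 8.0, size=1000),  # heavily right-skewed
    "valence": rng.uniform(0.0, 1.0, size=1000),        # roughly even
})

skews = features.skew()
# instrumentalness shows strong positive skew; valence sits near zero
```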

Run the Algorithm

The data is ready to be fed through the algorithm to develop the mood clusters!

    # Drop non-numeric columns and convert the DataFrame to a numerical NumPy array
    X = df.drop(['track_name', 'artist_name', 'album'], axis=1).values

    # Fit data to 4 clusters using the sklearn K-Means algorithm
    from sklearn.cluster import KMeans
    kmeans = KMeans(n_clusters=4, n_jobs=-1).fit(X)

    # Predict and label the mood for each song in the DataFrame
    kmeans_mood_labels = kmeans.predict(X)
    df['label'] = kmeans_mood_labels

Visualize the Results

My algorithm produced the following cluster value counts:

Mood Group Counts

PCA - 2D

Each song in my dataframe has 8 audio features, which essentially means the data is eight-dimensional. Because data cannot be visualized in eight dimensions, I used a common dimensionality reduction technique called Principal Component Analysis (PCA) to condense my data into 2 dimensions in a manner that maintains as much of the original data's variance as possible.

Using sklearn's PCA package:

    from sklearn.decomposition import PCA

    pca_2d = PCA(n_components=2)
    principal_components = pca_2d.fit_transform(X)
    pc_2d = pd.DataFrame(principal_components, columns=['x', 'y'])
    pc_2d['label'] = kmeans_mood_labels

(It is important to note that the plotted (x, y) coordinates are a transformed representation of a combination of my 8 features, and do not exhibit any direct intuition into the feature values.)

By using PCA, I can see my data much more clearly. As shown in the plot, labels "2" and "3" are pretty well defined without too much overlap, while labels "0" and "1" have a large amount of overlap.

    print(pca_2d.explained_variance_ratio_, sum(pca_2d.explained_variance_ratio_))
    # Output: array([0.32387322, 0.22293707]), 0.5468

Using the above PCA attribute, I see that 32% of my data's original variance was explained by the 1st component while 22% is explained by the 2nd. Altogether, my PCA reduced the dimensionality of the 8 features while maintaining around 55% of the original variance.

PCA - 3D

Next, I reran my PCA function, this time with three components, using Plotly's 3D Scatter Plot functionality to display my data. The cluster stratifications are much clearer now, with the third component capturing an extra 17.6% of the original data's variance. This 3-component transformation of the 8-feature data exhibits a much clearer visual cluster distinction while maintaining more than 72% of the original variance.

    pca_3d = PCA(n_components=3)
    principal_components = pca_3d.fit_transform(X)
    pc_3d = pd.DataFrame(principal_components, columns=['x', 'y', 'z'])
    pc_3d['label'] = kmeans_mood_labels
    print(pca_3d.explained_variance_ratio_, sum(pca_3d.explained_variance_ratio_))
    # Output: array([0.32387322, 0.22293707, 0.17594222]), 0.7226
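As a sanity check on how these ratios behave (on a random stand-in matrix, not my actual song data): the leading components of a PCA do not change when you request more of them, so a third component can only add retained variance on top of the first two.

```python
import numpy as np
from sklearn.decomposition import PCA

# Random 8-feature matrix standing in for the audio-feature array
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 8))

ratios_2d = PCA(n_components=2).fit(X).explained_variance_ratio_
ratios_3d = PCA(n_components=3).fit(X).explained_variance_ratio_

# The first two ratios match, and the third only adds retained variance
```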

Results

Now that I've analyzed my mood clusters, I am going to try to define each mood! To do this, I will do the following:

  1. Summarize the cluster's audio feature statistics
  2. Attempt to define a representative mood
  3. Look at a sample of the song data for manual confirmation

First, I scale all of my data features using the sklearn StandardScaler. The scaler will transform each audio feature value such that each audio feature has a mean of 0 and variance of 1 across all songs in the feature column. This will allow for a much clearer visualization when comparing features for each cluster.

I now visualize the feature differences using seaborn's heatmap plot:

Features
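The scaling step, and the per-cluster matrix a heatmap like this displays, can be sketched as follows (the features and labels below are made up for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up songs: three audio features plus a pretend K-Means label
rng = np.random.default_rng(7)
feature_cols = ["energy", "tempo", "valence"]
df = pd.DataFrame(rng.uniform(size=(100, 3)), columns=feature_cols)
df["label"] = rng.integers(0, 4, size=100)

# Scale each feature to mean 0 / variance 1 across all songs
scaled = df.copy()
scaled[feature_cols] = StandardScaler().fit_transform(df[feature_cols])

# Per-cluster feature means: the matrix a call like
# sns.heatmap(cluster_means, annot=True) would display
cluster_means = scaled.groupby("label")[feature_cols].mean()
```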

Cluster 0: HYPE mood

Hype

Cluster 0 seems to be exciting, fast-paced songs with a lot of words. Its audio features are:

  • Very High Speechiness and Tempo
  • High Loudness and Energy
  • Semi-Low Danceability

These audio features point to the mood cluster containing a lot of Lyrical, Hype Rap songs. A quick random sample immediately confirms that:

| Song | Artist | Album | Label |
| --- | --- | --- | --- |
| iPHONE (with Nicki Minaj) | DaBaby | KIRK | 0 |
| XO Tour Llif3 | Lil Uzi Vert | Luv Is Rage 2 | 0 |
| I'm Goin In | Drake | So Far Gone | 0 |
| Flex (feat. Juice WRLD) | Polo G | THE GOAT | 0 |

Cluster 1: ANGSTY mood

angsty

Cluster 1 is perhaps the most ambiguous mood cluster. Its audio features are:

  • Very Low Tempo and Valence
  • Low Instrumentalness and Acousticness
  • Average Energy and Danceability

These songs are not especially positive, but they are not instrumental/acoustic sad songs either. The average energy and danceability make me feel like the cluster's songs are catchy, but a little more angsty and not quite as hype as Cluster 0.

These audio features point to the mood cluster containing a lot of Moody Pop/R&B Songs. Here's a random sample of songs in the cluster:

| Song | Artist | Album | Label |
| --- | --- | --- | --- |
| One Dance | Drake | Views | 1 |
| Too Late | The Weeknd | After Hours | 1 |
| Own It (feat. Ed Sheeran) | Stormzy | Heavy Is The Head | 1 |
| EARFQUAKE | Tyler, The Creator | IGOR | 1 |

Cluster 2: HAPPY mood

Happy

Cluster 2 was similar to Cluster 0 in loudness and energy, though distinctly different in speechiness and danceability. Its audio features are:

  • Very High Danceability and Valence
  • High Loudness and Energy
  • Low Instrumentalness

These songs are happy and exciting and make you want to dance. However, unlike Cluster 0, the songs do not have a high speechiness.

These audio features point to the mood cluster containing a lot of Upbeat Happy Pop - catchy songs you can sing and dance to. Here's a random sample of songs in the cluster:

| Song | Artist | Album | Label |
| --- | --- | --- | --- |
| Medicated (feat. Chevy Woods & Juicy J) | Wiz Khalifa | O.N.I.F.C. (Deluxe) | 2 |
| Break My Heart | Dua Lipa | Future Nostalgia | 2 |
| Loyal (feat. Lil Wayne & Tyga) | Chris Brown | X (Expanded Edition) | 2 |
| Juice | Lizzo | Cuz I Love You | 2 |

Cluster 3: GLOOMY/EMOTIONAL mood

Sad

Cluster 3 was probably the easiest mood for me to identify. Here are the audio features:

  • Very High Acousticness and Instrumentalness
  • Very Low Loudness and Energy
  • Low Danceability and Speechiness

These songs are slow and emotional. The audio features point to the mood cluster containing songs in a mood I would probably call my Sad Boi Hours. Here are some sample songs:

| Song | Artist | Album | Label |
| --- | --- | --- | --- |
| Vertigo | Khalid | Suncity | 3 |
| Hold Me By The Heart | Kehlani | SweetSexySavage (Deluxe) | 3 |
| Summer Friends (feat. Jeremih & Francis & The Lights) | Chance the Rapper | Coloring Book | 3 |
| FourFiveSeconds | Rihanna | FourFiveSeconds | 3 |

Creating Mood-Based Playlists in Spotify

Using Spotipy's playlist access with the scope "playlist-modify-public", it is possible to create Spotify playlists right from a Jupyter Notebook!

    from spotipy.oauth2 import SpotifyOAuth

    scope = 'playlist-modify-public'
    token = util.prompt_for_user_token(username=username, scope=scope,
                                       client_id=client_id,
                                       client_secret=client_secret,
                                       redirect_uri=redirect_uri)
    sp = spotipy.Spotify(auth_manager=SpotifyOAuth(client_id, client_secret,
                                                   redirect_uri, scope=scope,
                                                   username=username))
    def create_mood_playlists(moods, df, num_clusters, playlist_length):
        '''
        Input: list of defined moods, features df, number of clusters,
               length of desired playlist
        Output: Spotify playlists
        '''
        for moodnum in range(num_clusters):
            mood_data = df[df.label == moodnum]
            sp.user_playlist_create(username, moods[moodnum])
            playlist_id = sp.user_playlists(username)['items'][0]['id']
            playlist_song_IDs = list(mood_data['track_id'].sample(playlist_length))
            sp.user_playlist_add_tracks(username, playlist_id, playlist_song_IDs)

    moods = ['Hype', 'Angsty', 'Happy', 'Sad']
    num_clusters = 4
    playlist_length = 20
    create_mood_playlists(moods, song_prefs, num_clusters, playlist_length)

Check 'em out!

Next Steps

Build a Web App that, when given a Spotify username and token access info, will run the K-Means algorithm, create defined moods, then build custom mood-specific playlists for the user directly in the Spotify app.

Thank You For Reading!


Source: https://veeraldoesdata.com/spotify-mood-playlists/
