Harmonia is an embedding pipeline that turns a chaotic Spotify library into isolated, measurable dimensions of sound, language, and metadata. It extracts audio features, classifies lyrics, builds a 33-dimensional interpretable embedding, clusters the resulting vectors, and exports interactive visualizations and Spotify playlists.
For the full methodology, design decisions, and technical deep-dives, see the accompanying essay.
Time estimate: ~3-4 hours for 1,500 songs (mostly waiting: downloads, audio extraction, GPT API calls). All steps cache progress, so you can stop/resume.
Prerequisite: ffmpeg (brew install ffmpeg / apt install ffmpeg).
Clone and install dependencies.
git clone https://github.com/IslamTayeb/harmonia.git
cd harmonia
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
You’ll need API credentials for Spotify (to fetch your library), Genius (for lyrics), and OpenAI (for lyric classification). Put them in a .env file:
# .env file
# Spotify - https://developer.spotify.com/dashboard
# Set redirect URI to http://127.0.0.1:3000/callback
SPOTIFY_CLIENT_ID=...
SPOTIFY_CLIENT_SECRET=...
# Genius - https://genius.com/api-clients
GENIUS_ACCESS_TOKEN=...
# OpenAI - https://platform.openai.com/api-keys
OPENAI_API_KEY=...
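Before starting a multi-hour run, it can help to confirm that all four credentials are actually filled in. The following is a minimal sketch, not part of Harmonia itself: the helper names `parse_env` and `missing_keys` are hypothetical, and it assumes the .env file uses plain `KEY=VALUE` lines as shown above.

```python
from pathlib import Path

# Keys listed in the .env example above.
REQUIRED_KEYS = [
    "SPOTIFY_CLIENT_ID",
    "SPOTIFY_CLIENT_SECRET",
    "GENIUS_ACCESS_TOKEN",
    "OPENAI_API_KEY",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str) -> list[str]:
    """Return required keys that are absent, empty, or still the '...' placeholder."""
    values = parse_env(text)
    return [k for k in REQUIRED_KEYS if values.get(k, "") in ("", "...")]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    missing = missing_keys(text)
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

Run it from the repository root; any key it reports still needs a real value.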
Pulls metadata for your saved tracks from Spotify. The first run opens a browser window for OAuth.
python spotify/fetch_spotify_saved_songs.py
Downloads MP3s for local audio analysis. Safe to stop/resume.
python songs/download_via_spotdl.py # or download_via_ytdlp.py
Fetches lyrics from Genius. Also safe to stop/resume.
python lyrics/fetch_lyrics.py
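One common way to make a long fetch loop safe to stop and resume is to write one output file per item and skip items whose file already exists. The sketch below illustrates that pattern under stated assumptions; it is not necessarily how Harmonia's scripts implement caching, and `process_resumably` and the `.txt` layout are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def process_resumably(ids: Iterable[str], out_dir: Path,
                      work: Callable[[str], str]) -> list[str]:
    """Run work(id) only for ids with no output file yet; return the ids processed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    done = []
    for item_id in ids:
        target = out_dir / f"{item_id}.txt"
        if target.exists():
            continue  # finished in an earlier run
        result = work(item_id)
        tmp = target.with_suffix(".tmp")
        tmp.write_text(result)
        tmp.replace(target)  # rename last, so an interrupted write never looks complete
        done.append(item_id)
    return done


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        out = Path(d)
        # str.upper stands in for a slow download or API call.
        first = process_resumably(["a", "b"], out, str.upper)
        second = process_resumably(["a", "b", "c"], out, str.upper)
        print(first, second)  # second run only handles the new item "c"
```

Writing to a temporary file and renaming it at the end means a run killed mid-write leaves no half-finished file that a later run would mistake for a completed one.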
Extracts audio features (Essentia) and classifies lyrics (GPT). The first run is slow (~2-3 hours for 1,500 songs: ~90 min audio extraction + ~60 min GPT API calls). Later runs reuse the cached results.
python analysis/run_analysis.py --songs songs/data/ --lyrics lyrics/data/
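To get a rough first-run estimate for a library of a different size, the figures above (~90 min audio extraction and ~60 min GPT calls for 1,500 songs) can be scaled. This sketch assumes run time grows linearly with song count, which is an assumption rather than a measured property of the pipeline; `estimate_minutes` is a hypothetical helper.

```python
# Baseline taken from the timings quoted above for a 1,500-song library.
BASELINE_SONGS = 1500
BASELINE_MINUTES = {"audio_extraction": 90, "gpt_classification": 60}


def estimate_minutes(n_songs: int) -> dict[str, float]:
    """Scale the baseline stage timings linearly to n_songs (assumed linear)."""
    est = {
        stage: minutes * n_songs / BASELINE_SONGS
        for stage, minutes in BASELINE_MINUTES.items()
    }
    est["total"] = sum(est.values())
    return est


if __name__ == "__main__":
    for n in (500, 1500, 3000):
        print(n, estimate_minutes(n))
```

At the baseline this works out to about 3.6 seconds per song for audio extraction and 2.4 seconds per song for lyric classification; actual times will vary with hardware and API latency.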
Explore clusters, tune parameters, and visualize results.
streamlit run analysis/interactive_interpretability.py
Creates Spotify playlists from your clusters.
python export/export_clusters_as_playlists.py --dry-run # preview
python export/export_clusters_as_playlists.py # create
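The preview-then-create workflow above follows a familiar dry-run pattern: the same loop either prints what it would do or performs the action. The sketch below shows that pattern in isolation. It is not the project's export script: `export_clusters`, the `Cluster N` naming scheme, and the `create_playlist` callback (a stand-in for a real Spotify API call) are all hypothetical.

```python
import argparse
from typing import Callable


def export_clusters(clusters: dict[int, list[str]],
                    create_playlist: Callable[[str, list[str]], None],
                    dry_run: bool) -> list[str]:
    """Create one playlist per cluster, or only print the plan when dry_run is set."""
    names = []
    for cluster_id, track_ids in sorted(clusters.items()):
        name = f"Cluster {cluster_id}"  # hypothetical naming scheme
        if dry_run:
            print(f"[dry-run] would create '{name}' with {len(track_ids)} tracks")
        else:
            create_playlist(name, track_ids)
        names.append(name)
    return names


def main(argv: list[str]) -> list[str]:
    parser = argparse.ArgumentParser(description="Export clusters as playlists (sketch)")
    parser.add_argument("--dry-run", action="store_true",
                        help="preview without creating anything")
    args = parser.parse_args(argv)

    clusters = {0: ["track_a", "track_b"], 1: ["track_c"]}  # made-up sample data

    def fake_create(name: str, track_ids: list[str]) -> None:
        # Stand-in for a real API call; only reports what it received.
        print(f"created '{name}' with {len(track_ids)} tracks")

    return export_clusters(clusters, fake_create, dry_run=args.dry_run)


main(["--dry-run"])  # preview only; pass [] to run the (fake) creation path
```

Keeping the side effect behind a single callback makes the preview and the real run share all of the planning logic, so what the preview prints matches what the real run does.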
| File | Purpose |
|---|---|
| `analysis/run_analysis.py` | Main entry point |
| `analysis/interactive_interpretability.py` | Streamlit dashboard |
| `analysis/pipeline/interpretable_features.py` | 33-dim vector construction |
| `analysis/pipeline/audio_analysis.py` | Essentia feature extraction |
| `analysis/pipeline/lyric_features.py` | GPT lyric classification |
| `analysis/pipeline/config.py` | Configuration & scales |