Ever wonder how online music services like Spotify come up with recommendations for songs or artists you’ll like, or generate a whole playlist from a single tune? It turns out to be a tough nut to crack (shocking, right?). One company powering the recommendations behind a growing number of popular music services has offered a glimpse into the black box that is music recommendation.
The Echo Nest, founded in 2005 and headquartered in Massachusetts, currently supplies the recommendation engine for a number of popular music services, including Spotify, Rdio and iHeartRadio. They also provide an API that developers can use to tap into the detailed music and artist data they collect and generate, which is used by, among others, the BBC and MTV. Recently, one of the Echo Nest’s founders, Brian Whitman, wrote in detail about music recommendation and the Echo Nest’s unique approach to the problem. It’s a fascinating read.
Whitman writes that the Echo Nest's approach to music recommendation is based on the principle of “care and scale.” Scale means they want to know about as many artists and songs as possible in order to recommend new ones to people. Care means the recommendation is useful for the listener and the musician, not just for a third-party vendor (e.g., Amazon).
“Care is a layer of quality assurance, editing and sanity checks, real-world usage and analysis and, well, care, on top of any systematic results,” Whitman writes. “Without both care and scale you’ve got a system that a listener can’t trust and that a musician can’t use to find new fans.”
To achieve these goals the Echo Nest performs both sophisticated text and acoustic analysis in high volumes. For the former, they crawl the web to find people talking about music and use natural language processing to determine “cultural vectors” or “top terms” for artists, each one weighted for importance. They also pull structured data from partners and community sites (like Wikipedia). This text and cultural analysis is the heart of their recommendation engine; when a query about an artist or song comes in, they use their cultural vectors to find similar artists or songs.
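As a rough illustration of the idea (not the Echo Nest's actual code), finding similar artists from weighted "cultural vectors" can be sketched as cosine similarity between sparse term-weight maps. The artists and term weights below are invented for the example:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical weighted "top terms" for two artists
artist_a = {"shoegaze": 0.9, "dream pop": 0.7, "reverb": 0.4}
artist_b = {"dream pop": 0.8, "shoegaze": 0.6, "synth": 0.3}

print(round(cosine_similarity(artist_a, artist_b), 3))  # 0.872
```

A query for "artists like X" then reduces to ranking every other artist's vector by its similarity to X's; the hard part, as Whitman describes, is building those weighted vectors from web-scale text in the first place.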
For acoustic analysis, they analyze recordings from the catalogs of almost every music service and determine attributes like frequency, loudness, pitch, timbre, beats per bar, etc. They also use machine learning to track things like danceability, energy and liveliness. While the cultural and text analysis is the main ingredient in finding similar artists and songs, the acoustic analysis is used to make better playlist recommendations, for example, to ensure smooth transitions, stable instrumentation, and a consistent tempo or key. “Songs should flow into one another like a DJ would program them,” writes Whitman.
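To give a feel for how an acoustic attribute might shape a playlist, here is a toy sketch (my own, not the Echo Nest's algorithm) that orders songs greedily by tempo: at each step it picks the remaining song whose BPM is closest to the current one, avoiding jarring jumps. Song titles and tempos are made up:

```python
def smooth_playlist(songs, start):
    """Greedy ordering: at each step pick the remaining song whose
    tempo (BPM) is closest to the current song's tempo."""
    remaining = dict(songs)  # title -> BPM
    order = [start]
    current_bpm = remaining.pop(start)
    while remaining:
        nxt = min(remaining, key=lambda t: abs(remaining[t] - current_bpm))
        order.append(nxt)
        current_bpm = remaining.pop(nxt)
    return order

songs = {"A": 120, "B": 92, "C": 124, "D": 96, "E": 118}
print(smooth_playlist(songs, "A"))  # ['A', 'E', 'C', 'D', 'B']
```

A real engine would weigh many attributes at once (key, energy, timbre), but the shape of the problem is the same: sequencing by acoustic distance rather than by similarity of audience.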
I was so intrigued by the size of the problem the Echo Nest team faces that I contacted Whitman to find out a little more about how they scale this mountain of big data.
Whitman told me that they currently crawl about 10 million web pages daily, which has been growing at a rate of 5x per year, looking for people talking about music (they have a "musicality" detector which determines if a web page is about music). They also perform acoustic analysis on about 100,000 songs per week. Rather than store the source audio, they extract and store song metadata from recordings, using fingerprint technology to figure out when multiple recordings are really the same song. They generate roughly 100MB worth of metadata for a 5MB MP3.
Currently, this all adds up to data on some 2 million artists covering 35 million songs, and hundreds of terabytes of storage and indexes - and that’s just the input data to their engine. On the output end, via their API and the data they serve to their clients’ sites, “it's safe to say at least a hundred million Echo Nest queries get out on the Internet every day,” says Whitman.
In terms of software development, Whitman says they rely heavily on open source tools like Solr, MySQL, Cassandra and Tokyo Tyrant. Their own code is written almost entirely in Python, aside from Java for Solr modifications and C for audio processing and the fingerprinting layer. To manage the development, Whitman says, “We are relatively Agile with Scrum, broken down into small functional teams (like audio analysis, or crawling, or ingestion.)”
Lots more technical details are available in a talk Whitman gave last year.
All in all, an impressive operation that the Echo Nest pulls off with a team of 50 employees. Just so people like my daughters can find more artists like Justin Bieber or a playlist generated from a Nicki Minaj song - the better to annoy their daddy with. Sigh.