To achieve these goals, the Echo Nest performs sophisticated text analysis and acoustic analysis at high volume. For the former, they crawl the web to find people talking about music and use natural language processing to determine “cultural vectors” or “top terms” for each artist, with each term weighted for importance. They also pull structured data from partners and community sites (like Wikipedia). This text and cultural analysis is the heart of their recommendation engine; when a query about an artist or song comes in, they use their cultural vectors to find similar artists or songs.
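To make the idea of weighted "top terms" concrete, here is a minimal sketch of how similar artists could be found by comparing term vectors. It assumes cosine similarity as the comparison measure, which the article does not confirm; the artist names, terms, weights, and function names are all hypothetical and are not the Echo Nest's actual data or API.

```python
import math

# Hypothetical term-weight vectors; real vectors would come from crawled text.
ARTIST_TERMS = {
    "artist_a": {"shoegaze": 0.9, "dream pop": 0.7, "guitar": 0.4},
    "artist_b": {"shoegaze": 0.8, "noise": 0.5, "guitar": 0.5},
    "artist_c": {"hip hop": 0.9, "sampling": 0.6},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, vectors, k=2):
    """Return up to k artist names ranked by similarity to the query artist."""
    query_vector = vectors[query]
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in vectors.items()
        if name != query
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    print(most_similar("artist_a", ARTIST_TERMS))
```

In this toy data, the two artists that share the "shoegaze" and "guitar" terms rank as closest, while the artist with no terms in common scores zero. A production system would also need an index to avoid comparing the query against every artist.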
For acoustic analysis, they analyze recordings from the catalogs of almost every music service and determine attributes such as frequency, loudness, pitch, timbre and beats per bar. They also use machine learning to track things like danceability, energy and liveliness. While the cultural and text analysis is the main ingredient in finding similar artists and songs, the acoustic analysis is used to make better playlist recommendations, for example, to ensure smooth transitions, stable instrumentation, and a consistent tempo or key. “Songs should flow into one another like a DJ would program them,” writes Whitman.
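As a rough illustration of how acoustic attributes might be used to smooth transitions, the sketch below orders a set of songs greedily so that each next song is close to the previous one in tempo and key. This is an assumed, simplified approach and not the Echo Nest's method: the cost function, its weights, the song data, and the use of plain semitone distance between keys are all hypothetical choices made for the example.

```python
def transition_cost(a, b, tempo_weight=1.0, key_weight=5.0):
    """Score how jarring it is to go from song a to song b (lower is smoother)."""
    tempo_gap = abs(a["tempo"] - b["tempo"])  # difference in beats per minute
    # Keys are pitch classes 0-11, so measure the distance around the circle.
    diff = abs(a["key"] - b["key"]) % 12
    key_gap = min(diff, 12 - diff)
    return tempo_weight * tempo_gap + key_weight * key_gap


def order_playlist(songs):
    """Greedy ordering: start with the first song, then always pick the
    remaining song with the cheapest transition from the last one chosen."""
    if not songs:
        return []
    remaining = list(songs)
    ordered = [remaining.pop(0)]
    while remaining:
        last = ordered[-1]
        next_song = min(remaining, key=lambda song: transition_cost(last, song))
        remaining.remove(next_song)
        ordered.append(next_song)
    return ordered


# Hypothetical songs with tempo in BPM and key as a pitch class (0 = C).
SONGS = [
    {"title": "opener", "tempo": 120, "key": 0},
    {"title": "fast", "tempo": 170, "key": 6},
    {"title": "close", "tempo": 122, "key": 0},
    {"title": "mid", "tempo": 128, "key": 11},
]

if __name__ == "__main__":
    print([song["title"] for song in order_playlist(SONGS)])
```

A greedy pass like this is easy to follow but is not guaranteed to give the smoothest overall sequence; it only avoids the most abrupt jump at each step.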
I was so intrigued by the size of the problem the Echo Nest team faces that I contacted Whitman to find out a little more about how they scale this mountain of big data.
Whitman told me that they currently crawl about 10 million web pages daily looking for people talking about music (they have a "musicality" detector that determines whether a web page is about music), and that this volume has been growing at a rate of 5x per year. They also perform acoustic analysis on about 100,000 songs per week. Rather than store the source audio, they extract and store song metadata from recordings, using fingerprint technology to figure out when multiple recordings are really the same song. They generate roughly 100MB worth of metadata for a 5MB MP3.
So far, this adds up to data on some 2 million artists covering 35 million songs, and hundreds of terabytes of storage and indexes - and that’s just the input data to their engine. On the output end, via their API and the data they serve to their clients' sites, “it's safe to say at least a hundred million Echo Nest queries get out on the Internet every day,” says Whitman.
In terms of software development, Whitman says they rely heavily on open source tools such as Solr, MySQL, Cassandra and Tokyo Tyrant. Their own code is written almost entirely in Python, aside from Java for Solr modifications and C for audio processing and the fingerprinting layer. To manage development, Whitman says, “We are relatively Agile with Scrum, broken down into small functional teams (like audio analysis, or crawling, or ingestion.)”
Lots more technical details are available in a talk Whitman gave last year.