Sunday, May 04, 2008

Pandora - Music Classification and Personalization

As I write this, I am listening to my very own personalized radio station on Pandora. Pandora is a web site which uses your musical tastes to design a customized "radio station" that will serve up only the music you love. I have been listening to it, on and off, for about a couple of weeks now, and I must say, with apologies to McDonald's Corp., that I'm lovin' it.

Now, had I not been a programmer, I would have simply been impressed, accepted Arthur C. Clarke's third law - "Any sufficiently advanced technology is indistinguishable from magic" - and gone on with my life. But being one, and being a sucker for this kind of stuff, I keep thinking of how they do it, rather than accept and enjoy the fact that they just do such a bang-up job. So having nothing more concrete to write about this week, here is my analysis. But be warned...it's probably far enough from the mark to have the Pandora guys rolling on their office floor in laughter at my naivete.

The founders of Pandora are also the originators of the Music Genome Project. Each song in Pandora's collection is classified along 400+ attributes, called its 'genes'. Assuming exactly 400 attributes, each song becomes a point in a 400-dimensional space.
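To make the "point in space" idea concrete, here is a minimal sketch, assuming each gene is scored as a number. The gene names and the three-gene GENOME tuple are hypothetical stand-ins for the real 400+ attributes, which I have not seen.

```python
# Hypothetical gene names; the real genome has 400+ and I don't know their names.
GENOME = ("tempo", "distortion", "vocals")

def to_point(genes):
    """Turn a {gene name: score} mapping into a point with one coordinate per gene."""
    # Assumption: a gene that was not scored for this song counts as 0.0.
    return tuple(genes.get(name, 0.0) for name in GENOME)

song = {"tempo": 0.9, "vocals": 0.3}
print(to_point(song))  # one coordinate per entry in GENOME
```

With the full genome, the same function would return a 400-element tuple per song.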

When you register on Pandora, they ask you for three things - your age, gender, and your choice of songs for your station. The last can be an artist or a band, a genre or a period. While your age and gender are probably not song attributes on their own, they are very strong indicators of the "type" of music you like to listen to. The reason for this is that we tend to listen to most of our music during our teens, and our preferred genre of music usually happens to be whatever was most popular during this time. As to gender, boys and girls typically listen to different artists, even within the same genre, although the distinction may not be as clear cut as with age.

So, by the time you register and set up your station, Pandora already knows about 5-10 of the attributes of the songs you would probably like. Assuming 7 known attributes, the songs that you are most likely to enjoy are the ones which lie closest to you in the 7-dimensional space defined by these attributes. This could be a simple Euclidean distance calculation:

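from math import sqrt

# songs, user and closeness_cutoff are assumed to be defined elsewhere
for song in songs:
  # start a fresh sum of squared differences for every song
  distance = 0
  for attribute in user.attributes:
    distance += (song[attribute].value - attribute.value) ** 2
  distance = sqrt(distance)
  # only play songs that are close enough to the user's known attributes
  if distance < closeness_cutoff:
    song.play()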
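The loop above can be turned into something you can actually run. This is only a sketch of my guess, not Pandora's data model: I am assuming songs and the user profile are plain dictionaries of numeric scores, and the attribute names, song titles and cutoff value are all made up for illustration.

```python
from math import sqrt

def distance(user_profile, song_genes):
    """Euclidean distance over only the attributes the user profile knows about."""
    # Assumption: a gene missing from the song counts as 0.0.
    return sqrt(sum((song_genes.get(name, 0.0) - value) ** 2
                    for name, value in user_profile.items()))

def candidates(user_profile, songs, closeness_cutoff):
    """Return the titles of songs that lie within closeness_cutoff of the profile."""
    return [title for title, genes in songs.items()
            if distance(user_profile, genes) < closeness_cutoff]

# Hypothetical attribute names and scores.
user_profile = {"tempo": 0.8, "distortion": 0.6}
songs = {
    "song_a": {"tempo": 0.9, "distortion": 0.5, "vocals": 0.3},
    "song_b": {"tempo": 0.1, "distortion": 0.0, "vocals": 0.9},
}
print(candidates(user_profile, songs, closeness_cutoff=0.5))
```

Note that the "vocals" score is ignored here, because the user profile has no value for it yet. Only the dimensions the user is known along take part in the distance.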

As they stream the songs to you, Pandora asks that you optionally rate each song with either a thumbs-up or a thumbs-down. What this does is weight your user object with the attributes of the song you just rated. The process can be thought of as "evolving" your user object to have more "genes" or attributes. So, something like this:

# add the song's genes to the user on a thumbs-up, subtract them on a thumbs-down
for attribute in song.attributes:
  if song.ratedAs(thumbs_up):
    user[attribute].value += attribute.value
  else:
    user[attribute].value -= attribute.value
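Here is a runnable sketch of the same update, again assuming dictionaries of numeric scores. The function name apply_rating and the attribute names are hypothetical, and starting an unseen gene at 0.0 is my own assumption.

```python
def apply_rating(user_profile, song_genes, thumbs_up):
    """Return a new profile with the song's genes added (thumbs-up) or subtracted (thumbs-down)."""
    sign = 1.0 if thumbs_up else -1.0
    updated = dict(user_profile)  # copy, so the caller's profile is not modified
    for name, value in song_genes.items():
        # A gene the user has never been scored on starts from 0.0.
        updated[name] = updated.get(name, 0.0) + sign * value
    return updated

profile = {"tempo": 0.5}
profile = apply_rating(profile, {"tempo": 0.25, "vocals": 0.5}, thumbs_up=True)
print(profile)  # the profile now has a "vocals" dimension it did not have before
```

The part that matters is the new key: every rated song can add dimensions to the profile, which is the "evolving" step described above.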

Applying your "evolved" user object to filter out the songs will now result in distance calculations across more dimensions, and thus make it possible for Pandora to give you results which are closer to your tastes.

Even if my analysis is completely off the mark, I think the idea of a radio station customized per user is really cool. It's like having your personal music collection wherever you have a connection to the Internet. Kudos to the Pandora team for cooking this up.

2 comments:

Anonymous said...

Hey Sujit!

It's great to hear that you're enjoying Pandora!

Just so you know: your age and gender don't figure at all into our playlist algorithm. Your stations are all based only on your musical feedback.
Really cool theory though!

Let me know if you have any questions or comments...

Lucia, from Pandora
(lucia at pandora dot com)

Sujit Pal said...

Hi Lucia, thanks for your comment and thanks to you and your team for providing such a fine service.

And yes, now that I think about it, age and gender do seem to be somewhat inaccurate indicators - a person may have been exposed to a number of genres which were not necessarily popular during their teens, and may have developed a liking for them, in which case using age would lead to false positives. Oh well, thanks for clarifying that... :-).