Friday, June 17, 2005

MusicBrainz

In an effort to organize my music collection a bit better, I decided to try out MusicBrainz. Normally, I would have done all the music research myself, but that was taking far too long. So I went back to the site that Ian had mentioned a long time ago, and gave it a shot. The system basically makes a "fingerprint" of the audio using a short clip of the waveform and the clip length. This means that the file can be in a different format (MP3, Vorbis) and still get the same fingerprint. The theory is that many people have submitted known-good fingerprints which can be looked up. The program will then change your ID3 tags and rename the files based on the information in the database. It successfully tagged around 6/7 of my collection. Around 1/2 of the remainder had a "fingerprint collision", which means that more than one song in the database has the same fingerprint, and you need to pick which one is correct. For the rest of the songs, it plain old didn't know what they were.
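To make the three outcomes above concrete (tagged, collision, unknown), here is a minimal Python sketch of the lookup step. It is only an illustration under my own assumptions, not how MusicBrainz actually computes or stores fingerprints: a fingerprint is modelled as a made-up `(clip_signature, length_in_seconds)` pair, and `build_index` and `classify` are hypothetical helper names.

```python
from collections import defaultdict

def build_index(submissions):
    """Group submitted (fingerprint, metadata) pairs by fingerprint.

    Assumption: a fingerprint is a hashable value such as
    (clip_signature, length_in_seconds); metadata is (artist, title).
    """
    index = defaultdict(list)
    for fingerprint, metadata in submissions:
        # Many people may submit the same song; keep each candidate once.
        if metadata not in index[fingerprint]:
            index[fingerprint].append(metadata)
    return dict(index)

def classify(fingerprint, index):
    """Return ("identified", metadata), ("collision", candidates) or ("unknown", None)."""
    candidates = index.get(fingerprint, [])
    if not candidates:
        return ("unknown", None)
    if len(candidates) == 1:
        return ("identified", candidates[0])
    # More than one song shares this fingerprint: the user has to choose.
    return ("collision", candidates)

if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    index = build_index([
        (("a1f3", 215), ("Artist A", "Song One")),
        (("a1f3", 215), ("Artist A", "Song One")),
        (("9c2e", 180), ("Artist B", "Song Two")),
        (("9c2e", 180), ("Artist C", "Song Two (live)")),
    ])
    for fp in [("a1f3", 215), ("9c2e", 180), ("0000", 60)]:
        print(fp, classify(fp, index))
```

Only the "identified" case can be retagged and renamed automatically; the other two are where the program either asks you to choose or gives up.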

For the most part, the program worked out well. It actually picked out some pieces that I had attributed to the wrong artist. There were perhaps two false positives (out of around 1000 songs) that I had to go back and change. Most of the user-required selections were easy, such as picking the first option in the list. The program seemed to have the most difficulty with classical music. This is understandable, as there are different versions of the same piece (Boston Philharmonic vs. Philadelphia Philharmonic playing Beethoven's 6th), and many pieces can sound the same if you pick too small a sample (long stretches of violins holding a single note). Selecting the proper information for a classical piece is no easy task, either. When scrolling through the list of albums for someone like Tchaikovsky, it is difficult to find an album with a track that matches the file on the hard drive, as different CDs cut classical compositions at different points (e.g. I have a Swan Lake excerpt that has Act II - Scene I, II, and III all in the same MP3 file, but most CDs split the scenes into separate tracks).

It's times like this (organizing my MP3s and tagging them properly) that I wish more programs had support for Unicode and internationalized text. MusicBrainz isn't consistent itself: it lists Kanno Yoko's name in the original Japanese script (菅野よう子, for those who can read it), while a song from the Spirited Away soundtrack is listed under the English title "A Summer's Day", when I've always known it as "Ano Natsu He". Firefox can handle Asian script, while Rhythmbox (the default player for GNOME) cannot. I guess there has been a lot of work done on l10n and i18n standards, but it is up to the programmers to implement the capabilities. When you're developing applications, just keep in mind that fewer than 50% of Internet users have English as their mother tongue.
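As a small illustration of why this matters for tags, here is a Python sketch that checks whether a piece of text survives a given encoding. The helper name `can_encode` is my own, and Latin-1 simply stands in for any legacy single-byte encoding a program might assume; I am not claiming this is what Rhythmbox does internally.

```python
def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

artist = "菅野よう子"

# UTF-8 can represent the name and round-trips it without loss.
assert can_encode(artist, "utf-8")
assert artist.encode("utf-8").decode("utf-8") == artist

# A single-byte encoding such as Latin-1 has no room for these characters.
assert not can_encode(artist, "latin-1")

# The romanized form fits anywhere, which is probably why it is so common.
assert can_encode("Kanno Yoko", "ascii")
```

A program that stores or displays tags through an encoding like the second one has no way to show the original name, no matter how the tag was written.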

1 comment:

Anonymous said...

Wow! Really great program... although it was not as successful with my music collection as it was with yours:

Identified 43.73%
Unidentified 55.84%
Errors 0.43%