A script to find duplicate iTunes tracks
Authored by: Thom on Sep 18, '04 02:51:02PM
My comment is related, but takes a different angle: what if you have multiple copies of a *song*, but the files themselves differ (bit rate, metadata in ID3 tags, etc.)?

I'd like to chime in here and mention Jay Tuley's excellent software, iEatBrainz, which works with the MusicBrainz database.

Basically, it looks in your library for songs that don't have a full set of tags, or lets you pick out songs that you know have wrong or incomplete info. It then 'listens' to those songs and produces a 'fingerprint' for each one, which it compares against an online database (MusicBrainz). If it finds a match it thinks it knows, it suggests that song in a pull-down list next to the track name. Then it fixes the fields for you and has iTunes update the song data.

You can even tell it what the track is if you know the artist, album, etc., and it'll fill in all the proper info. MusicBrainz has a pretty nifty search function on their site: you can search by artist, album or track name.

To my knowledge, iEatBrainz cannot access an iPod. So all of the tracks on my machine are cleaned up, but my iPod is still a mess. I talked to Jay about this, and he confirmed one of my ideas: I'll temporarily make a new account on my machine. Using the Terminal, I'll copy all of the music OFF of my iPod (not delete, just yet, copy...) to an external drive (30 GB of music? My PowerBook's 80 GB drive has about 500 MB free :) and bring it into iTunes, then run iEatBrainz on it. Then it'll all be properly tagged and organized, and I can reload the iPod with the up-to-date song data, etc...
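
For the copy step, here's a minimal Python sketch rather than a recipe. It assumes a classic iPod that keeps its songs in the hidden `iPod_Control/Music` folder (split across `F00`, `F01`, ... subfolders); the mount points `/Volumes/IPOD` and `/Volumes/External` are hypothetical placeholders for your own. It only copies, never deletes, and keeps the subfolder structure so same-named files can't overwrite each other.

```python
import os
import shutil

# Assumed layout: a classic iPod mounted at /Volumes/IPOD keeps its songs in
# the hidden iPod_Control/Music folder, split across F00, F01, ... subfolders.
IPOD_MUSIC = "/Volumes/IPOD/iPod_Control/Music"   # hypothetical mount point
DESTINATION = "/Volumes/External/iPodMusic"       # hypothetical external drive

AUDIO_EXTENSIONS = {".mp3", ".m4a", ".m4p", ".m4b", ".aac", ".wav", ".aif", ".aiff"}


def copy_ipod_music(src_root, dst_root):
    """Copy (never move or delete) every audio file under src_root into
    dst_root, preserving the relative folder structure.

    Returns the list of destination paths that were written.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in AUDIO_EXTENSIONS:
                continue  # skip non-audio files such as artwork or databases
            source = os.path.join(dirpath, name)
            target = os.path.join(dst_root, os.path.relpath(source, src_root))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy2(source, target)  # copy2 also keeps timestamps
            copied.append(target)
    return copied


# Only run against the real device if it is actually mounted.
if __name__ == "__main__" and os.path.isdir(IPOD_MUSIC):
    files = copy_ipod_music(IPOD_MUSIC, DESTINATION)
    print(f"Copied {len(files)} files to {DESTINATION}")
```

Once the copy finishes, the destination folder can be added to iTunes in the temporary account and handed to iEatBrainz.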

The one thing I find really unfortunate is that iEatBrainz reads (*and updates*) each track's information through iTunes. So if iTunes has no mechanism for iEatBrainz to say 'store this bit of info in this ID3v2.4 field', your songs get cleaned up, but the UUIDs aren't saved along with the files. They really ought to be! After all, the work has just been done and the MusicBrainz site's resources have been used... better to do it only once per file.

(There is, however, a program called (for now) Picard, written by Robert Kaye, the founder of MusicBrainz. It's still pretty rough, but it should be able not only to identify files using audio fingerprinting, but also to write the UUID information into their ID3 tags.)
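
To make 'store the UUID in an ID3v2.4 field' concrete: ID3v2.4 defines a UFID (unique file identifier) frame whose body is an owner string, a NUL byte, and an identifier of up to 64 bytes. As far as I know, Picard uses the owner `http://musicbrainz.org` for the MusicBrainz track ID, but treat that as an assumption. This is a byte-level Python sketch under those assumptions; the function names are hypothetical, and a real tagger should use a proper ID3 library instead of hand-building tags.

```python
def syncsafe(n):
    """Encode n (< 2**28) as a 4-byte ID3v2.4 syncsafe integer (7 bits per byte)."""
    if not 0 <= n < 1 << 28:
        raise ValueError("value out of range for a syncsafe integer")
    return bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])


def ufid_frame(owner, identifier):
    """Build an ID3v2.4 UFID frame: owner string + NUL + identifier (max 64 bytes)."""
    ident = identifier.encode("ascii")
    if len(ident) > 64:
        raise ValueError("UFID identifier must be at most 64 bytes")
    body = owner.encode("latin-1") + b"\x00" + ident
    # Frame header: 4-byte frame ID, syncsafe body size, two (cleared) flag bytes.
    return b"UFID" + syncsafe(len(body)) + b"\x00\x00" + body


def id3v24_tag(frames):
    """Wrap frames in a minimal ID3v2.4 tag: 'ID3', version 4.0, no flags, no padding."""
    payload = b"".join(frames)
    return b"ID3" + bytes([4, 0, 0]) + syncsafe(len(payload)) + payload


# Placeholder ID for illustration only; a real one comes from MusicBrainz.
tag = id3v24_tag([ufid_frame("http://musicbrainz.org",
                             "00000000-0000-0000-0000-000000000000")])
```

Prepending `tag` to the audio data (after removing any existing tag) would give the file a self-contained ID that survives moving it out of iTunes.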

How does this relate to finding duplicate songs? Well, first of all, as this app helps you clean up your collection, you can write a simple script to walk your iTunes folder tree (as long as iTunes keeps it organized by artist, then album name, then track name) and find duplicate files. Or use one of the scripts mentioned in this thread.
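
A minimal Python sketch of such a folder walk, assuming the `<Artist>/<Album>/<NN Track Name>.<ext>` layout iTunes uses when it keeps the library organized. It groups files by artist, album and track name, ignoring case, the file extension and a leading track number (the regex for that is a heuristic and will also strip a title that genuinely starts with a number). The function names are hypothetical.

```python
import os
import re
from collections import defaultdict

# Heuristic: strip "03 " or multi-disc "1-03 " prefixes from file names.
LEADING_TRACK_NUMBER = re.compile(r"^(?:\d+-)?\d+\s+")


def track_key(path, root):
    """Return (artist, album, normalized title) for a file nested exactly
    Artist/Album/file below root, or None for anything else."""
    parts = os.path.relpath(path, root).split(os.sep)
    if len(parts) != 3:
        return None
    artist, album, filename = parts
    title = os.path.splitext(filename)[0]          # drop .mp3, .m4a, ...
    title = LEADING_TRACK_NUMBER.sub("", title)    # "03 Song" -> "Song"
    return (artist.lower(), album.lower(), title.strip().lower())


def find_duplicates(root):
    """Map each (artist, album, title) key to its files, keeping only
    keys that have more than one file."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith("."):
                continue  # skip .DS_Store and other hidden files
            path = os.path.join(dirpath, name)
            key = track_key(path, root)
            if key is not None:
                groups[key].append(path)
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

Note that iTunes may name a second copy of a file something like "Song 1.mp3"; this simple key won't match that against "Song.mp3", and it only looks at names, so it can't tell whether two same-named files really hold the same recording.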

But, as someone mentioned above, it can be a real PITA when one little bit of info changes in an MP3 file's ID3 tags, but only in one copy of a song. Then you can't simply MD5-hash the entire file. I had considered MD5-hashing just the audio data (minus the ID3 tags), but realized that two 'duplicate' files might have a different duration, bit rate, etc., and that makes the whole approach moot.
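
For reference, the 'hash the data minus the tags' idea looks roughly like this Python sketch. It only strips a leading ID3v2 tag (whose size is a 4-byte syncsafe integer in the 10-byte header, plus a 10-byte footer if the v2.4 footer flag is set) and a trailing 128-byte ID3v1 tag; other tag formats such as APE are ignored. As noted above, it still can't match copies encoded at different bit rates, since their audio bytes differ.

```python
import hashlib


def syncsafe_to_int(b):
    """Decode a 4-byte ID3v2 syncsafe integer (7 significant bits per byte)."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def audio_payload(data):
    """Return MP3 bytes with a leading ID3v2 tag and a trailing
    128-byte ID3v1 tag removed, if present."""
    start, end = 0, len(data)
    if len(data) >= 10 and data[:3] == b"ID3":
        size = syncsafe_to_int(data[6:10])        # tag size, excluding header
        footer = 10 if data[5] & 0x10 else 0      # ID3v2.4 footer-present flag
        start = 10 + size + footer
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128                                # ID3v1 tag is always 128 bytes
    return data[start:end]


def audio_md5(path):
    """MD5 of a file's audio data only, so tag edits don't change the hash."""
    with open(path, "rb") as f:
        return hashlib.md5(audio_payload(f.read())).hexdigest()
```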

So, what I want to find is a way to encode the MusicBrainz song ID into my ID3 tags, and then run a utility (probably with a DB behind it; why not?) to catalog all of my songs by reading that song ID from each file. Any time it finds a duplicate song ID, I can drill right down to the two files and compare them.
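
The catalog half of that idea could look like this Python sketch, which uses SQLite as the 'DB behind it' (my choice, not something the tools above require). It assumes the song ID has already been read from each file by some other code, for example from a UFID frame; the table layout and function names are hypothetical.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) a catalog mapping file paths to MusicBrainz track IDs."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS tracks ("
        " path TEXT PRIMARY KEY,"
        " mb_track_id TEXT NOT NULL)"
    )
    db.execute("CREATE INDEX IF NOT EXISTS idx_mbid ON tracks(mb_track_id)")
    return db


def add_track(db, path, mb_track_id):
    """Record a file's ID; re-scanning the same path just updates its row."""
    db.execute(
        "INSERT OR REPLACE INTO tracks (path, mb_track_id) VALUES (?, ?)",
        (path, mb_track_id),
    )


def duplicates(db):
    """Return {mb_track_id: [paths]} for every ID shared by two or more files."""
    rows = db.execute(
        "SELECT mb_track_id, path FROM tracks WHERE mb_track_id IN ("
        " SELECT mb_track_id FROM tracks GROUP BY mb_track_id HAVING COUNT(*) > 1)"
        " ORDER BY mb_track_id, path"
    )
    result = {}
    for mbid, path in rows:
        result.setdefault(mbid, []).append(path)
    return result
```

Keying the table on the file path means a rescan after retagging replaces stale IDs instead of piling up rows, and the index on the ID column keeps the duplicate query cheap on a large library.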

I may write this in PHP; while ID3 support is being built into PHP 5.0+, it only supports older ID3 versions. But there is a nice project called getID3 that handles an impressive number of metadata types (even EXIF tags, QuickTime, Ogg, and lots more) and can write some of them too.

Has anyone else gone down this road yet, so I don't end up reinventing the wheel?

I should also mention that MusicBrainz and iEatBrainz are both projects that could (and should) get some donations to keep them running!
