
A Perl script to find duplicate iTunes tracks (UNIX)
For some reason, I end up with a ton of "duplicate" tracks in my iTunes library and playlists. By this I mean that more than one track in iTunes maps to a single file on my hard drive. I may be running into this because I sync my files between my Mac and my PC: any ID3 tag change made on the PC may cause iTunes on the Mac to treat the file as a whole new track when I drop it back into iTunes.

Anyway, Doug Adams has some great AppleScripts available to find and/or delete these dupes, but they don't always work for me.

So, I wrote a Perl script to parse the "Song List" that can be exported from iTunes (via the File menu). Every track in the list has 25 different fields, and the script lets you easily pick which fields to compare (by editing the script where documented).
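The script itself isn't reproduced in this thread, so the following is only a minimal sketch of the same idea, not the author's code. It assumes the exported Song List is tab-delimited with carriage-return (\r) record endings (as the one-liners in the comments also assume) and that the first record is a header row naming the columns. The column names in @KEY_FIELDS are an illustrative choice, and find_dupes is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Columns that must all match for two tracks to count as duplicates.
# Illustrative choice: edit to taste (names must match the header row).
my @KEY_FIELDS = ('Name', 'Artist', 'Album', 'Time');

# Read an exported Song List from an open filehandle and return each
# key (the chosen fields joined by tabs) that occurs more than once.
sub find_dupes {
    my ($fh, @wanted) = @_;
    local $/ = "\r";                      # assumed record separator
    my $header = <$fh>;
    return () unless defined $header;
    chomp $header;
    my @names = split /\t/, $header, -1;
    my %col;
    @col{@names} = (0 .. $#names);        # column name -> field index

    my @idx;
    for my $name (@wanted) {
        die "No column named '$name'\n" unless exists $col{$name};
        push @idx, $col{$name};
    }

    my (%seen, @dupes);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\n//;                 # tolerate \r\n line endings
        next if $line eq '';
        my @f = split /\t/, $line, -1;
        my $key = join "\t", map { defined $_ ? $_ : '' } @f[@idx];
        push @dupes, $key if $seen{$key}++ == 1;   # report each key once
    }
    return @dupes;
}

if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    print "$_\n" for find_dupes($fh, @KEY_FIELDS);
}
```

Run it as perl find_dupes.pl exported_song_list.txt (both file names hypothetical). Unlike the one-liners below, each duplicated key is printed once, however many extra copies exist, and fields are picked by column name rather than position.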

Perl uber geek single line version
Authored by: SOX on Sep 17, '04 01:36:13PM
Or, for the many uber geeks out there, try this instead:

perl -nwaF"\t" -0x00d -e '$x = join "\t",@F[0..3]; print $x,"\n"  if  exists $h{$x}; $h{$x}=1'   file_name_of_exported_song_list

That's the one-line equivalent of the Perl script. Anything printed out is a duplicate.

Perl uber geek single line version
Authored by: SOX on Sep 17, '04 01:39:47PM
Arrgg... lost the backslashes when it was converted to HTML. Trying again:

perl -nwaF"\t" -0x00d -e '$x = join "\t",@F[0..3]; print $x,"\n" if exists $h{$x}; $h{$x}=1;; ' file_name_of_song_list

gawd I love perl: the wood chipper of text processing.

Too many Perl scripts!!!!!!!!!!
Authored by: gourls on Sep 17, '04 06:28:23PM
That was WAYYY too many Perl scripts. I don't care how geeky you are. And no, I don't know HTML. Please! No more comments for this article.

---
if you must speak, speak of topics that may never be discussed again........... ....cheers, gourls

correction
Authored by: SOX on Sep 17, '04 01:42:00PM
perl -nwaF"\t" -0x00d -e '$x = join "\t",@F[0..3]; print $x,"\n" if exists $h{$x}; $h{$x}=1;' Name_of_song_list_file

perl, the woodchipper of wordprocessing.

even shorter
Authored by: SOX on Sep 17, '04 01:54:22PM
perl -nwaF"\t" -0x00d -e '$x = join "\t",@F[0..3]; print $x,"\n" if $h{$x}++ ' name_of_file

anyone have a shorter one?

Perhaps even longer??
Authored by: koncept on Sep 18, '04 01:17:44AM

This suggestion would take a lot longer to process, but is there any chance of someone posting a script that does a binary compare of every file listed in a user's Library.xml file and seeks out duplicate audio files that may have different names but are still the same file?

There seems to be a File::Compare module on CPAN which may do the trick.
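A minimal sketch of that binary comparison using File::Compare, which also ships with Perl itself. It assumes you already have a list of file paths (pulling them out of Library.xml is not shown), and it only byte-compares files of equal size, since files of different sizes cannot be identical. identical_groups is a hypothetical name. A binary compare only catches exact copies: the same song with different ID3 tags or a different bit rate will not match.

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# Given a list of file paths, return groups (array refs) of files whose
# contents are byte-for-byte identical. Only same-size files are compared.
sub identical_groups {
    my @paths = @_;
    my %by_size;
    for my $p (@paths) {
        next unless -f $p;           # skip missing files and directories
        my $size = -s _;             # reuse the stat done by -f
        push @{ $by_size{$size} }, $p;
    }

    my @groups;
    for my $bucket (values %by_size) {
        my @left = @$bucket;
        while (@left > 1) {
            my $first = shift @left;
            my (@same, @rest);
            for my $other (@left) {
                # compare() returns 0 for identical files, 1 if they differ,
                # and -1 on error (treated here as "not identical").
                if (compare($first, $other) == 0) { push @same, $other }
                else                              { push @rest, $other }
            }
            push @groups, [ $first, @same ] if @same;
            @left = @rest;
        }
    }
    return @groups;
}

if (@ARGV) {
    print join('  ==  ', @$_), "\n" for identical_groups(@ARGV);
}
```

Pass the candidate files on the command line; each output line lists one group of identical files. Bucketing by size first keeps the number of full byte comparisons small on a large library.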



try iEatBrainz for cleaning things up
Authored by: babbage on Oct 13, '04 11:09:48PM

The best application I've seen for this is iEatBrainz from MusicBrainz.org.

IEB analyses the acoustic fingerprint of each track in an attempt to look up missing track information, which seems like the best approach possible. This is a nice way to cope with the fact that the same track can exist at different bitrates, can have different sizes and times, and can carry ID3 tag data that is totally incorrect and just shouldn't be trusted.

It's far from perfect. Most obviously, IEB is really slow, especially on large iTunes libraries. Further, a lot of the data it provides can be questionable & needs to be double-checked, and -- most important for this topic -- it doesn't have a mechanism for de-duping.

Still, it's the only software of its kind that I know of, and it works well enough to be a huge time saver when trying to clean up your iTunes library. It should get things clean enough that other approaches, such as the Perl script in this discussion or the AppleScript here, can be effective.

---
--
DO NOT LEAVE IT IS NOT REAL

A Perl script to find duplicate iTunes tracks
Authored by: DougAdams on Sep 17, '04 03:18:42PM

Can anyone make this executable for "Person Who Knows Poop About Perl"? I have gotten a ton of emails on this particular article like I'm the one who wrote it. Perl newbies are clueless. How about a hand?



poopless perl executable
Authored by: SOX on Sep 17, '04 04:28:10PM
First, the single line above is in fact a Perl executable as written. But if you want to hide its complexity in a file, then:

Open a Terminal window, paste the following into it, and then press Control-D. The cat > dup_find command creates a file called dup_find containing the text you pasted.


cat > dup_find
#!/usr/bin/perl
# identify duplicates that have the same first four fields
$/ = "\r";
while ($d = <>) {
    @F = split /\t/, $d;
    $x = join " ::\t", @F[0..3];
    print $x,"\n" if $h{$x}++;
}

Next, make it executable:
chmod a+x dup_find

It will then run from the command line as:
./dup_find name_of_exported_song_list_file

Voilà. But as I said, the original one-line Perl command is executable from the command line as-is and would go nicely inside an AppleScript.

I leave it to you to wrap an AppleScript droplet around the single-line command.

Darn html!!!
Authored by: SOX on Sep 17, '04 04:35:31PM
Arggh! It's so hard to get code to show up right in these comment boxes. The last one was missing the <> symbol, since it got removed by the HTML filter.

cat > dup_find
#!/usr/bin/perl 
# identify duplicates that have the same first four fields
$/ = "\r";
while ($d = <>) {
    @F = split /\t/, $d;
    $x = join " ::\t", @F[0..3];
    print $x,"\n" if $h{$x}++;
}



poopless AppleScript droplet Perl version
Authored by: SOX on Sep 17, '04 06:08:46PM
Here is an AppleScript droplet. It takes an exported song list file and replaces it with a new file that contains only the duplicate names. The original file is not deleted, but rather renamed with a ".orig" suffix.

-- This droplet finds duplicates of songs from an exported song list file.
-- Author: Charlie Strauss 2004
on open these_items
	-- A droplet receives a list of dropped items; process the first one.
	set target_name to (item 1 of these_items) as string
	set posix_target to quoted form of the POSIX path of target_name
	
	-- -i.orig rewrites the file in place and keeps the original as *.orig
	set command to "perl -i.orig -0x00d -nwaF\"\\t\" -e '$x = join \" ::\\t\",@F[0..3]; print $x,\"\\n\" if $h{$x}++ '  "
	
	try
		do shell script command & posix_target
	on error error_message
		beep
		display dialog "Whoa! " & error_message buttons {"Rats"} default button 1
	end try
end open


poopless AppleScript droplet Perl version
Authored by: SOX on Sep 17, '04 06:37:40PM
Here is an AppleScript droplet. It takes an exported song list file and replaces it with a new file that contains only the duplicate names. The original file is not deleted, but rather renamed with a ".orig" suffix.

-- This droplet finds duplicates of songs from an exported song list file.
-- Author: Charlie Strauss 2004
on open these_items
	-- A droplet receives a list of dropped items; process the first one.
	set target_name to (item 1 of these_items) as string
	set posix_target to quoted form of the POSIX path of target_name
	
	-- -i.orig rewrites the file in place and keeps the original as *.orig
	set command to "perl -i.orig -0x00d -nwaF\"\\t\" -e '$x = join \" ::\\t\",@F[0..3]; print $x,\"\\n\" if $h{$x}++ '  "
	
	try
		do shell script command & posix_target
	on error error_message
		beep
		display dialog "Whoa! " & error_message buttons {"Rats"} default button 1
	end try
end open


Perl newbies
Authored by: gourls on Sep 17, '04 06:23:58PM
Hey, I don't think that the guy who wrote the article is clueless, if he did get the script to work for him. So he might know somethin somethin about Perl. I noticed he mentioned your name in it. Sorry Doug, you'll have to deal with the emails by yourself. Do you know who hypert is?

---
if you must speak, speak of topics that may never be discussed again........... ....cheers, gourls

Perl newbies
Authored by: DougAdams on Sep 18, '04 08:33:50AM
I am well aware that the guy who wrote the hint is not clueless. I just don't think he anticipated the large number of people who would be enthralled by his hint who also don't know the first thing about Perl scripting. Can't blame him for that! (I address this in another comment in the main thread.)

Sorry
Authored by: gourls on Sep 30, '04 08:30:51PM
I'm sorry Doug. I don't know what I was thinking. I need to learn better. Sorry if I offended you in any way. :)

---
-brita

one more try
Authored by: SOX on Sep 17, '04 04:38:53PM

#!/usr/bin/perl 
# identify duplicates that have the same first four fields
$/ = "\r";
while ($d = <>) { 
     @F = split /\t/, $d ;
    $x = join " ::\t",@F[0..3]; 
    print $x,"\n" if $h{$x}++;
}


A Perl script to find duplicate iTunes tracks
Authored by: DougAdams on Sep 18, '04 08:21:48AM

Hey, I'm not complaining, and I can cope with the emails. The author did email me what he posted above, and I will be posting it at my site soon. It's just that duplicate tracks are a HUGE issue for iTunes users. I would say it is the single largest issue they have (if downloads of the duplicate-related AppleScripts at my site are any indication). So, in the interest of helping people who truly know absolutely nothing about Perl scripting, I made the request above. Because, frankly, they see a hint like this thinking it is the Uber Solution (and it is a great one!), but get frustrated because they don't know how to proceed with it. So who do they ask? Hmm...iTunes...Scripts...you do the Googling.



A Perl script to find duplicate iTunes tracks
Authored by: DougAdams on Sep 18, '04 09:08:17AM

'Nother thought:

If this script could operate on the actual XML file (iTunes Music Library.xml in Home > Music > iTunes, which is generated whenever iTunes quits) to get the database IDs of the dupes, an AppleScript could collect the dupes by those database IDs and put them in their own playlist (à la Corral All Dupes at my site).

Wheeee!
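A rough sketch of the Perl half of that idea. It treats two tracks as duplicates when they share the same Location (the case the original hint describes: several iTunes entries pointing at one file) and returns the Track ID of every entry after the first. It uses plain regular expressions rather than a real plist parser, so it assumes the usual layout of iTunes Music Library.xml, where each track is a flat <dict> of <key>/<string> or <key>/<integer> pairs. Whether the XML's Track ID equals the database ID that AppleScript sees is also an assumption worth checking before relying on it; duplicate_track_ids and xml_unescape are hypothetical names.

```perl
use strict;
use warnings;

# Undo the XML escapes that can appear inside plist <string> values.
sub xml_unescape {
    my ($s) = @_;
    $s =~ s/&lt;/</g;
    $s =~ s/&gt;/>/g;
    $s =~ s/&quot;/"/g;
    $s =~ s/&apos;/'/g;
    $s =~ s/&#(\d+);/chr($1)/ge;
    $s =~ s/&amp;/&/g;    # last, so "&amp;lt;" becomes "&lt;", not "<"
    return $s;
}

# Return the Track IDs of all tracks whose Location repeats that of an
# earlier track, i.e. second and later iTunes entries for the same file.
sub duplicate_track_ids {
    my ($xml) = @_;
    my (%seen_location, @dupes);
    # Match innermost <dict> blocks only; track dicts contain no nested dict.
    while ($xml =~ m{<dict>((?:(?!<dict>).)*?)</dict>}sg) {
        my $body = $1;
        my %field;
        while ($body =~ m{<key>([^<]*)</key>\s*<(string|integer)>([^<]*)</\2>}g) {
            my ($k, $v) = ($1, $3);
            $field{ xml_unescape($k) } = xml_unescape($v);
        }
        # Playlist item dicts have a Track ID but no Location: skip them.
        next unless defined $field{'Track ID'} && defined $field{'Location'};
        if ($seen_location{ $field{'Location'} }++) {
            push @dupes, $field{'Track ID'};
        }
    }
    return @dupes;
}

if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $xml = do { local $/; <$fh> };
    print "$_\n" for duplicate_track_ids($xml);
}
```

The printed IDs could then be handed to an AppleScript that gathers those tracks into a playlist. A proper plist parser would be more robust if Apple ever changes the file's layout.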



A script to find duplicate iTunes tracks
Authored by: Thom on Sep 18, '04 02:51:02PM
My comment is related, but along a different line: What if you have multiple copies of a *song*, but files which are different? (Bit rate, metadata in ID3 tags, etc.)

I'd like to chime in here and mention Jay Tuley's excellent software, iEatBrainz, which works with the MusicBrainz database.

Basically what it does is look in your library for songs which don't have fully fleshed out sets of tags, or lets you pick out songs that you know don't have the right or full info associated with them. Then it 'listens' to those songs and produces a 'fingerprint' for each one. It then compares this 'musical fingerprint' to an online database (MusicBrainz), and if it finds one it thinks it knows, it'll suggest that song in a pulldown list next to the track name. Then it fixes the fields for you and has iTunes update the song data.

You can even tell it what the track is if you know the artist, album, etc., and it'll fill in all the proper info. MusicBrainz has a pretty nifty search function on their site -- you can find by artist, album or track name.

To my knowledge, iEatBrainz cannot access an iPod. So all of the tracks on my machine are cleaned up, but my iPod is still a mess. I talked to Jay about this, and he confirmed one of my ideas: I'll make a new account on my machine, temporarily. Using the Terminal, I'll copy all of the music OFF of my iPod (not delete just yet, copy...) to an external drive (30 GB of music? My PowerBook's 80 GB drive has about 500 MB free :) and bring it into iTunes, then run iEatBrainz on it. Then it'll all be properly tagged and organized, and I can reload the iPod with the up-to-date song data, etc...

The one thing that I think is really unfortunate is that iEatBrainz accesses (and updates) the information about each track through iTunes. So if iTunes has no mechanism that lets iEatBrainz say 'store this bit of info in this ID3v2.4 field', your songs will get cleaned up, but the UUIDs aren't saved along with the file. They really ought to be! After all, all this work just got done and the resources of the MB site got used... better to try to do it only once per file.

(There is, however, a program called (for now) Picard, written by Robert Kaye, the founder of MusicBrainz. It's still pretty rough but it should be able to not only identify files using audio fingerprinting, but also write the uuid information into their ID3 tags.)

How does this relate to finding duplicate songs? Well, first of all, as this app helps you clean your collection, you can write a simple script to walk your iTunes folder tree (as long as it's kept sorted by Artist, Album name, then Track name) and find duplicate files. Or use one of the ones mentioned in this thread.

But, as someone mentioned above, it can be a real PITA when one little bit of info changes in an MP3 file's ID3 tags, but only in one version of a song. Now you can't do anything like MD5-hashing the entire file. I had considered MD5-hashing just the data portion (minus the ID3 tags), but realized that two different 'duplicate' files might have a different duration, bit rate, etc., which makes the whole thing moot.
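For files that differ only in their tags, hashing the audio data minus the tags can still help. A minimal sketch with the core Digest::MD5 module, assuming the only tags present are a leading ID3v2 tag and/or a trailing 128-byte ID3v1 tag (APE or Lyrics3 tags are not handled). As noted above, it cannot match re-encodes at a different bit rate. strip_id3 and audio_md5 are hypothetical names.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Return an MP3's bytes with any leading ID3v2 tag and trailing ID3v1
# tag removed, so files that differ only in their tags hash the same.
sub strip_id3 {
    my ($data) = @_;
    if (length($data) >= 10 && substr($data, 0, 3) eq 'ID3') {
        my ($flags) = unpack 'C', substr($data, 5, 1);
        my @b = unpack 'C4', substr($data, 6, 4);
        # The tag size is a "syncsafe" integer: 7 usable bits per byte,
        # and it excludes the 10-byte header (and the optional footer).
        my $size = (($b[0] & 0x7f) << 21) | (($b[1] & 0x7f) << 14)
                 | (($b[2] & 0x7f) << 7)  |  ($b[3] & 0x7f);
        my $skip = 10 + $size + (($flags & 0x10) ? 10 : 0);  # v2.4 footer flag
        $data = length($data) > $skip ? substr($data, $skip) : '';
    }
    if (length($data) >= 128 && substr($data, -128, 3) eq 'TAG') {
        substr($data, -128) = '';            # drop the 128-byte ID3v1 tag
    }
    return $data;
}

# MD5 of a file's audio data only (the whole file is read into memory).
sub audio_md5 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    my $data = do { local $/; <$fh> } // '';
    close $fh;
    return md5_hex(strip_id3($data));
}

if (@ARGV) {
    printf "%s  %s\n", audio_md5($_), $_ for @ARGV;
}
```

Files that print the same hash have identical audio frames; sorting the output by hash puts such pairs next to each other.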

So, what I want to find is a way to encode the MusicBrainz song ID into my ID3 tags, and then run a utility (probably with a DB behind it; why not?) to catalog all of my songs by reading that song ID from the file. Any time it found a duplicate song ID, I could drill right down to the two files and compare them.

I may write this in PHP; while ID3 support is being built into PHP 5.0+, it only supports older ID3 versions. But there is a nice project called getID3 which handles an impressive number of metadata types (even EXIF tags, QuickTime stuff, Ogg, lots...) and writes some too.

Anyone else gone down this road yet, so I don't end up reinventing the wheel?

I should also mention here that MusicBrainz and iEatBrainz are both projects that could (and should) use some donations to keep them running!

A Perl script to find duplicate iTunes tracks
Authored by: DougAdams on Sep 20, '04 09:52:30AM
SOX and I have come up with the first incarnation of Corral iTunes Dupes. This AppleScript uses Perl routines to briskly check your iTunes Music Library.xml file for dupes, then uses AppleScript to corral them into a discrete playlist. Works fast, fast, fast.

A Perl script to find duplicate iTunes tracks
Authored by: hypert on Aug 11, '05 03:23:11PM

Today I was recommending this thread to a friend (gratuitous self-promotion) when I saw all the additional comments that have been added since I first posted.

I've complained to Rob G. before that the Hints portion of this website needs a "subscription" method, just like the Forums here (and just about everywhere else) have. Then I would have known there were so many additional comments after I posted the Hint!

Anyway, now that I've caught up on the comments, I will say that the ubergeek in me is vastly impressed by how much SOX was able to shrink my script down. Very nice!

At first, I was confused by gourls' "clueless" comment, and then I realized he thought Doug's "newbies are clueless" comment was directed at me. I assume Doug was referring to the numerous Mac people (including many esteemed macosxhints readers) who are not UNIX/Terminal-savvy. I should have included the requisite "chmod" command in the original Hint, but I do forget that some people aren't familiar with that.

Anyway, I appreciate SOX's minimal-Perl effort (although Perl newbies might want to use the original code, since you can easily pick your own fields to be compared) and, as always, I appreciate Doug's tireless AppleScripting work! :-)

Of course, it's been almost a year since the last comment, so I doubt either of them are still monitoring this hint. We really need hint subscriptions...



A Perl script to find duplicate iTunes tracks
Authored by: snakesalive on Jul 02, '10 07:09:03AM
Here's a script I knocked together to remove a few hundred duplicates of the form musicfile.mp3 / musicfile 1.mp3. The script locates such pairs of files and checks that they are the same size. It prompts for confirmation before deleting the duplicates. Run it with the iTunes Music folder as the working directory, e.g. ~/Music/iTunes/iTunes Media/Music/

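#!/usr/bin/perl

use strict;
use warnings;

use File::Find;

my $act = 0;    # 0 = only report pairs, 1 = delete the " 1" copies
my $num = 1;    # suffix iTunes appends to a re-imported copy

# Called by File::Find for every entry: $_ is the base name and the
# current directory is the folder that contains it.
sub process
{
        / $num\.mp3$/ or return;
        my $file2 = $_;
        $file2 =~ s/ $num\.mp3$/.mp3/;
        if (-e $file2) {
                my $size1 = -s $_;
                my $size2 = -s $file2;
                if ($size1 == $size2)
                {
                        if ($act)
                        {
                                print "Deleting: $File::Find::name\n";
                                unlink($_) or warn "Could not delete $File::Find::name: $!\n";
                        }
                        else
                        {
                                print "\"$File::Find::name\" (duplicate: \"$file2\", same size)\n";
                        }
                }
        }
}

find(\&process, '.');
print "Confirm delete? [y/n] ";
my $ans = <STDIN>;                  # read the terminal, not files in @ARGV
chomp($ans) if defined $ans;
if (defined $ans && $ans eq "y")
{
        $act = 1;
        find(\&process, '.');
}
else
{
        print "Process cancelled\n";
}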


A Perl script to find duplicate iTunes tracks
Authored by: afingal on Jul 02, '10 12:54:36PM

snakesalive,

Do I presume correctly that your script depends on the very low probability that two different songs will have exactly the same file size (to the byte)?



A Perl script to find duplicate iTunes tracks
Authored by: snakesalive on Jul 02, '10 03:26:28PM

To be more precise, the script will detect two files as exact duplicates if they satisfy all of the following:
a) are in the same folder
b) have filenames of the form "name.mp3" and "name 1.mp3"
c) have the same size.
This should pick up the situation when tagged mp3s are imported twice.
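To address the file-size concern above, the same three criteria can be followed by a byte-for-byte comparison before anything is deleted. A minimal sketch using File::Compare; original_of is a hypothetical helper, not part of snakesalive's script. It is stricter than a size check, so a pair whose tags have since diverged will no longer qualify.

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# For a path ending in "name 1.mp3", return the path of "name.mp3" when
# both files exist in the same folder, have the same size, and also have
# identical contents. Returns undef (or an empty list) otherwise.
sub original_of {
    my ($copy) = @_;
    (my $orig = $copy) =~ s/ 1\.mp3\z/.mp3/ or return;   # (b) name pattern
    return unless -f $copy && -f $orig;                   # (a) same folder
    return unless (-s $copy) == (-s $orig);               # (c) same size
    return unless compare($copy, $orig) == 0;             # plus: same bytes
    return $orig;
}

for my $copy (@ARGV) {
    if (defined(my $orig = original_of($copy))) {
        print "\"$copy\" duplicates \"$orig\"\n";
    } else {
        print "\"$copy\" skipped\n";
    }
}
```

Because the size check runs first, the full comparison only reads files that already look like twins, so the extra safety costs little.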


