
Delete large numbers of duplicate emails from Mail.app
This workaround, using Thunderbird, allowed me to remove 30,000 duplicate emails (from a collection of about 80,000) in OS X's Mail.app. I spent a lot of time searching for an answer to this question, and this is the only solution I found that worked.

My Mail.app emails got rather out of hand; I won't bore you with how. I had most emails at least twice, and some up to five times. I tried Andreas Amann's Mail Scripts for this, but even though it was working OK, it found only about 200 duplicates in three hours while cooking my CPU at about 95%. There was no way that could go on, so I cancelled the process. (Thanks anyway, AA.)

I looked at importing into Entourage, because there are some scripts to eliminate duplicates from there, but for reasons I shall not bore you with here, this proved a dead end.

The solution turned out to be the amazing add-on for Mozilla's Thunderbird called Remove Duplicate Messages (ALTERNATE). I had to use version 3.0 of Thunderbird, not the current version 3.0.1 (thanks to those reviewers who reported that version 0.3.3 of the add-on was failing with Thunderbird 3.0.1). I found the older version on the Thunderbird releases page. Below is the process I used. (It took several hours because of the number of emails. Ironically, every stage except the actual analysis of duplicates takes ages. This makes using Thunderbird permanently quite tempting.)

Note: The following process needs to be done for each folder that resides in the On My Mac section of Mail. (I will have more to say about that soon.)
  1. Use Mailbox » Archive Mailbox in Mail.app to create a proper mbox export file of the mail folder you want to de-duplicate. (The mailboxes that Mail keeps in the Mail folder inside your user's Library, with their extension .mbox, are not real mboxes.)
  2. Follow these instructions to import your newly-created mbox file quickly and easily into Thunderbird. I think my use of Path Finder (instead of Finder) helped a bit here. After I did this, I did wait a long time for the Spotlight indexing in Thunderbird to finish -- not sure whether this was necessary, but I suspect it probably was.
  3. Install the add-on mentioned above. Set its prefs for email matching criteria (in Thunderbird » Prefs » Manage Add-Ons) according to what works for you. I ran some tests with a small collection of dupes until I had these right. What worked for me was ticking Message ID plus a few others, but unticking Size, Lines, CC and Body. This did an excellent job of correctly collating the two to five copies of each email.
  4. Move the dupes to a chosen folder (e.g. trash). With each of my folders of about 40,000 emails, I had to wait roughly 30 seconds for the dialog box to appear. Then it took just two minutes to move about 13,000 dupes to the 'unneeded duplicates' folder I created. Amazing.
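Before importing anything, it can help to check how many copies an exported mbox really contains, in the spirit of the small test runs in step 3. The following is a minimal sketch in plain sh and awk, not part of Mail.app or the Thunderbird add-on; the function name `mbox_dup_report` is hypothetical. It assumes a plain mbox in which every message begins with a "From " line and its headers end at the first blank line, and it groups messages by Message-ID plus Subject (roughly "Message ID plus a few others"), printing each group that occurs more than once together with its copy count.

```sh
#!/bin/sh
# mbox_dup_report MBOX
# Hypothetical helper (not part of Mail.app or the Thunderbird add-on).
# Prints "COUNT MESSAGE-ID<tab>SUBJECT" for every Message-ID/Subject pair
# that occurs more than once in MBOX.
# Assumes plain mbox: each message starts with a "From " line and its
# headers end at the first blank line. Folded (multi-line) Message-ID
# headers are not handled, and messages without a Message-ID are ignored.
mbox_dup_report() {
  awk '
    /^From / { inhdr = 1; mid = ""; subj = ""; next }   # new message
    inhdr {
      line = $0
      sub(/\r$/, "", line)                 # tolerate CRLF line endings
      lower = tolower(line)
      if (lower ~ /^message-id:/) {
        mid = line; sub(/^[^:]*:[ \t]*/, "", mid)
      } else if (lower ~ /^subject:/) {
        subj = line; sub(/^[^:]*:[ \t]*/, "", subj)
      }
      if (line == "") {                    # blank line ends the headers
        inhdr = 0
        if (mid != "") print mid "\t" subj
      }
    }
  ' "$1" | sort | uniq -c | awk '$1 > 1'
}
```

For example, `mbox_dup_report path/to/exported.mbox | wc -l` gives the number of distinct messages that have at least one extra copy (the path is only a placeholder), and piping the report through `awk '{ n += $1 - 1 } END { print n }'` instead gives the number of surplus copies that a de-duplication pass should remove.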
I did hit one snag, because I had 80,000 emails in a single folder in Mail (I had moved them into one folder in the hope that I could run Andreas Amann's script all night. Like the Thunderbird add-on, the Mail script searches for dupes inside one folder at a time.)

Mail.app (on both my attempts) only created an archive mbox of about 4.3GB, which comprised some 43,000 emails; the remaining 36,000+ didn't make it into the archive! Luckily, I found when I had imported these into Thunderbird that they were in date order, and that no emails were dated earlier than 27Oct07. So I went back to Mail.app and created a new folder, then dragged all the pre-27Oct07 emails (the remaining 36,000+) into it, and created another Archive mbox. I then repeated the import process and everything worked.

Interestingly, it took close to half an hour in Mail.app for my MacBook Pro (Core2Duo at 2.2GHz) to even select those 36,000 emails -- be patient while the wheel spins! It then took nearly another hour to move them to the new folder and index them and their 6,000 attachments. (Again, I'm not sure that I really needed to wait for that indexing, but whatever.) Still, it was all wonderfully stable.

So, as the last step, I have reimported everything into Mail.app (File » Import mailboxes » files in mbox format). But I'm thinking of experimenting with Thunderbird as well, given the third-party geniuses who write extensions for it. It seems very zippy.

Finally, I get tired of reading blogs that tell me I don't need past emails. In my job (I'm a senior high school English teacher), I need them all the time. I write substantive replies to student X's questions, then rehash them for student Y, maybe years later. (It's much more complicated than that even, but you see my point!) One day I will delete the thousands I don't need, too.

Delete large numbers of duplicate emails from Mail.app
Authored by: dfbills on Jan 28, '10 08:01:11AM

Does this solution break message metadata such as "replied to" and "forwarded"?

And how about attachments, did they remain intact?

---
-d



Gmail
Authored by: hzc on Jan 28, '10 08:31:34AM

I use Mac OS X's Mail app in conjunction with Gmail, and Gmail automatically takes care of duplicates. When an e-mail has more than one label in Gmail, it shows up as a duplicate in Mail because the copies are in separate folders. So technically Mail does have duplicates, but they are duplicates you wanted anyway.



Delete large numbers of duplicate emails from Mail.app
Authored by: umbjm on Jan 28, '10 09:15:22AM

I had a similar problem and simply archived using MailSteward ($49). The archive didn't include any of the duplicates, and it has the advantage of making this problem less likely in the future.



Delete large numbers of duplicate emails from Mail.app
Authored by: vocaro on Jan 28, '10 04:31:22PM

I was about to say the same thing! I too archived a bunch of emails with MailSteward, and it had the happy side effect of removing all dupes. Worked like a charm.



Delete large numbers of duplicate emails from Mail.app
Authored by: sophistry on Jan 28, '10 07:41:49PM
I had this same problem and solved it with an AppleScript:
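-- Compares the selected message with the one below it and, while the lower
-- one has the same Message-ID and size, sends Cmd-X (cut) to remove it.
-- Only neighbouring messages are compared, so copies must sit next to each
-- other in the message list. Start with the first message selected.
-- Keystrokes go through System Events, so UI scripting must be enabled.
tell application "Mail"
	activate
	repeat 500 times -- examine up to 500 messages per run
		-- read the currently selected (reference) message
		set theSelection to selection
		set theMessage to item 1 of theSelection
		set subj to subject of theMessage
		set recip to the recipients of theMessage
		set dats to date sent of theMessage
		set datr to date received of theMessage
		set theid to the message id of theMessage
		set siz to message size of theMessage
		
		-- move the selection to the next message
		tell application "System Events"
			key code 125 -- down arrow
		end tell
		
		set messagechanged to false
		repeat until messagechanged
			delay 0.25 -- give Mail time to update the selection
			set theSelection2 to selection
			if (the (count of theSelection2) is equal to 0) then
				--display dialog "empty selection"
				set messagechanged to true
				say "skipping message"
			else
				set theMessage2 to item 1 of theSelection2
				set subj2 to subject of theMessage2
				set recip2 to the recipients of theMessage2
				set dats2 to date sent of theMessage2
				set datr2 to date received of theMessage2
				set theid2 to the message id of theMessage2
				set siz2 to message size of theMessage2
				
				if (theid2 is equal to theid and siz2 is equal to siz) then
					-- same Message-ID and size: cut the duplicate
					tell application "System Events"
						keystroke "x" using {command down}
					end tell
				else
					-- a different message: it becomes the next reference
					set messagechanged to true
					beep
				end if
			end if
			
		end repeat
		
	end repeat
end tell

tell application "Script Editor" to activate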
---
soph


Delete large numbers of duplicate emails from Mail.app
Authored by: Fairly on Jan 29, '10 02:38:20AM

80,000 messages? Your problem isn't what you think.



Delete large numbers of duplicate emails from Mail.app
Authored by: mal0rd on Jan 29, '10 02:50:12AM
I also have such a large number of messages, mostly from work and mailing lists. At the end of the year I went to archive them and found that the "Mail Scripts" version was too slow.

Download my script or copy and paste this into AppleScript Editor:


(*
 Select Dups
 
 Devin Bayer (http://t-0.be) - 2010
 
 To use:
 1. Select messages in Mail.app
 2. Uncheck "Organize by Thread"
 3. Run this script
 4. Only duplicate messages will be selected
 
 ---- Performance ----
 On my 2.3GHz MacBook Pro, I can scan
 about 5000 messages a minute.
 
 Only run this script using AppleScript Editor.
 When run standalone or in Mail.app, the speed
 (and CPU usage) is drastically reduced
 
 ---- Notes ----
 If you want to use mail while this script is running,
 please create a second message viewer window to work in.
 
 
*)
using terms from application "Mail"
	
	-- track the duplicate messages
	set dups to a reference to {}
	global dups
	
	set view to first message viewer of application "Mail"
	global view
	
	on progress(txt)
		display dialog txt ¬
			giving up after 1 with icon note
	end progress
	
	-- return a list of emails in msg
	to rcpt(msg)
		set emails to {}
		repeat with email in recipients in msg
			copy address of email to the end of emails
		end repeat
		return emails
	end rcpt
	
	-- compare two messages
	to compare(l, r)
		if r = none or l = none ¬
			or message size of l ≠ message size of r ¬
			or subject of l ≠ subject of r ¬
			or my rcpt(l) ≠ my rcpt(r) ¬
			then return false
		
		-- l and r are equal; mark one as a dup
		--set background color of r to red
		--set flagged status of r to true
		copy r to end of dups
	end compare
	
	-- set selected messages to dups 
	on finish()
		if (count of dups) < 1 then return true
		try
			set selected messages of view to dups
		on error number -1712
			display alert "TIMEOUT"
			return false
		end try
		return true
	end finish
	
	my progress("retrieving list of selected messages")
	set sort column of view to size column
	set msgs to get selected messages of view
	set total to count of msgs
	
	-- initialize state
	set prev to none
	set pos to 0
	set failed to 0
	
	-- scan every message
	repeat with msg in msgs
		try
			with timeout of 1 second
				compare(msg, prev)
			end timeout
		on error number -1712
			set failed to failed + 1
		end try
		set prev to msg
		
		-- progress dialog
		if pos mod 10000 = 0 then
			my progress("Processing message " & pos & " of " & total)
		end if
		set pos to pos + 1
		
	end repeat
	
	set ok to false
	repeat while not ok
		display dialog "Done scanning! total: " & total ¬
			& " timeouts: " & failed ¬
			& " dups: " & (count of dups) ¬
			& " (click OK to select dups)" giving up after 60
		set ok to my finish()
	end repeat
	
	return true
end using terms from


Delete large numbers of duplicate emails from Mail.app
Authored by: tedw on Jan 29, '10 07:31:48AM
It occurred to me that this would be best handled using a rule action. That way, messages could be tested automatically as they arrive, as well as in bulk. The rule action script looks like this (untested):
using terms from application "Mail"
	on perform mail action with messages theMessages for rule theRule
		tell application "Mail"
			repeat with thisMessage in theMessages
				set theAccount to account of mailbox of thisMessage
				tell theAccount
					tell mailbox "INBOX"
						set theDupList to (every message whose ¬
							message size = message size of thisMessage ¬
							and subject = subject of thisMessage ¬
							and recipients = recipients of thisMessage)
						if (count of theDupList) > 1 and thisMessage ≠ last item of theDupList then
							set background color of thisMessage to purple
							-- delete thisMessage
						end if
					end tell
				end tell
			end repeat
		end tell
	end perform mail action with messages
end using terms from
Because this is an untested version, I have it set to mark the emails in purple, and I've commented out the delete line. Ultimately you would want to uncomment that line and delete emails automatically, but test first to make sure the script works as you want. The script takes the current email and checks whether any other emails in the INBOX of the same account have the same message size, subject line, and recipients. If there are duplicates, it marks (or deletes) the current email, unless it is the oldest of the matching emails; the oldest is preserved so that at least one copy remains.

To use this, paste the script into AppleScript Editor and save it as a script file. In Mail, set up a rule whose action runs this script. The script will then run automatically on incoming emails, or you can run it on a given message or an entire mailbox using the Message -> Apply Rules menu item.


Delete large numbers of duplicate emails from Mail.app
Authored by: david-bo on Jan 31, '10 09:48:38AM

Why wasn't the message ID enough to identify duplicates? It must be unique. All messages with identical message IDs are duplicates unless you have modified them locally.

Actually, old Eudora had an option to delete duplicates based on message-id that, if selected, was applied every time you opened a mailbox. A very simple and elegant solution.

I would be interested in finding a script that does this for IMAP mailboxes. Anyone?



Delete large numbers of duplicate emails from Mail.app
Authored by: Kalak on Feb 26, '10 02:00:39PM

Assuming your mail is in mbox format (as used on some IMAP servers, or as produced by "Archive Mailbox" in Mail.app), save this as something like /usr/local/bin/mbox-removedup.sh, then run it on the mbox in Terminal:
sh /usr/local/bin/mbox-removedup.sh ~/Inbox

--begin copy--
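#!/bin/sh
# -D: skip messages whose Message-ID is already in the "idcache" file (about 10 MB max)
# -s: split the mbox into single messages; with no command, non-duplicates go to stdout
formail -D 10000000 idcache -s < "$1" > ztmp && mv ztmp "$1"
rm -f idcache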
--end copy--

---
--
Kalak
I am, and always will be, an Idiot.



Delete large numbers of duplicate emails from Mail.app
Authored by: david-bo on Apr 09, '10 02:47:12PM

Care to explain how your script works?



Delete large numbers of duplicate emails from Mail.app
Authored by: bradknowles on Jul 20, '13 05:00:50PM
The "formail" program is a part of the "procmail" package, which dates back to 1990. Since OS X is based (in part) on Unix, procmail is something that has been included by default as part of the operating system for as long as I can remember.


The code in the post you are responding to is based on the "procmail examples" or "procmailex" man page, and you can find a copy of that page at https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man5/procmailex.5.html. You can see a copy of the "formail" man page at https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/formail.1.html.


Note that this example will delete duplicates based exclusively on the content of the Message-id: header, which is SUPPOSED to be unique for each and every message, but in some cases is not. This is part of why the procmailex man page suggests keeping a "duplicates" mailbox which you can go through manually to see if there are any messages which were mistakenly believed to be duplicates. This is also part of why other AppleScript examples you may have seen will check more than just the value of the Message-id: header.
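The procmailex idea of keeping a "duplicates" mailbox for manual review can also be approximated without procmail. The sketch below uses only sh and awk; `mbox_split_dups` is a hypothetical helper, not a formail option. Like `formail -D`, it keys on the Message-ID: header alone, but it writes the first copy of each message to one mbox and every later copy to a second mbox, so nothing is discarded until you have looked through the duplicates. It assumes plain mbox input (each message starts with a "From " line, headers end at the first blank line) and does not handle a Message-ID folded onto a continuation line; messages without a Message-ID are always kept.

```sh
#!/bin/sh
# mbox_split_dups IN KEEP DUPS
# Hypothetical helper, not part of procmail/formail. Like "formail -D" it
# keys only on the Message-ID: header, but instead of discarding later
# copies it writes them to DUPS so they can be reviewed by hand.
# Assumes plain mbox ("From " line starts each message, headers end at
# the first blank line). Messages without a Message-ID are always kept.
mbox_split_dups() {
  : > "$2" && : > "$3" || return 1       # create/empty both output files
  awk -v keep="$2" -v dups="$3" '
    # Decide where the current message goes and write its buffered headers.
    function route() {
      if (mid != "" && (mid in seen)) out = dups
      else { out = keep; if (mid != "") seen[mid] = 1 }
      printf "%s", hdr >> out
      inhdr = 0
    }
    /^From / {                           # envelope line: a new message
      if (inhdr) route()                 # previous one had no blank line
      inhdr = 1; hdr = $0 "\n"; mid = ""; out = ""
      next
    }
    inhdr {
      line = $0
      sub(/\r$/, "", line)               # tolerate CRLF line endings
      if (mid == "" && tolower(line) ~ /^message-id:/) {
        mid = line
        sub(/^[^:]*:[ \t]*/, "", mid)
        sub(/[ \t]+$/, "", mid)
      }
      hdr = hdr $0 "\n"
      if (line == "") route()            # headers done; body follows
      next
    }
    out != "" { print >> out }           # body lines stream straight out
    END { if (inhdr) route() }
  ' "$1"
}
```

For example, `mbox_split_dups archive.mbox archive-kept.mbox archive-dups.mbox` (file names are placeholders) leaves the original untouched; once a skim of archive-dups.mbox shows nothing was wrongly matched, archive-kept.mbox can be imported. Only the header block of each message is buffered; body lines are written straight to the chosen file, so large attachments do not make awk build huge strings.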

---
--
Brad Knowles

