How to thoroughly trim Mail's mailbox sizes
Authored by: syzygies on Feb 24, '06 11:04:02AM

I recently had to delve far deeper into this than I would like. Ten years of my email had distinct geological layers by organizational regime. Now, when my hero Steve Jobs isn't trying to be known as the guy who killed the album, he promotes this brilliant idea that one's file organization shouldn't matter; let Spotlight do the heavy lifting. This more or less works with "Smart Mailboxes" in Mail, except for the annoying delays each time the Smart Mailbox rebuilds. A better program design could hide these rebuilds from the user, caching views and rebuilding incrementally. But, hey, this is Spotlight v.1.

Anyhow, I decided to rearrange my last few years of email into a "year, month" folder structure. Every time Mail hit a missing message, I'd get the error described in this post, aborting the copy. Unfortunately, this also aborted the delete phase for the messages already copied, producing thousands of duplicate messages. Worse, one couldn't simply use a duplicate hunter to fix the problem: Mail writes a field into the XML at the end of each message recording its last folder (perhaps to be used in the delete phase that never happened?), so the copies are no longer byte-for-byte identical.

To fix this, I used BBEdit to standardize these XML fields to some bogus home planet, then I used a duplicate hunter to remove duplicate messages. It was a huge, time-consuming mess. I reported this bug to Apple.
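
For anyone wanting to skip the BBEdit step, here is a rough Python sketch of the same idea: hash only the message part of each file and ignore the trailing XML, so copies that differ only in the "last folder" field still match. It assumes each message file starts with a line giving the byte count of the message, followed by the message itself and then the XML; check that against your own files first. The names message_bytes and find_duplicates are made up for this sketch.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def message_bytes(raw: bytes) -> bytes:
    """Return only the message part of one Mail message file.

    Assumed layout (verify on your own files): the first line is a decimal
    byte count, then that many bytes of message, then Mail's trailing XML.
    """
    first_line, _, rest = raw.partition(b"\n")
    try:
        length = int(first_line.strip())
    except ValueError:
        # Not in the assumed layout, so fall back to the whole file.
        return raw
    return rest[:length]


def find_duplicates(blobs: Dict[str, bytes]) -> List[List[str]]:
    """Group file names whose message parts are byte-for-byte identical."""
    groups = defaultdict(list)
    for name in sorted(blobs):
        digest = hashlib.sha256(message_bytes(blobs[name])).hexdigest()
        groups[digest].append(name)
    # Only groups with more than one member are duplicates.
    return [names for names in groups.values() if len(names) > 1]
```

To use it, read each message file in binary mode into a dict keyed by path and pass that to find_duplicates. It only reports groups; deciding which copy to keep, and rebuilding the mailbox afterwards, is still up to you.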

One should study the individual message format to make educated guesses as to what Mail is doing, rather than trying to make sense of Mail's functional behavior. In particular, attachments are stored in each message, and also sometimes in a folder Mail sets aside. A bit of experimentation could sort out what actually happens when one tells Mail to delete attachments. My impression is that the copy in each message weathers any storm short of deleting the actual message file.

This is all ripe for a Perl script. I'd like to remove duplicates without being fooled by Apple's variant XML fields, and then sort all 10 years of mail into year/month folders, using Perl to parse the date field in each message and fix bogus date fields. Unfortunately, Mail isn't scriptable, so the rebuild mailboxes phase would need to be carried out by hand.
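
The date-sorting half of that idea might look something like the following. This is a sketch in Python rather than Perl, using only the standard library. It takes the raw message bytes, reads the Date header, and returns a "YYYY/MM" folder name. Messages with a missing or unparseable date go to a fallback folder rather than being "fixed", since how to repair a bogus date is a judgment call. The function name year_month_folder and the "undated" folder name are hypothetical.

```python
from email import message_from_bytes
from email.utils import parsedate_to_datetime


def year_month_folder(message: bytes, fallback: str = "undated") -> str:
    """Pick a 'YYYY/MM' folder name from a message's Date header."""
    headers = message_from_bytes(message)
    raw_date = headers.get("Date")
    if raw_date is None:
        return fallback
    try:
        when = parsedate_to_datetime(str(raw_date))
    except (TypeError, ValueError):
        # Different Python versions raise different errors for a bad date.
        return fallback
    return f"{when.year:04d}/{when.month:02d}"
```

Actually moving files into those folders is left out on purpose: Mail would still need to rebuild each mailbox afterwards, so try this on a copy of your mail first.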

Authored by: sjk on Feb 27, '06 02:36:29PM
Maybe this is partly helpful:

emlx to mbox Converter

And formail.

Btw, Mail is definitely scriptable.

Authored by: dancingbrook on Jul 10, '07 11:46:20AM