Submit Hint Search The Forums LinksStatsPollsHeadlinesRSS
14,000 hints and counting!

Export mailbox data to a tab-delimited text file Apps
A simple perl script, used in conjunction with find, can create an export file containing a tab-delimited list of the From, Subject, and Date fields of all emails in Mail. You could bring this into a spreadsheet or database for further processing. Modifying the script (the formail parameters, mainly) would allow you to extract the message body and other fields as well.

Read the rest of the article for the script and find syntax...

Create this script, name it (mailexport.pl was used in this exmaple), make it executable (chmod 755 mailexport.pl), and keep it in the directory from which you'll run the find command:
#!/usr/bin/perl
$full_dir = $ARGV[0] ;
($mail_dir = $full_dir) =~ s|^.+?//|| ;
$mail_dir =~ s|\.mbox/mbox|| ;
$mail_dir =~ s|/| : |g ;
$full_dir =~ s/ /\\ /g ;

open(MAIL, "cat $full_dir | formail -e -s formail -X Subject:
-X From: -X Date: -a Subject: -a From: -a Date: |") ;

$i = 0 ;
while (<MAIL>)
{
if (/^Date: (.+)/)
{ chomp ($date = $1) ; }

if (/^Subject: (.+)/)
{ chomp ($subject = $1) ; }

if (/^From: (.+)/)
{ chomp ($from = $1) ; }

if (++$i == 3)
{
print ("$mail_dir\t$from\t$subject\t$date\n") ;
$i = 0 ;
}
}

close(MAIL) ;
Note: The "open(Mail..." line has been split into two rows for readability; replace the line break with one space character.

Once you've prepared the script, you can export the chosen fields from all of your mailboxes by typing:
find ~/Library/Mail/Mailboxes/ -name mbox -exec ./mailexport.pl \
'{}' ';' > tab-delimited-list
Remember to execute the find from the directory containing the script, and change "mailexport.pl" to match whatever you named the script.
    •    
  • Currently 0.00 / 5
  • 1
  • 2
  • 3
  • 4
  • 5
  (0 votes cast)
 
[10,030 views]  

Export mailbox data to a tab-delimited text file | 15 comments | Create New Account
Click here to return to the 'Export mailbox data to a tab-delimited text file' hint
The following comments are owned by whoever posted them. This site is not responsible for what they say.
Thanks!!!!
Authored by: tiagor on Sep 02, '02 10:03:06AM

I just tested this script, and it rules! Now I'll be able to nicely export my mailboxes to MySQL...



[ Reply to This | # ]
For Mac program users
Authored by: wfolta on Sep 02, '02 12:26:46PM

If you want to import the data into a Mac-ish program instead of UNIX-ish program, change the perl script to have "\\r" instead of "\\n". (Mac uses carriage return to delimit lines, UNIX uses line feed.)

Then, say you want to bring the data into Appleworks:

* In Appleworks, choose Open...
* Change the File Format popup to "All Available".
* Select the text file created by the script, but don't double-click or choose Open yet.
* Change Document Type popup to "Spreadsheet" or "Database", depending on which format you want to use.
* Hit Open and voila, it's imported!



[ Reply to This | # ]
Just a question...
Authored by: tiagor on Sep 02, '02 11:15:46AM
formail -I "" is supposed to extract the body of the message, correct?

I'm trying to include it on the script, but I think that I'm doing something wrong... do you know how to format the script to make it get also the body of the message?

Thanks a lot!

[ Reply to This | # ]
Mail body newline/tab probs
Authored by: wfolta on Sep 02, '02 12:33:17PM

I believe the formail option for including the body is -k but haven't tried it. The complicating factor will be what to do about newlines and tabs in the message body.

So now that I think about it, you'd have to do one of two things:

1. Get/use a db package for perl so you can directly insert into MySQL or PostgreSQL and skip the tab-delimited stage of things.

2. Somehow translate/escape tabs and newlines in message body text. (And then possibly un-escape in your destination program.

Hmmm... getting email with body into a Mac-ish program will be harder than I thought. Anyone have a good suggestion?



[ Reply to This | # ]
Cleaner but two-script version
Authored by: wfolta on Sep 02, '02 03:07:19PM

For those who want to build on the script, it would actually be better done as two scripts. The kludge with counting every three lines is ugly and may break.

So, with two files and the commandline:

find ~/Library/Mail/Mailboxes/ -name mbox \\
-exec ./mailbox1.pl '{}' ';' > tab-delimited-list

You can accomplish the same thing, but it's easier to then modify mailbox2.pl to get the particular fields you want to have.

(Note that mailbox1.pl doesn't really do much. The only reason it exists is because formail reads stdin and I can't figure a way to pipe or redirect input to formail as a part of a find or xargs command. If there were a way to do it, all the work could be done in mailbox2.pl.)

========== mailbox1.pl ==========
#!/usr/bin/perl
$full_dir = $ARGV[0] ;
($mail_dir = $full_dir) =~ s|^.+?//|| ;
$mail_dir =~ s|\\.mbox/mbox|| ;
$mail_dir =~ s|/| : |g ;

system("cat '$full_dir' | formail -e -X Subject: -X From: -X Date: -s ./mailbox2.pl '$mail_dir'") ;
====================

========== mailbox2.pl ==========
#!/usr/bin/perl
$mail_dir = $ARGV[0] ;

$date = "???" ;
$subject = "???" ;
$from = "???" ;

while (\<STDIN\>)
{
if (/^Date: (.+)/)
{ chomp ($date = $1) ; }

if (/^Subject: (.+)/)
{ chomp ($subject = $1) ; }

if (/^From: (.+)/)
{ chomp ($from = $1) ; }
}

print ("$mail_dirt$fromt$subjectt$date\\n") ;
====================



[ Reply to This | # ]
Stupid backslashes!!
Authored by: wfolta on Sep 02, '02 03:28:19PM

When I previewed the routines, I had to double-up my backslashes for it to look right. But when it posted, they stayed doubled. Also, the backslash in front of greater-than and less-than need to go. Can the message be fixed by the admin?



[ Reply to This | # ]
Beginning of body extract code
Authored by: wfolta on Sep 02, '02 03:37:51PM

(I'll try without doubled backslashes to see what I get...)

If you modify mailbox2.pl, you can get the body text:

#!/usr/bin/perl
$mail_dir = $ARGV[0] ;

$date = "???" ;
$subject = "???" ;
$from = "???" ;

while (<STDIN>)
{
last if (/^$/) ;

if (/^Date: (.+)/)
{ chomp ($date = $1) ; }

if (/^Subject: (.+)/)
{ chomp ($subject = $1) ; }

if (/^From: (.+)/)
{ chomp ($from = $1) ; }
}

$body = <STDIN> ;

print ("$mail_dir\t$from\t$subject\t$date\t", $body, "\n") ;

Note that the $body = line reads only a single line of the body. You'd have to read all the rest of the lines and filter them to handle tabs and newlines somehow, if you intend to output to tab-delimited format. If you'll use DB routines to go straight to a database, that wouldn't be a problem.



[ Reply to This | # ]
Also have to change mailbox1.pl
Authored by: wfolta on Sep 02, '02 03:40:43PM

Sorry for so many posts. Getting questions offline...

Need to supply the -k argument so mailbox2.pl gets the body:

cat '$full_dir' | formail -e -k -X Subject: ...



[ Reply to This | # ]
One-script version with full body text
Authored by: jima on Sep 03, '02 07:41:08AM
This is a one-script version of wfolta's "Cleaner but two-script version"... plus I added the To: field, and made a few changes to grab all of the body text. Not that this script MUST be named mailexport.pl (unless you change it in the source code).
#!/usr/bin/perl

if ($ARGV[1] == 2) {

	$mail_dir = $ARGV[0] ; 
	
	$date = "???" ;
	$subject = "???" ;
	$from = "???" ;
	$to = "???" ;
	
	while () {

		$body = '';
		push (@message, );
		foreach (@message) {

			if (/^From: (.+)$/) {
				chomp ($from = $1);
			}

			elsif (/^Subject: (.+)$/) {
				chomp ($subject = $1);
			}

			elsif (/^Date: (.+)$/) {
				chomp ($date = $1);
			}
			
			elsif (/^To: (.+)$/) {
				chomp ($to = $1);
			}
			
			else {
				$body .= $_;
			}

		}

	}
	
	print ("$mail_dirt$datet$fromt$tot$subjectt", $body, "n") ;

} else {

	$full_dir = $ARGV[0] ;
	($mail_dir = $full_dir) =~ s|^.+?//|| ;
	$mail_dir =~ s|.mbox/mbox|| ;
	$mail_dir =~ s|/| : |g ;
	$full_dir =~ s/ /\ /g ;

	system("cat '$full_dir' | formail -e -k -X Subject: -X From: -X " .
			"Date: -X To: -s ./mailexport.pl '$mail_dir' '2'") ;

}
To run it type (same as earlier post):
find ~/Library/Mail/Mailboxes/ -name mbox -exec ./mailexport.pl 
'{}' ';' > tab-delimited-list


[ Reply to This | # ]
Corrected
Authored by: jima on Sep 03, '02 07:48:23AM

We need a hint on how to post source code without it getting munged :-/

#!/usr/bin/perl

if ($ARGV[1] == 2) {

$mail_dir = $ARGV[0] ;

$date = "???" ;
$subject = "???" ;
$from = "???" ;
$to = "???" ;

while (<STDIN>) {

$body = '';
push (@message, <STDIN>);
foreach (@message) {

if (/^From: (.+)$/) {
chomp ($from = $1);
}

elsif (/^Subject: (.+)$/) {
chomp ($subject = $1);
}

elsif (/^Date: (.+)$/) {
chomp ($date = $1);
}

elsif (/^To: (.+)$/) {
chomp ($to = $1);
}

else {
$body .= $_;
}

}

}

print ("$mail_dir\t$date\t$from\t$to\t$subject\t", $body, "\n") ;

} else {

$full_dir = $ARGV[0] ;
($mail_dir = $full_dir) =~ s|^.+?//|| ;
$mail_dir =~ s|\.mbox/mbox|| ;
$mail_dir =~ s|/| : |g ;
$full_dir =~ s/ /\\ /g ;

system("cat '$full_dir' | formail -e -k -X Subject: -X From: -X " .
"Date: -X To: -s ./mailexport.pl '$mail_dir' '2'") ;

}



[ Reply to This | # ]
Entourage X
Authored by: jima on Sep 03, '02 08:04:56AM

I noticed that if you drag a folder from the Entourage X Inbox to the Finder it saves it in a *.mbox format. So to make this work with Entourage, drag a folder from Entourage to the same Finder directory that the script is in (in this example the folder was called 'work' in Entourage, and in the Finder it appeared as 'work.mbox'). Next, use BBEdit (or ??) to change the line endings from Mac to Unix. Then in the Terminal from the directory that the script is in, type:

perl ./mailexport.pl work.mbox > tab-delimited-list

whoo-hoo! :)



[ Reply to This | # ]
One-script version with full body text
Authored by: Miga on Sep 03, '02 06:35:10PM

Does this script run with Jaguar?

I have nothing else as delete messages for mac.com mailbox and outbox in Mailboxes, the input boxes are in separate folders together with drafts, deleted messages (again), message UIDs already downloaded, and sent messages.

I have tried the script with the name of one this folder instead of Mailboxes, but it does not seem to work.

Maybe I did something wrong, as I'm new at shell scripting.



[ Reply to This | # ]
One-script version with full body text
Authored by: wfolta on Sep 04, '02 10:16:53AM

These scripts should work with Jaguar. Note that the find command assumes the mailboxes are in Library, which I believe changed from early versions of Mail that had them in Mailboxes directly in your home directory. (At least I had a Mailboxes directory in my home directory, which I think was a result of that.)

What kind of error/problem are you getting?



[ Reply to This | # ]
How would you run this on a specific mailbox?
Authored by: illovich on Jun 18, '04 11:59:47AM

How would you run this on a specific mailbox? I'm tying to run it on a mailbox:

/Users/illovich/Library/Mail/Mailboxes//From Faculty.mbox

(I don't know why that double slash is between the file and directory t the end, btw... maybe it has something to do with my find. I was trying to do this:

[code]find ~/Library/Mail/Mailboxes/ -name 'From Faculty.mbox' -exec ./mailexport.pl '{}' ';' > tab-delimited-list[/code]

but I get a blank text file. Can anyone see what I'm doing wrong? Thanks.

---
illovich
http://illovich.com



[ Reply to This | # ]
One-script version with full body text
Authored by: brunus on Sep 01, '08 03:11:57AM

hi there,
nice script!
But how could I export the header field for specific messages, from within mail, into a csv or tab file?
any idea on this?

thanks,

brunus



[ Reply to This | # ]