Import thousands of old emails into Apps
Once upon a time, I used Outlook Express and then moved on to Entourage. Each year I saved my sent and deleted email as individual files and then created a text index so I could search them using Sherlock. I ended up with quite a few CDs on my shelf, one for each year. One day I decided that I'd like these all to be on my OS X box. After all, we have more disk space these days.

I created a special user called mailarchive to hold all these messages. The problem was: how do you convert these folders of individual mail files into corresponding mbox files that can then be imported into Mail? If you only have a few hundred, you can drag them back into Entourage from the Finder and then export them as an mbox file. With many thousands you can't really do that, because the Finder goes quite comatose if you select 7,000 files and try to drag them into an Entourage folder. Here is how I solved the problem. It probably helps if you know a little Unix.

Here's what I did:
  1. You may first need to rename your individual mail files. Their names may contain weird and wonderful characters which are not conducive to Unix scripting (e.g. ?'" etc). Renaming the files loses no data in your converted mbox, and it means you can run Unix scripts cleanly. I used a program called filenamer to do that, with two actions: the first used just the index as the file name, and the second added the .eml extension to each file. I am sure there are other programs that can batch rename in this way.

  2. Next I converted the line endings in each file from Mac (CR) to Unix (LF), since my files dated from OS 9. To do this I used flip. I downloaded flip.osx and moved it to ~/bin/flip in the Terminal:
     $ mv flip.osx ~/bin/flip
     $ chmod +x ~/bin/flip
    If you don't have a bin directory, create it first with
     $ mkdir ~/bin
    If you want, you can of course also run flip from your Desktop directory in the Terminal. The next thing to do was run flip -u on each file. Because I had so many, I had to use xargs on the command line. I changed into the directory which contained all the .eml files and ran flip from there. Assuming your eml files are in a directory on your Desktop called THEM, you would do:
     $ cd ~/Desktop/THEM
     $ ls * | xargs flip -u
  3. Next I used the eml2mbox program. Download it and install it to your bin directory as well. I removed the .rb extension because OS X associates it with RealBasic. Once you have moved it to your bin and renamed it eml2mbox, don't forget to chmod +x eml2mbox. Then change into the parent directory of the directory containing the eml files (in the example above this is Desktop) and run:
     $ eml2mbox -s THEM ~/Desktop/WhateverYouWant.mbox
  4. Open Mail and import the mbox file as Other, and you are done.
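Step 1 relied on filenamer. For anyone without a batch renamer, a shell-only version could look something like the sketch below. The function name rename_to_index, the five-digit numbering and the separate output directory are illustrative choices, not part of the original hint. The demo works on a throwaway directory, so nothing real is touched.

```shell
#!/bin/sh
# Hypothetical helper: move every regular file in $1 into $2,
# naming them 00001.eml, 00002.eml, ... in glob order.
# Writing to a separate directory avoids overwriting a file that
# already happens to be called 00001.eml.
rename_to_index() {
    src=$1
    dest=$2
    n=0
    mkdir -p "$dest"
    for f in "$src"/*; do
        [ -f "$f" ] || continue          # skip directories and empty globs
        n=$((n + 1))
        mv -- "$f" "$(printf '%s/%05d.eml' "$dest" "$n")"
    done
    echo "$n"                            # report how many files were moved
}

# Demo with two awkwardly named files in a temporary directory.
demo=$(mktemp -d)
mkdir "$demo/old"
printf 'first\n'  > "$demo/old/Re: what's up?"
printf 'second\n' > "$demo/old/lunch \"today\""
count=$(rename_to_index "$demo/old" "$demo/THEM")
echo "renamed $count files"
```

Because every name is quoted and the glob is expanded by the shell itself, the odd characters never pass through a pipe or through xargs.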
It sounds hard, but it wasn't once I had worked out what I needed to do. The advantage for me is that I now have a simple and fast way to search all my old email without cluttering up this year's email.
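For readers wondering what the conversion in step 3 produces: an mbox file is essentially the messages concatenated, each preceded by a separator line beginning with "From ". The sketch below is only a rough illustration of that layout and is not how eml2mbox itself is written. The function name eml_dir_to_mbox and the placeholder sender and date on the separator line are assumptions; a real converter would normally take those from each message's headers.

```shell
#!/bin/sh
# Rough illustration of the mbox layout; not a replacement for eml2mbox.
eml_dir_to_mbox() {
    src=$1
    out=$2
    : > "$out"                           # start with an empty mbox
    for f in "$src"/*.eml; do
        [ -f "$f" ] || continue
        {
            # Separator line: placeholder sender and date (assumption).
            echo 'From MAILER-DAEMON Thu Jan  1 00:00:00 1970'
            # Quote lines that start with "From " so a reader does not
            # mistake them for the start of a new message.
            sed 's/^From />From /' "$f"
            echo                         # blank line between messages
        } >> "$out"
    done
}

# Demo on a throwaway directory with two tiny messages.
demo=$(mktemp -d)
mkdir "$demo/THEM"
printf 'Subject: one\n\nFrom here on, text.\n' > "$demo/THEM/1.eml"
printf 'Subject: two\n\nBody.\n' > "$demo/THEM/2.eml"
eml_dir_to_mbox "$demo/THEM" "$demo/archive.mbox"
messages=$(grep -c '^From ' "$demo/archive.mbox")
echo "mbox holds $messages messages"
```

This also shows why the line endings matter: the separator is only recognised at the start of a line, so a file full of bare carriage returns looks like one enormous line.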

Import thousands of old emails into
Authored by: ibalbin on Apr 29, '04 07:41:16PM

I found my login id for macosXhints. Just identifying myself as the author ... in case.

Converting newlines
Authored by: clith on Apr 30, '04 11:45:57AM
You don't need a whole app to do that. Here's a command to do it in-place: perl -pi -e 's/\r/\n/g;' file1 [file2 ..]

Hope this helps..
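If only some of the files use Mac line endings, it may be worth converting just those. A minimal sketch, assuming the files sit in one directory; it uses tr and a temporary file instead of perl's in-place mode, and the sample file names are made up for the demo:

```shell
#!/bin/sh
# Sketch: convert CR to LF only in files that actually contain a CR.
demo=$(mktemp -d)
printf 'Subject: old\rBody\r' > "$demo/mac.eml"    # Mac (CR) line endings
printf 'Subject: new\nBody\n' > "$demo/unix.eml"   # already Unix (LF)

cr=$(printf '\r')        # a literal carriage return to search for
converted=0
for f in "$demo"/*.eml; do
    if grep -q "$cr" "$f"; then
        # tr cannot edit in place, so write a temp file and move it back.
        tr '\r' '\n' < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
        converted=$((converted + 1))
    fi
done
lines=$(wc -l < "$demo/mac.eml" | tr -d ' ')
echo "converted $converted file(s); mac.eml now has $lines lines"
```

Files that are already in Unix format are left untouched, which keeps the change easy to review.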


how to use xargs correctly..
Authored by: clith on Apr 30, '04 11:52:59AM
Rather than piping "ls" to "xargs", you should use "find". Something like this:

find . -type f -print0 | xargs -0 perl -pi -e 's/\r/\n/g;'

Note the -print0 flag for find and the -0 flag for xargs. What does this do? It delimits the filenames with a NUL (zero) byte rather than a space or newline. This is extremely useful for files that have spaces, newlines and what-have-you in their names.

To restrict the search to just the current directory, use the -maxdepth 1 argument for find:

find . -maxdepth 1 -type f -print0 | xargs -0 perl -pi -e 's/\r/\n/g;'
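To see the difference for yourself, here is a small sketch that counts how many arguments xargs hands on in each case. The file names are invented for the demo, and sh -c 'echo $#' simply prints the number of arguments it received.

```shell
#!/bin/sh
# Sketch: compare "ls | xargs" with "find -print0 | xargs -0"
# when a file name contains a space.
demo=$(mktemp -d)
cd "$demo" || exit 1
: > "one two.eml"        # name with a space
: > "three.eml"

# xargs splits the ls output on blanks, so "one two.eml" becomes two arguments.
naive=$(ls | xargs sh -c 'echo $#' sh)

# NUL-delimited names survive intact: one argument per file.
safe=$(find . -maxdepth 1 -type f -print0 | xargs -0 sh -c 'echo $#' sh)

echo "ls | xargs saw $naive arguments; find -print0 | xargs -0 saw $safe"
```

With two files on disk, the first pipeline reports three arguments and the second reports two, which is exactly the breakage the -print0 and -0 pair avoids.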


Problems with importing from Netscape or Mozilla or Thunderbird
Authored by: araque on Jun 12, '04 12:22:22PM

First off a big warning: BACKUP your original mailboxes before you do this.

The correct command is

find . -type f -print0 | xargs -0 perl -pi -e's/\n/\r/g'

I used this command to "fix" my Mozilla mailboxes (folders). For reasons I can't explain, about 20% of the messages were affected by the newline / carriage return problem. I thought it had to do with the date, or maybe the origin... I have switched from Netscape to Mozilla to Thunderbird in the past seven years, and all three mail clients have been backward compatible with the mbox format.

When I ran this command, the "bad" messages were "fixed", but the other 80% of the messages lost their "From" fields. So I had to import twice and then merge the results. This was a nightmare, if you want my opinion.

Why did I do it? Because the Search function in Mozilla / Thunderbird under Panther is horribly slow. In retrospect I would say Apple's Mail app should have a more robust and/or lenient Import function. The newline/carriage return problem is well documented, and I don't know why they didn't catch it before. Hopefully Tiger will fix this.
