Extract names and emails from a text file

Feb 20, '13 07:30:00AM

Contributed by: robg

I have a recurring need to extract full names and email addresses from a plaintext archive of email messages. The archive is created by selecting a bunch of emails in Mail, copying them, pasting into TextEdit, and converting to plain text.

For each message in the file, the first line contains the information I wanted:

From: Joe Example <joe@example.com>

I wanted one email address per line, suitable for pasting into another location. I am far from an expert with the bash shell, but here's what I came up with. I imagine there are many more efficient ways to do this, as I'm sure experienced perl, sed, awk, etc. users may point out. Note that this is highly dependent on the format created by Apple's Mail app in OS X 10.8.

grep 'From:' /path/to/archive.txt | cut -f2 -d\< | cut -f1 -d\> | pbcopy

The grep bit pulls out the entire From: line, then the first cut command grabs the email address and the trailing close-bracket, by setting the delimiter to an open bracket. The second cut eliminates the closing bracket, by setting that as the delimiter. The output will be one email address per line, sitting on your clipboard ready for pasting. (To debug, just remove the | pbcopy bit to see the output.)
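To make the three stages concrete, here is a small sketch that runs the same pipeline against a throwaway sample file instead of a real archive. The sample messages, the temporary file, and the `emails` variable are made up for illustration, and the result is captured in a variable rather than sent to pbcopy so it can be inspected.

```shell
# Build a small sample archive (made-up addresses, for illustration only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
From: Joe Example <joe@example.com>
Subject: Hello
From: Mary Ann Sample <mary@example.org>
Subject: Re: Hello
EOF

# Stage 1: keep only the lines containing "From:".
grep 'From:' "$sample"

# Stage 2: with "<" as the delimiter, field 2 is the address plus the ">".
grep 'From:' "$sample" | cut -f2 -d\<

# Stage 3: with ">" as the delimiter, field 1 drops the closing bracket.
emails=$(grep 'From:' "$sample" | cut -f2 -d\< | cut -f1 -d\>)
printf '%s\n' "$emails"

rm -f "$sample"
```

Note that grep 'From:' matches that text anywhere in a line, so quoted or forwarded headers in a message body would also be picked up. Using '^From:' instead would restrict the match to lines that start with it, if that suits your archive.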

I also wanted to extract the names, and came up with a variant to do just that:

grep 'From:' ~/Desktop/testfile.txt | sed -e 's/: /:^/g' | sed -e 's/ </^</g' | cut -f2 -d^ | pbcopy

This one is messier, as names can contain one or more spaces. After getting the From: line, sed is used (twice) to add a caret delimiter immediately after From:, and immediately before the opening bracket of the email address. I then use cut, with the delimiter changed to the caret, to extract the full name (field two) from the found lines. Again, the results are copied to the clipboard; leave this bit off for debugging.
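As a rough trace of what each step does, the sketch below pushes a single made-up From: line through the same substitutions and keeps the intermediate results in variables (`step1`, `step2`, and `name` are hypothetical names). One assumption to flag: the bracket is written here without a backslash, because `\<` means a word boundary in GNU sed, while a bare `<` is a literal bracket in both BSD and GNU sed.

```shell
# A made-up header line with a multi-word name.
line='From: Mary Ann Sample <mary@example.org>'

# First sed: replace ": " with ":^", marking the start of the name.
# Result: From:^Mary Ann Sample <mary@example.org>
step1=$(printf '%s\n' "$line" | sed -e 's/: /:^/g')

# Second sed: replace " <" with "^<", marking the end of the name.
# Result: From:^Mary Ann Sample^<mary@example.org>
step2=$(printf '%s\n' "$step1" | sed -e 's/ </^</g')

# cut: with "^" as the delimiter, field 2 is the full name.
name=$(printf '%s\n' "$step2" | cut -f2 -d^)
printf '%s\n' "$name"
```

Because the spaces inside the name are never touched, the name comes out intact no matter how many words it has.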

With the names and addresses extracted, it's fairly easy to do other stuff with them. In my case, I'm reading them into a couple of array variables in a bash script, so I can then output a name and email address pair to consecutive locations on my multi-pasteboard. If you want to use the names in an array in a bash script, you'll want to change the shell's field separator (IFS), which defaults to space, tab, and newline, to just a newline:

IFS='
'

Without this, your array will get split anywhere there's a space in the name values ... or so I've heard, not that it's ever happened to me!
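For illustration, here is one way the array step might look in a bash script. It is only a sketch: the sample file and the `names`, `emails`, and `old_ifs` variables are hypothetical, the sed bracket is written without a backslash for portability, and the loop simply prints each pair rather than sending it to a pasteboard.

```shell
# Requires bash (arrays are not part of POSIX sh).
sample=$(mktemp)
cat > "$sample" <<'EOF'
From: Joe Example <joe@example.com>
From: Mary Ann Sample <mary@example.org>
EOF

# Split command output on newlines only, so names keep their spaces.
old_ifs=$IFS
IFS='
'
names=( $(grep 'From:' "$sample" | sed -e 's/: /:^/g' | sed -e 's/ </^</g' | cut -f2 -d^) )
emails=( $(grep 'From:' "$sample" | cut -f2 -d\< | cut -f1 -d\>) )
IFS=$old_ifs   # restore the default so later commands split as usual

# Walk both arrays in step and print each name with its address.
i=0
while [ "$i" -lt "${#names[@]}" ]; do
  printf '%s <%s>\n' "${names[$i]}" "${emails[$i]}"
  i=$((i + 1))
done

rm -f "$sample"
```

Saving and restoring IFS keeps the change from leaking into the rest of the script. With the default IFS, the second name would land in the array as three separate elements.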

Mac OS X Hints
http://hints.macworld.com/article.php?story=20130219161025495