10.5: Use Automator to create audiobooks from PDFs

Apr 30, '08 07:30:00AM

Contributed by: stephanbuys

I have recently undertaken some training in which I have to cover a massive amount of material. Being the natural procrastinator that I am, I immediately went on the prowl for better books, better methods, and all sorts of other peripheral activities that don't actually count as studying.

One of these projects was to convert the text versions of my study guides to audio, so that I could listen to them in the car while I drive (an ideal time to study). The process was surprisingly simple. Start by opening the PDF in Preview, then press Command-A (select all) and Command-C (copy). Open a new document in TextEdit and press Command-V (paste), then convert the document to plain text (Format » Make Plain Text). Save the file as a .txt document; for this example, we'll name it rawfile.txt.

At this stage, you might want to clean up the text. For instance, a few regular expressions can strip out things like footers, headers, and page numbers, which would otherwise be read aloud in the middle of a sentence.
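Before writing any regular expressions, it can help to see which lines repeat throughout the file, since running headers and footers usually show up once per page. The following is only a rough sketch of that idea, not part of my original process; the function name repeated_lines and its default threshold of 3 are made up for illustration.

```shell
# Hypothetical helper: print the lines of a text file that occur at
# least MIN times (default 3), most frequent first.
# Usage: repeated_lines FILE [MIN]
repeated_lines() {
    file=$1
    min=${2:-3}
    # Drop blank lines, count identical lines, sort by count (highest
    # first), then keep those at or above the threshold and strip the
    # count that uniq -c put in front of each line.
    grep -v '^[[:space:]]*$' "$file" | sort | uniq -c | sort -rn |
        awk -v min="$min" '$1 >= min { sub(/^ *[0-9]+ /, ""); print }'
}
```

Running repeated_lines rawfile.txt in Terminal lists likely header and footer candidates; anything that shows up there is a good target for a substitution in the clean-up script.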

I use the little Perl script (named convert.pl) listed below:

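#!/opt/local/bin/perl
# The shebang above points at the MacPorts Perl; the stock /usr/bin/perl should work as well.
while (<>) {
    # Remove page numbers: lines that start with digits and a slash,
    # or that hold only digits plus one trailing character
    s/^\d+\/.*$//;
    s/^\d*.$//;
    # Remove chapter headings
    s/Chapter \d+: [\s\w]+//;
    # Remove image and figure references
    s/^Figure \d+.*//;
    # Remove hyphen continuations from the ends of lines, and drop
    # the line break so the word joins up with the next line
    if (s/-.$//) {
        chomp;
    }
    print;
}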
Save convert.pl to the file system, and make it executable with chmod u+x convert.pl. Now clean up the text document:
cat rawfile.txt | ./convert.pl > cleanfile.txt
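It is worth checking what the script actually threw away before you spend time generating audio. The sketch below is an optional extra, not something from my original workflow; the function name removed_lines is hypothetical, and it relies only on the standard diff and sed tools.

```shell
# Hypothetical helper: list the lines of the raw file that the
# clean-up changed or removed.
# Usage: removed_lines RAWFILE CLEANFILE
removed_lines() {
    # diff marks lines found only in the first file with "< ";
    # keep just those lines and strip the marker.
    diff "$1" "$2" | sed -n 's/^< //p'
}
```

For example, removed_lines rawfile.txt cleanfile.txt | less lets you page through everything that was altered. If real sentences show up in that list, one of the patterns is too greedy and needs tightening.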
Open the file cleanfile.txt in TextEdit (you might have to choose the UTF-8 encoding), then fire up Automator and create a Custom workflow with two actions:
  1. Text » Get Contents of TextEdit Document
  2. Music » Text to Audio File
Select an appropriate target directory and filename in the second action. I highly recommend choosing the voice Alex, which is the new 10.5-only high-quality voice. Make sure your cleanfile.txt document in TextEdit is the active document, then go back into Automator and hit Run. Now sit back and relax while Leopard converts your text to an audio file. From there, you can import the audio file into iTunes, convert it to MP3 if you want, and sync it to your iPod to take the book on the road.
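If you would rather stay in Terminal, the say command that ships with Mac OS X can do roughly the same job as the two Automator actions. This is an alternative sketch rather than the method described above; the wrapper name text_to_audio and the file name book.aiff are made up for illustration.

```shell
# Hypothetical wrapper around the say command-line tool.
# Usage: text_to_audio INPUT.txt OUTPUT.aiff [VOICE]
text_to_audio() {
    input=$1
    output=$2
    voice=${3:-Alex}   # Alex is the voice recommended in this hint

    if [ ! -r "$input" ]; then
        echo "cannot read $input" >&2
        return 1
    fi
    if ! command -v say >/dev/null 2>&1; then
        echo "say not found; it is only available on Mac OS X" >&2
        return 1
    fi
    # -v selects the voice, -f reads the text from a file,
    # -o writes the speech to an audio file instead of the speakers
    say -v "$voice" -f "$input" -o "$output"
}
```

Calling text_to_audio cleanfile.txt book.aiff would then produce an audio file you can import into iTunes just like the one Automator creates.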

(The original version of this can be read in this posting on my blog.)

Mac OS X Hints
http://hints.macworld.com/article.php?story=20080427091554310