Apr 30, '08 07:30:00AM • Contributed by: stephanbuys
One of these projects was to convert the text versions of my study guides to audio, so that I could listen to them in the car while I drive (an ideal time to study). The process was surprisingly simple. Start by opening the PDF in Preview, then press Command-A (select all) and Command-C (copy). Open a new document in TextEdit and press Command-V (paste), then convert the document to plain text (Format » Make Plain Text). Save the document as a .txt file; for this example, we'll name it rawfile.txt.
At this stage, you might want to clean up the text. For instance, a few regular expressions can strip out things like headers, footers, and page numbers.
I use the little Perl script (named convert.pl) listed below:
#!/opt/local/bin/perl
while (<>) {
    # Remove page numbers
    s/^\d+\/.*$//;
    s/^\d*.$//;
    # Remove chapter headers
    s/Chapter \d+: [\s\w]+//;
    # Remove image and figure references
    s/^Figure \d+.*//;
    # Remove - continuations from the ends of lines
    if (s/-.$//) {
        chomp;
    }
    print;
}
Save convert.pl to the file system, and make it executable with chmod u+x convert.pl. Now clean up the text document:
cat rawfile.txt | ./convert.pl > cleanfile.txt
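Before running a cleanup over a whole study guide, it can help to try the patterns on a few sample lines and see what survives. The following is a minimal sketch of that idea, not the convert.pl script itself: it uses sed rather than Perl, it deletes matching lines outright instead of blanking them, and the function name clean_text and the sample lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: drops lines that look like page residue.
# The patterns follow the same ideas as convert.pl, rewritten for sed -E
# (which has no \d, so [0-9] is used instead).
clean_text() {
    sed -E \
        -e '/^[0-9]+$/d' \
        -e '/^Chapter [0-9]+: /d' \
        -e '/^Figure [0-9]+/d'
}

# Made-up sample input, standing in for a few lines of rawfile.txt.
sample='Chapter 3: Access Control
Access control limits who can use a resource.
Figure 3.1 A sample access matrix
42
It is enforced by the reference monitor.'

# Print only the lines that survive the cleanup.
printf '%s\n' "$sample" | clean_text
```

Only the two body sentences should come through; if real content disappears as well, the patterns are too broad for your document and need tightening.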
Open the file cleanfile.txt in TextEdit (you might have to choose the UTF-8 encoding), then fire up Automator and create a Custom workflow with two actions:
- Text » Get Contents of TextEdit Document
- Music » Text to Audio File
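The same conversion can also be done from Terminal with the say command that ships with Mac OS X, which may be handy if you have several files to convert. This is an alternative sketch rather than the Automator workflow described above; the helper names audio_name and text_to_audio and the output naming scheme are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: derive an .aiff file name from a text file name,
# e.g. cleanfile.txt -> cleanfile.aiff.
audio_name() {
    printf '%s.aiff\n' "${1%.txt}"
}

# Speak a text file into an audio file. say reads its input from the
# file given with -f and writes the audio to the file given with -o.
text_to_audio() {
    say -f "$1" -o "$(audio_name "$1")"
}

# Only attempt the conversion where say and the input file both exist.
if command -v say >/dev/null 2>&1 && [ -f cleanfile.txt ]; then
    text_to_audio cleanfile.txt
fi
```

The resulting AIFF file can then be added to iTunes like any other audio file.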
(The original version of this can be read in this posting on my blog.)
