
10.5: Use Automator to create audiobooks from PDFs System 10.5
I have recently undertaken some training in which I have to cover a massive amount of material. Being the natural procrastinator that I am, I immediately went on the prowl for better books, better methods, and all other sorts of periphery that doesn't actually count as studying.

One of these projects was to convert the text versions of my study guides to audio, so that I could listen to the text in the car while I drive (an ideal time to study). The process was actually surprisingly simple. Start by opening the PDF in Preview, then press Command-A (select all) and Command-C (copy). Open a new document in TextEdit and press Command-V (paste), then convert the document to plain text (Format » Make Plain Text). Save the file to a .txt document; for this example, we'll name it rawfile.txt.

At this stage, you might want to clean up the text. For instance, a few regular expressions can strip out footers, headers, and page numbers.

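I use the little Perl script (named convert.pl) listed below:
#!/opt/local/bin/perl
# The shebang above points at a MacPorts Perl; use /usr/bin/perl for the system Perl.
while (<>) {
      # Remove page numbers
      s/^\d+\/.*$//;
      s/^\d*.$//;
      # Remove chapter headings
      s/Chapter \d+: [\s\w]+//;
      # Remove image and figure references
      s/^Figure \d+.*//;
      # Remove - continuations from the ends of lines and join with the next line
      if (s/-\s*$//) {
              chomp;
      }
      print;
}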
Save convert.pl to the file system, and make it executable with chmod u+x convert.pl. Now clean up the text document:
cat rawfile.txt | ./convert.pl > cleanfile.txt
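One side effect worth knowing: the substitutions in convert.pl empty out matching lines rather than deleting them, so cleanfile.txt can end up with long runs of blank lines. A minimal sketch for collapsing them, assuming a POSIX shell with awk; the function name squeeze_blank_lines is made up for this example and is not part of the original hint:

```shell
#!/bin/sh
# squeeze_blank_lines FILE
# Print FILE with every run of empty (or whitespace-only) lines
# collapsed to a single empty line.
squeeze_blank_lines() {
    awk '
        NF     { blank = 0; print; next }   # line has text: print it as-is
        !blank { print ""; blank = 1 }      # first blank line of a run: keep one
    ' "$1"
}
```

It could be used as squeeze_blank_lines cleanfile.txt > squeezed.txt, which mainly makes the text easier to proofread before converting it.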
Open the file cleanfile.txt in TextEdit (you might have to choose the UTF-8 type), then fire up Automator and create a Custom workflow with two actions:
  1. Text » Get Contents of TextEdit Document
  2. Music » Text to Audio File
Select an appropriate target directory and filename. I highly recommend choosing the voice Alex, the new 10.5-only high-quality voice. Make sure your cleanfile.txt document in TextEdit is the active document, then go back into Automator and hit Play. Now sit back and relax while Leopard converts your text to an audio file. From there, you can import the audio file into iTunes, convert it to MP3 if you want, and sync it to your iPod to take the book on the road.

(The original version of this can be read in this posting on my blog.)

What about say command?
Authored by: zacht on Apr 30, '08 09:21:20AM
Is the "Text to AudioFile" significantly better than the command-line program say? For example:

say -v Alex -f cleanfile.txt -o mybook.aiff

(see this old hint: http://www.macosxhints.com/article.php?story=20031113181603909)

I am honestly asking so please don't get angry. I was just wondering if "Text to AudioFile" is better in some way, e.g., better handling of unfamiliar words or long files?
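For long files, one option is to split the text into pieces and run say on each piece, so a single failure does not cost the whole book. A rough sketch, assuming the same say flags as in the command above; the function name, the part_ prefix, and the 500-line default are arbitrary choices for illustration. On a machine without say it only prints the commands it would run:

```shell
#!/bin/sh
# split_and_speak FILE [LINES]
# Split FILE into pieces of LINES lines (default 500) named part_aa, part_ab, ...
# in the current directory, then convert each piece to AIFF with say.
split_and_speak() {
    file=$1
    lines=${2:-500}
    split -l "$lines" "$file" part_
    # part_?? matches only the two-letter split output, not earlier .aiff files
    for piece in part_??; do
        if command -v say >/dev/null 2>&1; then
            say -v Alex -f "$piece" -o "$piece.aiff"
        else
            # say is not installed here: show the command instead of running it
            echo "say -v Alex -f $piece -o $piece.aiff"
        fi
    done
}
```

Run it from an empty working directory, for example split_and_speak cleanfile.txt 300, so that leftover part_ files from an earlier run are not picked up.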


What about say command?
Authored by: stottm on Apr 30, '08 10:38:52AM

The say command is basically doing the same thing. The Alex voice is the same with both approaches, so the quality is not going to get any better than Alex. You can pass additional parameters to slow Alex down a bit. Alex is the best text-to-voice conversion that I've seen. It's rather good but not yet perfect, and it's an improvement over the voices in 10.4.



What about say command?
Authored by: osxpounder on Jan 30, '09 03:18:01PM

I just found another advantage: I can use the new hint in the open Mac labs at school, where Terminal isn't allowed (so I can't run say without using a script anyway, right?).



Pedagogical perspective
Authored by: SuperCrisp on Apr 30, '08 10:12:08AM

Great tip, but as a teacher, I just want to add one quick caveat: multitasking while attempting to learn doesn't work so well, or at least that's what current research suggests. For a survey of information, listening while driving is fine, but you'll need to review afterward. For best results, do the review a couple of hours before a good sleep; even a one-hour nap will help. New research suggests that this is when info goes into long-term memory. And if you do two learning tasks, you will tend to retain the last one. I'm sorry I'm not documenting this for you, but I really should be grading papers now and not procrastinating like this.



10.5: Use Automator to create audiobooks from PDFs
Authored by: chrischram on Apr 30, '08 09:52:05PM

FWIW, here is the bare skeleton of an Automator workflow that can do the whole conversion (minus the cleanup script, which someone else can figure out how to sandwich in there).

Get Specified Finder Items
Extract PDF Text
Open Finder Items (in TextEdit)
Get Contents of TextEdit Document
Text to Audio File
Import Audio Files
Add Songs to Playlist

The parameters of the several actions should be pretty straightforward (I hope) to figure out.
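One possible way to sandwich the cleanup in, offered as an untested assumption about the workflow rather than a confirmed recipe: add a Run Shell Script action after Extract PDF Text, set it to pass its input as arguments, and have it filter each extracted text file in place. The helper name clean_in_place and the /path/to/convert.pl location are placeholders:

```shell
#!/bin/sh
# clean_in_place CLEANER FILE...
# Run each FILE through CLEANER (a filter that reads stdin and writes stdout),
# replace the file with the filtered text, and print its path so that the
# next action in the workflow still receives the file as input.
clean_in_place() {
    cleaner=$1
    shift
    for f in "$@"; do
        # Only overwrite the original if the filter succeeded
        "$cleaner" < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
        echo "$f"
    done
}

# Inside the Run Shell Script action this would be called as (placeholder path):
# clean_in_place /path/to/convert.pl "$@"
```

The filter is passed in as a parameter so that the same helper works with convert.pl or any other stdin-to-stdout cleanup script.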



10.5: Use Automator to create audiobooks from PDFs
Authored by: ephramz on May 08, '08 10:36:31PM
I use the following script to batch convert every open document in TextEdit to speech, convert the result in iTunes using the current encoder, and properly label the artist, album, track number, and lyrics (text of the document) using the first 3 lines of the text, which are of the form:

Chapter #.
Title
Author

This could easily be modified to work with Preview on PDF documents. I tried doing this in Preview, but it was too difficult to get the looping to work right. This script has a lot of "try" blocks to avoid the crashing of the script from timeout errors when TextEdit doesn't respond for a long time while it's converting. Hope this is helpful.

set AppleScript's text item delimiters to " "
tell application "TextEdit"
	repeat with doc from 1 to count of documents
		set txt to text of document doc
		set chapter to word 2 of paragraph 1 of document doc
		set sname to ((words 1 thru -1) of ((paragraph 2 of document doc)))
		set author to ((words 1 thru -1) of ((paragraph 3 of document doc)))
		set pth to (path to music folder as text) & "chapter.aiff"
		log pth
		try
			say txt saving to pth
		on error
			set done to false
			repeat until done
				try
					do shell script "sleep 60"
					get name of front document
					set done to true
				end try
			end repeat
		end try
		tell application "iTunes"
			try
				set newtrack to item 1 of (convert pth)
			on error
				set done to false
				repeat until done
					try
						do shell script "sleep 60"
						get name of newtrack
						set done to true
					end try
				end repeat
			end try
			tell newtrack
				set artist to author as text
				set lyrics to txt as text
				set track number to chapter
				set name to sname as text
			end tell
		end tell
	end repeat
end tell

