
10.4: Batch text conversion with textutil
Authored by: GlowingApple on Mar 14, '06 06:54:37AM

Great hint on a great utility; I never knew this command existed. It could easily be used to batch-convert a folder of documents for viewing on an iPod. Now to find a utility that can pull text out of a PDF file...

Jayson --When Microsoft asks you, "Where do you want to go today?" tell them "Apple."
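For the folder-conversion idea above, here is a minimal Python sketch that builds a single textutil call for every matching document in a folder. It assumes `textutil -convert txt file ...` accepts many inputs at once and, by default, writes each result next to its source with the extension swapped; the `SOURCE_EXTENSIONS` set and the function names are hypothetical choices for illustration, not part of textutil.

```python
import subprocess
from pathlib import Path

# Hypothetical set of input types to pick up; textutil reads other formats too.
SOURCE_EXTENSIONS = {".doc", ".rtf", ".rtfd", ".html", ".webarchive"}


def textutil_command(folder, fmt="txt"):
    """Build one textutil call that converts every matching file in `folder`.

    Returns the argv list, or None if the folder holds nothing to convert.
    """
    files = sorted(
        str(p) for p in Path(folder).iterdir()
        if p.suffix.lower() in SOURCE_EXTENSIONS
    )
    if not files:
        return None  # nothing to convert
    return ["textutil", "-convert", fmt, *files]


def convert_folder(folder, fmt="txt"):
    """Actually run the conversion (needs Mac OS X 10.4 or later for textutil)."""
    cmd = textutil_command(folder, fmt)
    if cmd is not None:
        subprocess.run(cmd, check=True)  # raises if textutil reports an error
    return cmd
```

Something like `convert_folder(Path.home() / "Documents" / "ForIPod")` (a hypothetical folder) would leave a `.txt` copy beside each document, ready to drag onto the iPod.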

Ghostscript can convert PDF to text
Authored by: TrumpetPower! on Mar 14, '06 09:34:29AM

Ghostscript can convert PDF files to plain text, though you might not be terribly happy with the results. That's not Ghostscript's fault; it depends entirely on how the particular PDF was made. For example, if the text was converted to paths before the PDF was output, you won't get any text at all. Kerning is often done by starting a new block of text at that point, which can r esul t in w eir d gap s in t he t e x t. And so on.
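Whichever tool does the extraction, the two failure modes above can be spotted automatically. The following is a rough Python heuristic, an assumption of mine and not a Ghostscript feature: empty output suggests text that was converted to paths, and a high share of stray single letters (other than "a" and "I") suggests kerning-split words. The 0.15 threshold is an arbitrary starting point.

```python
import re


def fragmentation_ratio(text):
    """Share of words that are stray single letters, or None if no words at all."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return None
    # "a" and "I" are real one-letter English words, so they don't count as stray.
    stray = [w for w in words if len(w) == 1 and w.lower() not in ("a", "i")]
    return len(stray) / len(words)


def classify_extraction(text, threshold=0.15):
    """Label extracted text as 'empty', 'fragmented', or 'ok' (heuristic only)."""
    ratio = fragmentation_ratio(text)
    if ratio is None:
        return "empty"       # e.g. the text was converted to paths
    if ratio > threshold:
        return "fragmented"  # e.g. kerning split words into pieces
    return "ok"
```

Running it over a batch of converted files is a cheap way to decide which PDFs need the manual (or OCR) treatment.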

Your best bet may be the full version of Acrobat (not Adobe Reader), since it includes OCR and other niceties. But unless the PDFs were specifically created to keep the text machine-readable as well as human-readable (for speakable text, for example), don't expect it to be a fully automated process.



10.4: Batch text conversion with textutil
Authored by: johnga1t on Mar 14, '06 09:58:02AM
You can try ps2ascii (it works on both PostScript and PDF files) if you have any teTeX or LaTeX packages installed (you can get them from Fink or from ii2). As mentioned above, you may see some funny spacing, but it's better than nothing.
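Unlike textutil, ps2ascii (a script that ships with Ghostscript) takes one input file and an optional output file per call, so a batch run needs one command per document. A minimal Python sketch, assuming ps2ascii is on the PATH; the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def ps2ascii_jobs(folder):
    """Build one `ps2ascii input output` command per .ps/.pdf file in `folder`."""
    jobs = []
    for src in sorted(Path(folder).iterdir()):
        if src.suffix.lower() in (".ps", ".pdf"):
            # Write the text next to the source, e.g. report.pdf -> report.txt
            jobs.append(["ps2ascii", str(src), str(src.with_suffix(".txt"))])
    return jobs


def run_jobs(jobs):
    """Run each command in turn, stopping at the first failure."""
    for cmd in jobs:
        subprocess.run(cmd, check=True)
```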

10.4: Batch text conversion with textutil
Authored by: fds on Mar 14, '06 12:39:08PM

textutil actually could convert from PDF files in the original release of Tiger. However, somewhere around the 10.4.4 update, this feature was taken away, so I had to switch to pdftotext from Xpdf.
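Since textutil no longer reads PDFs, a script has to call something else. This Python sketch prefers pdftotext (usage `pdftotext [options] file.pdf [file.txt]`, with `-layout` asking it to keep the page's physical layout) and falls back to ps2ascii from the earlier comment if pdftotext isn't installed. The preference order and the function name are my own assumptions; the `which` parameter exists only so the lookup can be swapped out for testing.

```python
import shutil
from pathlib import Path


def pdf_to_text_command(pdf, layout=True, which=shutil.which):
    """Return an argv list converting `pdf` to a .txt file beside it.

    Prefers pdftotext; falls back to ps2ascii; raises if neither is on PATH.
    """
    pdf = Path(pdf)
    out = str(pdf.with_suffix(".txt"))
    if which("pdftotext"):
        cmd = ["pdftotext"]
        if layout:
            cmd.append("-layout")  # keep the original physical layout
        return cmd + [str(pdf), out]
    if which("ps2ascii"):
        return ["ps2ascii", str(pdf), out]
    raise FileNotFoundError("neither pdftotext nor ps2ascii found on PATH")
```

Pass the result to `subprocess.run(..., check=True)` to perform the conversion.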
