How to retrieve text from Windows Office 2007 Word docs

Dec 08, '06 07:30:00AM

Contributed by: nicbav

Browsing makezine.com turned up a way to pull just the text out of a Word 2007 .docx file created with Office 2007 for Windows. That page has the info you need: a simple PHP script that extracts the text from the file.
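The approach works because a .docx file is just a ZIP archive. The main body lives in word/document.xml, where each paragraph is a w:p element and the visible text is split across w:t elements. The PHP script on that page isn't reproduced here. What follows is a minimal Python sketch of the same idea using only the standard library. The function name docx_to_text is a hypothetical name chosen for illustration. It ignores tabs, line breaks, headers, footers, and footnotes.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used for elements inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the plain text of a .docx file, one line per paragraph.

    `path` may be a filename or a binary file-like object (hypothetical
    helper; all formatting is discarded).
    """
    with zipfile.ZipFile(path) as zf:
        xml_bytes = zf.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each <w:p> is a paragraph; its text is spread over one or more <w:t> runs.
    for para in root.iter(W_NS + "p"):
        text = "".join(t.text or "" for t in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python docx_to_text.py report.docx
    print(docx_to_text(sys.argv[1]))
```

Saved as docx_to_text.py, it can be run from Terminal as `python docx_to_text.py somefile.docx > somefile.txt`.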

I think OpenOffice.org 2.0 may also be able to help, but I haven't tried it yet, so I would love to hear from anyone who has made this work.

[robg adds: That page mentions several other solutions. Note that, as of now, all of them strip the formatting from the file, leaving just the text. Microsoft has promised free converters for older versions of Office on the Mac (and I'll list them here for easy reference for anyone searching):

With the recent news that the XML converters won't be out until April or so of next year for current versions of Office, I think tricks like this are going to be increasingly necessary. Hopefully some brilliant coder out there will figure out how to parse the XML before Microsoft does, as losing all formatting is far from ideal.]
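Parsing the XML to keep at least some formatting is not hard in principle. Inside each run (w:r), an optional run-properties element (w:rPr) can carry toggles such as w:b (bold) and w:i (italic). The sketch below is a hypothetical illustration, not an existing tool. It marks bold and italic runs with Markdown-style asterisks, a convention chosen here purely for readability. It assumes formatting is applied directly to runs and ignores formatting inherited from paragraph or character styles, so it will miss some bold or italic text.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used for elements inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _is_on(rpr, tag):
    """True if run properties <w:rPr> switch on a toggle such as <w:b/>."""
    if rpr is None:
        return False
    el = rpr.find(W_NS + tag)
    if el is None:
        return False
    # A bare <w:b/> means "on"; w:val="0", "false" or "off" turns it off.
    return el.get(W_NS + "val", "true") not in ("0", "false", "off")


def docx_to_marked_text(path):
    """Hypothetical helper: text with **bold** and *italic* runs marked."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    lines = []
    for para in root.iter(W_NS + "p"):
        pieces = []
        for run in para.iter(W_NS + "r"):
            text = "".join(t.text or "" for t in run.iter(W_NS + "t"))
            if not text:
                continue
            rpr = run.find(W_NS + "rPr")
            # Wrap the run's text according to its direct formatting only.
            if _is_on(rpr, "b"):
                text = "**" + text + "**"
            if _is_on(rpr, "i"):
                text = "*" + text + "*"
            pieces.append(text)
        lines.append("".join(pieces))
    return "\n".join(lines)
```

A real converter would also need to resolve styles, lists, tables, and images, which is why a proper XML converter is still the better long-term answer.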



Mac OS X Hints
http://hints.macworld.com/article.php?story=20061206065508184