

How to retrieve text from Windows Office 2007 Word docs
Authored by: ctierney on Dec 08, '06 09:26:30AM
Thanks for the tip! Now I'll be prepared the next time I get one of these files. Here's another method that could be wrapped into an AppleScript droplet:
unzip -p some.docx word/document.xml | perl -pe 's/<[^>]+>|[^[:print:]]+//g'

How to retrieve text from Windows Office 2007 Word docs
Authored by: ctierney on Dec 08, '06 11:45:54AM
Here's a droplet that'll extract plain text to the clipboard:
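on open droppedItems
   -- The open handler receives a list of dropped items; use the first one.
   -- quoted form protects paths that contain spaces or shell metacharacters.
   set docxPath to quoted form of POSIX path of (item 1 of droppedItems)
   try
      do shell script "unzip -p " & docxPath & " word/document.xml | perl -pe 's/<[^>]+>|[^[:print:]]+//g' | pbcopy"
   on error errMsg
      display dialog "Could not extract text: " & errMsg buttons {"OK"} default button 1
   end try
end open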

on run
   display dialog "Drop a docx file on this AppleScript and its plain text contents will be copied to the clipboard." buttons {"OK"} giving up after 10 default button 1
end run

