
Convert .doc files to PDF and retain 'structure' info Apps
I often convert big .doc files to PDF when I only need to read them, since they load much faster as PDFs. I was always annoyed that Word wouldn't transfer the document structure, with the headers etc., to the PDF's 'bookmarks' section. After a while, I came up with the solution...

Open the .doc or .docx file (I only tried .doc) with OpenOffice 3 and choose File » Export as PDF. This works like a charm, and I like the irony of OpenOffice being better than Microsoft at converting their very own closed format.
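For many files, the same export can probably be scripted instead of clicked through. The following is a minimal sketch, not part of the original hint: it assumes a `soffice` binary on the PATH that accepts the LibreOffice-style `--headless --convert-to` options (the OpenOffice 3 build used above may not), and the helper names `build_convert_command`, `expected_pdf_path` and `convert_to_pdf` are made up for illustration. Whether the headings end up as PDF bookmarks depends on the export filter's defaults, so check the result in Preview's sidebar.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir, soffice="soffice"):
    """Return the argument list for a headless .doc -> PDF conversion.

    Assumption: the binary understands LibreOffice-style options.
    """
    return [
        soffice,
        "--headless",            # run without opening any window
        "--convert-to", "pdf",   # target format
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def expected_pdf_path(doc_path, out_dir):
    """Path where the converter is expected to write the PDF."""
    return Path(out_dir) / (Path(doc_path).stem + ".pdf")


def convert_to_pdf(doc_path, out_dir=".", soffice="soffice"):
    """Convert one document and return the path of the resulting PDF."""
    if shutil.which(soffice) is None:
        raise FileNotFoundError(f"{soffice!r} was not found on the PATH")
    subprocess.run(build_convert_command(doc_path, out_dir, soffice), check=True)
    return expected_pdf_path(doc_path, out_dir)
```

Calling `convert_to_pdf("report.doc", "pdf")` would then write `pdf/report.pdf`, provided the assumed binary is installed.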

Convert .doc files to PDF and retain 'structure' info | 17 comments
Convert .doc files to PDF and retain 'structure' info
Authored by: fredlmoore on Jan 09, '09 07:35:12AM

NeoOffice has the same feature. One-click conversion. Use it all the time.



Convert .doc files to PDF and retain 'structure' info
Authored by: fracai on Jan 09, '09 08:00:43AM

To be fair, how much does Microsoft have to do with "Print to PDF"? Unless you're talking about exporting to PDF from within Word, this is likely more a limitation of the print system. After all, when printing you don't need to retain the document structure links.

It's been a while since I've upgraded or used Word so I may be outdated.

---
i am jack's amusing sig file



Convert .doc files to PDF and retain 'structure' info
Authored by: frgough on Jan 09, '09 08:36:17AM

Word 2008 has a "save as PDF" option, which does nothing more than simply print to PDF. Microsoft's PDF support has always stunk. On both platforms.



Convert .doc files to PDF and retain 'structure' info
Authored by: hamarkus on Jan 09, '09 08:39:27AM

Every app that can print to a PS printer can have its output converted to PDF on all computer platforms. In OS X, this functionality is built into the print dialog; installing Acrobat provides another engine for the conversion (also available via the print dialog).
But printing just hands over what will appear on each page to the printing subsystem; it does not hand over any metadata about the document structure. Therefore, creating a PDF with such metadata requires the application to cooperate in providing the data. OpenOffice and NeoOffice use their own PDF engine and don't use the printing system for it. As such, they can incorporate this metadata.
On Windows, Adobe wrote a plugin for Word (and PowerPoint) that has access to the metadata and thus can create a PDF containing it.

For the Mac version of Word, Adobe did not bother to write that plugin (because PDF creation itself is already easy) or MS did not make it easy/feasible to add plugins.



Convert .doc files to PDF and retain 'structure' info
Authored by: theilgaard on Jan 13, '09 10:52:36AM

It's not fully correct that Word needs to talk directly to Distiller or vice versa.

The tags Distiller needs to build the document structure can be put into the PostScript file for Distiller to process.

All old FrameMaker users know how this is done, as FrameMaker has had this feature since the beginning of the PDF format (more or less). At least it had the feature much earlier than Adobe built the plugin for Word on Windows (which, by the way, actually exports the document to PostScript and lets Distiller process it).
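To make the idea of tags inside the PostScript file concrete: the documented mechanism for this is presumably Adobe's `pdfmark` operator, where a line such as `[/Title (Intro) /Page 1 /OUT pdfmark` asks Distiller for one bookmark. The sketch below only generates such lines from a list of headings. The function names and the `(level, title, page)` input format are assumptions for illustration, and it handles plain ASCII titles only (other characters would need PDFDocEncoding or UTF-16BE strings).

```python
def escape_ps_string(text):
    """Escape backslashes and parentheses for a PostScript string literal."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def outline_pdfmarks(headings):
    """Turn (level, title, page) tuples, in document order, into pdfmark lines.

    A bookmark that has sub-entries must announce how many direct children
    follow it via /Count, so that number is computed by looking ahead.
    """
    lines = []
    for i, (level, title, page) in enumerate(headings):
        children = 0
        for next_level, _, _ in headings[i + 1:]:
            if next_level <= level:
                break                      # left this heading's subtree
            if next_level == level + 1:
                children += 1              # direct child only
        count = f"/Count {children} " if children else ""
        lines.append(
            f"[{count}/Title ({escape_ps_string(title)}) /Page {page} /OUT pdfmark"
        )
    return lines
```

The resulting lines would be appended to (or sent along with) the PostScript file before it is handed to Distiller.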



Convert .doc files to PDF and retain 'structure' info
Authored by: palahala on Jan 09, '09 09:00:29AM

Oddly enough, in Safari, choosing File, Print and then selecting PDF does in fact create clickable links. These are not only links that Preview can recognize from an http:// or www prefix; even clickable images are retained in the resulting PDF.

So either Safari somehow overrules the PDF option in the operating system's print dialog, or printing in Mac OS X can in fact include details about the structure...



Convert .doc files to PDF and retain 'structure' info
Authored by: Anonymous on Jan 09, '09 03:08:03PM

Pardon? So does Word. Please RTFA.



Convert .doc files to PDF and retain 'structure' info
Authored by: palahala on Jan 10, '09 03:48:21AM

A stands for Article? Please explain a bit more on how Word does this, as I cannot find it in the original post. Apart from that, I was replying to fracai. And both fracai and hamarkus seem to indicate that no structure is transferred through the "Print to PDF" function. In my understanding no structure also implies: no clickable links, except maybe for those links that the PDF reader can discover while displaying (like human readable internet links).

So: does Word on the Mac indeed create clickable table of contents, clickable indexes, clickable references to other pages, et cetera?

If so, then maybe Word has a similar preference as OpenOffice.org's "Export bookmarks as named destinations" which I described below.

As a side note: printing to PDF uses "Mac OS X 10.5.6 Quartz PDFContext". The word "Context" might suggest it gets more info than just the printout itself -- just like I experienced in Safari, and like you might have experienced in Word?



Convert .doc files to PDF and retain 'structure' info
Authored by: frgough on Jan 12, '09 09:20:20AM

No. Word 2008 does not include clickable cross-references in the PDFs it generates (clickable TOC and index entries are simply Word-generated cross-reference fields).



Convert .doc files to PDF and retain 'structure' info
Authored by: timmerk on Jan 09, '09 08:44:30AM

MS has a free Word 2007 plugin for exporting to PDF, and it retains the "structure/bookmarks/etc" information. I'm extremely disappointed that Word 2008 doesn't do this, nor does the new Pages 09. I've written Apple over and over asking for this feature.



Convert .doc files to PDF and retain 'structure' info
Authored by: palahala on Jan 09, '09 09:14:57AM

Given my experience with File, Print, PDF in Safari (see reply above): Pages might in fact also be able to create clickable references such as table of contents, indexes, external URLs, et cetera when clicking them on the PDF pages themselves...?

If such references are indeed clickable within the PDF pages themselves, then maybe you're only missing the table of contents like the one that often is shown in Preview's sidebar? (In Preview, one can switch between page thumbnails and the table of contents, but the availability of the latter depends on structure of the PDF document.) If so, then maybe "Export bookmarks as named destinations" (as it is named in the Export to PDF options in OpenOffice.org) is not implemented, or somehow needs to be enabled? When disabling that option in OpenOffice.org, one still gets clickable references, but Preview's sidebar no longer shows a list.

(Hmmm, hoping I made myself clear....)
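The two things distinguished here (clickable links on the pages versus a table of contents in the sidebar) correspond to two different parts of a PDF: link annotations and the document outline. A rough way to see which of them a given file contains is to look for the corresponding keys in the raw bytes. This is only a heuristic sketch with a made-up function name: it assumes the relevant dictionaries are stored uncompressed, so it can miss them in PDFs that pack objects into compressed streams, and an `/Outlines` entry may also point at an empty outline.

```python
import re


def inspect_pdf_bytes(data: bytes) -> dict:
    """Crude check for an outline and for link annotations in raw PDF bytes."""
    return {
        # The document catalog refers to the bookmark tree via /Outlines.
        "has_outline": re.search(rb"/Outlines\b", data) is not None,
        # Each clickable region on a page is an annotation of subtype /Link.
        "link_annotations": len(re.findall(rb"/Subtype\s*/Link\b", data)),
    }
```

Running it on `open("some.pdf", "rb").read()` for a file printed from Safari and one exported from OpenOffice.org should show whether the difference described above is visible at this level.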



Convert .doc files to PDF and retain 'structure' info
Authored by: frgough on Jan 12, '09 09:18:29AM

Are you sure about Pages 09? Pages 08 will preserve links and cross-references as part of the PDF when you export to PDF.



Convert .doc files to PDF and retain 'structure' info
Authored by: ipox on Jan 09, '09 01:26:11PM

Hi there... long time listener, first time caller.

MS Office 2007 (note: for Windows) has a plugin to save as PDF or XPS. It is freely downloadable from office.microsoft.com. Unfortunately, it requires you to install it before installing Service Pack 1, or it will "phail". I use 2007 in a Boot Camped, Parallelsed Windows because I was thoroughly unimpressed with 2008 for Mac.

In this plugin there are many options, including the ability to add full bookmarking in the PDF, and I am pleased with it and recommend it (recall above to install before SP1). The only drawback is that NUMBERED headings aren't faithfully recreated with their numbers (they show up as just the heading text). In contrast, my favourite tool for creating PDFs, Acrobat Distiller, does add the full numbering, but costs like $500 which is :( .

This is kind of a Windows tip, however. :shiftyeyes:



Another option: PDFClerk Pro
Authored by: Drdul on Jan 09, '09 06:31:51PM

The only downside of using OpenOffice/NeoOffice is that the formatting often gets changed/mangled when opening documents created in MS Office. Heck, I even have that problem opening documents in MS Office on another computer which is set to a different default printer. If I didn't have to exchange Office documents with others, I'd ditch MS Office completely, but that's another rant for another time. :-/

Anyway, my solution to this problem was to buy PDFClerk Pro, which can automatically add bookmarks to any PDF created from any app. There's a bit of a learning curve, but it is quite a powerful app once you get the hang of it.

---
Richard Drdul
Vancouver, BC



Convert .doc files to PDF and retain 'structure' info
Authored by: flyboybob on Jan 11, '09 09:47:35AM

Leopard's print function allows you to print anything to a PDF, although NeoOffice also has its own feature to export as a PDF.



Safari seems an exception
Authored by: hamarkus on Jan 12, '09 08:03:20AM

The clickable elements in PDFs from Safari are new to me. Apparently the PDF engine behind the 'Save as PDF' option is able to ingest some metadata (I would guess this is rather recent functionality, i.e., Leopard or Tiger).
But I think it is still pretty obvious that the application from which the PDF is saved has to provide the metadata, and I also think that Apple is using private APIs here. In other words, only Apple might know how to embed this metadata, and thus only its applications are able to use it.



Both Safari and OmniWeb create clickable links in PDFs
Authored by: palahala on Mar 05, '09 02:52:08AM

I just noticed that OmniWeb 5 (a free download since February 25, 2009) also creates clickable links in PDFs. And, as explained in earlier comments, this does not only make visible URLs clickable (as www.macosxhints.com might be made clickable by the PDF reader software); any URL will be clickable.

Simply choose Print, click the button PDF, and then select Save as PDF.


