Convert formatted text to valid HTML using TextEdit

Aug 30, '06 07:30:05AM

Contributed by: llscotts

I just discovered, to my great relief, that TextEdit can convert rich text constructed using the native Cocoa text, font, and style features (including lists and tables) to well-formed HTML once you select the proper settings in the Open and Save tab of TextEdit's Preferences window. These settings have been available since at least 10.4.6, but I don't know how long before that.

Any of you who've struggled with converting Word documents to HTML over the years know what a pain it has been. Word insists on inserting invalid -- or simply overly heavy-handed -- CSS styles in order to produce HTML that matches the look and feel of the original Word document, and to my knowledge, it provides no way to bypass this. Not to pick on Microsoft unduly, as Apple takes the same approach with Pages, which converts its beautifully formatted documents to HTML using CSS styles so verbose and convoluted (yet so WYSIWYG accurate) that no self-respecting webmaster would ever want to claim ownership of the code, much less actually post it on a server. :-)

Through some extremely difficult maneuvers, it's possible to convert a Pages (or Word) file to HTML, open it in TextEdit, and save it two or three times in order to cleanse the file of its nonstandard and genuinely ugly underlying code ... ending up with an HTML file clean enough to actually work with. But I wouldn't want to do that on a regular basis!

Instead, what I discovered is that if you work in a native Cocoa application like TextEdit using only the tools Apple provides for word processing (which admittedly take some getting used to, and handle only basic formatting needs -- much like basic HTML itself), you can easily work in a WYSIWYG mode and then convert the file to clean HTML that you won't be embarrassed to call your own. (For any geeks among you who'd like to learn more about the Cocoa text system, here's a link to get you started.)

Yes, there are many native HTML editors for the Mac that can do this as well -- and that likewise don't introduce extraneous code -- but I was delighted to find I could basically develop HTML in any native Cocoa app, too! For example, I currently do a lot of data entry in DevonThink Pro, which -- like SohoNotes, Journaler, Yojimbo, Curio, VoodooPad, and many others -- enables word processing through the native Cocoa toolset. If you do the same, you'll find that you can build tables, lists, and any other text you like in such an application and then, if you need to convert it to HTML, simply copy and paste it into TextEdit. You don't need to export the file to RTF or HTML or whatever from the application in question.

Until yesterday, I thought TextEdit's HTML conversion ability was on a par with that of Word and Pages. That's probably because in its default mode, it is. However, unlike those apps, the surprisingly powerful TextEdit provides some very handy, simple options to produce clean HTML when you need that. Here's a brief set of steps to take advantage of this capability:

  1. Copy and paste your Cocoa-formatted text into a new TextEdit document. (Hint: TextEdit provides an Application Service (New Window Containing Selection) in the Services menu for this once you select the text in the originating app.)
  2. Open TextEdit's Preferences and select the Open and Save tab.
  3. Change Document Type to either HTML 4.01 Strict or XHTML 1.0 Strict, depending on whether you want your code to be XHTML compliant or not.
  4. Change Styling to No CSS. Note that this will strip all font and style information from the file, except for the basics like bold and italics.
  5. From the TextEdit menubar, select File/Save As.
  6. In the Save As dialog box, give your file a name and hard disk location. Then, change the File Format selection to HTML, and click Save.

Now, when you double-click your new HTML file in the Finder, it will open in your default web browser. If you examine the source code, you'll see nothing but simple, pure HTML (or XHTML). The only 'bad' thing I noticed was that the Cocoa HTML Writer that does the conversion still uses <b> for boldface rather than the 'correct' <strong>. But that's easy enough to fix.
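
The <b>-to-<strong> fix mentioned above is simple to script if you'd rather not do it by hand. Here's a minimal sketch in Python, not anything TextEdit itself provides. It assumes the saved file contains ordinary <b> tags as described; the function name semantic_tags is made up for illustration, and the matching <i>-to-<em> swap is an extra assumption that the hint itself doesn't cover.

```python
import re

# Presentational tags and the semantic tags that replace them.
# The "i" -> "em" entry is an assumption added for illustration.
TAG_MAP = {"b": "strong", "i": "em"}

# Matches an opening or closing <b> or <i> tag, with optional attributes.
# Tags such as <body>, <br>, or <img> do not match, because the tag name
# must be followed directly by whitespace or ">".
_TAG_RE = re.compile(r"<(/?)(b|i)(\s[^>]*)?>", re.IGNORECASE)


def semantic_tags(html: str) -> str:
    """Return html with <b>/<i> tags rewritten as <strong>/<em>."""
    def swap(match):
        slash = match.group(1)
        name = match.group(2).lower()
        attrs = match.group(3) or ""
        return f"<{slash}{TAG_MAP[name]}{attrs}>"

    return _TAG_RE.sub(swap, html)
```

A regular expression is good enough here only because the markup is so plain; for messier HTML, a real parser would be the safer choice.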

If you're like me, you can now take the HTML code and plop it into your blog post or any other standard HTML file (which probably already has its own CSS styles defined), and it will add nothing but pure content to that file. This is going to be a real time-saver for me, since it'll let me format lists and tables in any Cocoa app and not have to worry about how I'm going to convert the data to HTML later on!
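
If you paste into a page that already has its own <html> and <head>, you only want what sits inside the saved file's <body>. Here's a small Python sketch of that extraction, assuming the saved file has a single <body> element; the function name body_fragment is hypothetical.

```python
import re

# Captures everything between the opening <body ...> tag and </body>.
_BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def body_fragment(html: str) -> str:
    """Return the markup inside <body>, or the whole input if no body is found."""
    match = _BODY_RE.search(html)
    if match is None:
        # No <body> element: treat the input as an HTML fragment already.
        return html.strip()
    return match.group(1).strip()
```

To use it, read the saved file into a string (for example with open("notes.html", encoding="utf-8").read(), where notes.html is just a placeholder name) and pass that string to body_fragment.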

Mac OS X Hints
http://hints.macworld.com/article.php?story=20060828093624972