
Convert formatted text to valid HTML using TextEdit
I just discovered, to my great relief, that TextEdit can convert rich text constructed using the native Cocoa text, font, and style features (including lists and tables) to well-formed HTML by selecting the proper setting in the Open and Save tab of TextEdit's Preferences window. This preference has been available since at least 10.4.6, but I don't know how long before that.

Any of you who've struggled with converting Word documents to HTML over the years know what a pain it has been. Word insists on inserting invalid -- or simply overly heavy-handed -- CSS styles in order to produce HTML that matches the look and feel of the original Word document, and to my knowledge, it provides no way to bypass this. Not to pick on Microsoft unduly, as Apple takes the same approach with Pages, which converts its beautifully-formatted documents to HTML using CSS styles so verbose and convoluted (yet so WYSIWYG accurate) that no self-respecting webmaster would ever want to claim ownership of the code, much less actually post it on a server. :-)

Through some extremely difficult maneuvers, it's possible to convert a Pages (or Word) file to HTML, open it in TextEdit, and save it two or three times in order to cleanse the file of its nonstandard and genuinely ugly underlying code ... ending up with an HTML file clean enough to actually work with. But I wouldn't want to do that on a regular basis!

Instead, what I discovered is that if you work in a native Cocoa application like TextEdit using only the tools Apple provides for word processing (which admittedly take some getting used to, and handle only basic formatting needs -- much like basic HTML itself), you can easily work in a WYSIWYG mode and then convert the file to clean HTML that you won't be embarrassed to call your own. (For any geeks among you who'd like to learn more about the Cocoa text system, here's a link to get you started.)

Yes, there are many native HTML editors for the Mac that can do this as well -- which don't likewise introduce extraneous code -- but I was delighted to find I could basically develop HTML in any native Cocoa app as well! For example, I currently do a lot of data entry in DevonThink Pro, which -- like SohoNotes, Journaler, Yojimbo, Curio, VoodooPad, and many others -- enables word processing through the native Cocoa toolset. If you do the same, you'll find that you can build tables, lists, and any other text you like in such an application and then, if you need to convert it to HTML, simply copy and paste it into TextEdit. You don't need to export the file to RTF or HTML or whatever from the application in question.

Until yesterday, I thought TextEdit's HTML conversion ability was on a par with that of Word and Pages. That's probably because in its default mode, it is. However, unlike those apps, the surprisingly powerful TextEdit provides some very handy, simple options to produce clean HTML when you need that. Here's a brief set of steps to take advantage of this capability:
  1. Copy and paste your Cocoa-formatted text into a new TextEdit document. (Hint: TextEdit provides an Application Service (New Window Containing Selection) in the Services menu for this once you select the text in the originating app.)
  2. Open TextEdit's Preferences and select the Open and Save tab.
  3. Change Document Type to either HTML 4.01 Strict or XHTML 1.0 Strict, depending on whether you want your code to be XHTML compliant or not.
  4. Change Styling to No CSS. Note that this will strip all font and style information from the file, except for the basics like bold and italics.
  5. From the TextEdit menubar, select File/Save As.
  6. In the Save As dialog box, give your file a name and hard disk location. Then, change the File Format selection to HTML, and click Save.
Now, when you click on your new HTML file in the Finder, it will open with your default web browser. If you examine the source code, you'll see nothing but simple, pure HTML (or XHTML). The only 'bad' thing I noticed was that the Cocoa HTML Writer that does the conversion still uses <b> for boldface rather than the 'correct' <strong>. But that's easy enough to fix.
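For anyone who does want <strong> in place of <b>, the swap can be scripted. Here is a minimal Python sketch, assuming the saved file uses plain <b>...</b> pairs; the function name b_to_strong is made up for illustration and is not part of TextEdit or Cocoa.

```python
import re


def b_to_strong(html: str) -> str:
    """Replace <b> tags with <strong>, leaving <body>, <br>, etc. untouched."""
    # Opening tag: "<b" followed by ">" directly or by whitespace plus attributes.
    html = re.sub(
        r"<b(\s[^>]*)?>",
        lambda m: "<strong" + (m.group(1) or "") + ">",
        html,
        flags=re.IGNORECASE,
    )
    # Closing tag.
    html = re.sub(r"</b\s*>", "</strong>", html, flags=re.IGNORECASE)
    return html


if __name__ == "__main__":
    print(b_to_strong("<body><p><b>Hi</b><br></p></body>"))
```

Whether every bold run really deserves <strong> is a judgment call, so a blanket replacement like this is only appropriate when the bold text was meant as emphasis.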

If you're like me, you can now take the HTML code and plop it into your blog post or any other standard HTML file (which probably already has its own CSS styles defined), and it will add nothing but pure content to that file. This is going to be a real time-saver for me, since it'll let me format lists and tables in any Cocoa app and not have to worry about how I'm going to convert the data to HTML later on!

Convert formatted text to valid HTML using TextEdit | 17 comments | Create New Account
Mandatory Obnoxious Comment on Semantic Markup
Authored by: cbiagini on Aug 30, '06 08:24:46AM

In a tool that generates HTML automatically, using <b> is actually preferable to <strong>. Boldface is used in all kinds of situations, not just when you want to strongly emphasize text, and the tool can’t guess what you mean. It’s better to use a nonsemantic tag, rather than use a semantic tag improperly. See MPT’s “When semantic markup goes bad”.

I’m sorry. I just couldn’t resist :)



Mandatory Obnoxious Comment on Semantic Markup
Authored by: brycesutherland on Aug 30, '06 08:48:40AM

If you weren't going to be obnoxious, I was about to. Still, great hint -- very informative and thorough!



Convert formatted text to valid HTML using TextEdit
Authored by: Mike Perry on Aug 30, '06 08:46:06AM
Sigh, this is a classic problem with all too many programmers, or at least those in the paid, corporate world. (I'm looking at you Microsoft.) Give them a simple problem, and they'll make it more complicated to create a challenge and add job security.

llscotts is right. Quite often we don't want to move the WYSIWYG formatting to another document, we just want to move HTML or character/paragraph styles along with the text. I don't know how many times I've tried to drive home to developers the point that we want to leave fonts and other "how it looks" issues in the hands of the IMporting application. Ideally, the EXporting application shouldn't even include them. I almost had a book go to print with some weird, brief passages in Times Roman (the virus font) that Word didn't strip out when it exported RTF and that InDesign didn't strip out when it imported RTF.

Earlier this week I evaluated Mellel, a lightweight but powerful word processor that makes very effective use of styles. I gave up on getting it when I discovered that Mellel's RTF export strips out Mellel's styles and just creates raw, highly formatted text. And that's a small company that I talked with over and over about the need to export the styles they're so proud of inside their application. And yes, it can also export in XML now, but importing XML into InDesign is poorly documented and needlessly complex. All I want are character and paragraph style tags (which could also be HTML tags). They could probably hire a bright 12-year-old who could code that.

And that's the problem. It's too simple and straight-forward. It's much more fun to muck about with all sorts of complex coding to recreate the "look and feel."

What we need is a text editor that simply tags text, tagging both paragraphs and sections of text (e.g. with italic). On export it writes those tags out in a form other applications understand: HTML for the web, RTF for Word, MIF for FrameMaker, IDIF for InDesign and so forth. For simply transferring style names, that's a trivial task. InDesign's interchange format for paragraph style names is almost identical to HTML's. Then when we've imported that styled text, it's easy to give meaning to the styles. This application could also be smart enough to change style names between import and export. Heading 1 in Word/RTF on import could become H1 for HTML on export. That'd let us interchange documents in HTML, Word, InDesign, FrameMaker or whatever without having to cut out a lot of useless formatting clutter.
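The style-name mapping described above might look something like this in Python. This is only a sketch under stated assumptions: the (style name, text) pair representation, the STYLE_TO_HTML table, and the export_html function are all hypothetical and do not correspond to any real word processor's export API.

```python
from html import escape

# Hypothetical mapping from word-processor style names to HTML tags.
STYLE_TO_HTML = {
    "Heading 1": "h1",
    "Heading 2": "h2",
    "Body": "p",
}


def export_html(paragraphs):
    """Write (style_name, text) pairs as bare HTML tags, with no font or layout data."""
    lines = []
    for style, text in paragraphs:
        tag = STYLE_TO_HTML.get(style, "p")  # unknown styles fall back to <p>
        lines.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(export_html([("Heading 1", "Title"), ("Body", "Fonts & layout stay behind.")]))
```

A different export target would only need a different table, which is the point of carrying style names instead of appearance.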

Convert formatted text to valid HTML using TextEdit
Authored by: ddauerbach on Aug 30, '06 10:34:16AM

Mike Perry's wants sound exactly like the word processor I use. The bad news is that I use it in a virtual DOS machine. It's called XyWrite.

I did just update my Mellel and I'm curious to see how it will do exporting for web purposes. Of course, Dreamweaver does a pretty good job cleaning up.



Convert formatted text to valid HTML using TextEdit
Authored by: AJB on Aug 30, '06 02:01:52PM

The "correct" tag for boldface is <b> not <strong>. <strong> does NOT mean boldface, although many browsers may choose to render it as such. If you want to bold the text use <b>. If you want a strong emphasis (which may be rendered as boldface) use <strong>.



Convert HTML to formatted text using TextEdit
Authored by: dborod on Aug 30, '06 02:28:42PM

If you open an HTML file in TextEdit, you get to edit the HTML.

If you open an HTML file in TextEdit using the open dialog box and unselect the "Ignore Rich Text Commands" checkbox, the HTML is rendered in TextEdit and you get to edit the HTML as formatted text.



Not To Emphasize a Point Irrelevant To the Hint, But...
Authored by: llscotts on Aug 30, '06 03:34:02PM

There seems to be some disagreement over the assertion I made about <STRONG> being "correct". Just so you know I didn't make that up, check out the w3c accessibility guidelines on this subject. I'm sure that my sensitivity and technical training in web accessibility issues is where I got the impression that <STRONG> is preferable to <B> these days.

In any case, it's irrelevant to the hint, though obviously someone at Apple should decide if adhering to web accessibility standards is the more important factor in deciding how to translate a boldface font tag. I don't honestly have a strong opinion myself and can see both sides of the debate. I thought the issue had been decided by w3c's position, but perhaps it's still subject to debate.

Cheers,
Leland

---
Anything great you do today can always be improved upon tomorrow.

Convert formatted text to valid HTML using TextEdit
Authored by: etresoft on Aug 30, '06 07:42:18PM

I got excited when I read this hint at work and couldn't wait until I got home to try it. I do some HTML in French and it is always a hassle to type in the Unicode characters. I thought this would finally do it for me. Alas, no such luck. All I want is a text editor that will convert é to "&eacute;". Yes, I already have Dreamweaver and I hate it.
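The é to "&eacute;" conversion asked for here can be sketched in a few lines of Python using the standard library's entity table. The function name to_named_entities is an illustrative choice; characters without a named entity fall back to numeric references, which is an assumption about what would be wanted.

```python
from html.entities import codepoint2name


def to_named_entities(text: str) -> str:
    """Replace non-ASCII characters with HTML entities; ASCII passes through unchanged."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp <= 127:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &eacute;
        else:
            out.append(f"&#{cp};")  # numeric fallback
    return "".join(out)


if __name__ == "__main__":
    print(to_named_entities("café"))
```

Because ASCII is left alone, existing markup such as <p> survives, but so do bare & and < characters in the text, which would need separate escaping.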



Convert formatted text to valid HTML using TextEdit
Authored by: jmmermet on Aug 31, '06 12:53:01AM
I am French too. I use UnicodeChecker from http://earthlingsoft.net and its services to easily translate diacritics like àù... into their HTML entities. It's free and easy!

Convert formatted text to valid HTML using TextEdit
Authored by: HiramNL on Aug 31, '06 03:34:53AM

There is no need for that anymore. Simply choose UTF-8 as the encoding in TextEdit's Open and Save preferences, and your é, your ï and your ø (and all the other diacriticals) will display correctly in any modern browser.



Convert formatted text to valid HTML using TextEdit
Authored by: etresoft on Aug 31, '06 04:21:37PM

HiramNL,
Now that's a good hint! Plus, it is HTML. I can just use TextWrangler. I don't need TextEdit.



Convert formatted text to valid HTML using TextEdit
Authored by: koncept on Aug 31, '06 03:07:06PM
Try TextMate. From the HTML Bundles collections, you can convert an entire document or selected text to entities.

Convert formatted text to valid HTML using TextEdit
Authored by: HandyMac on Sep 01, '06 04:55:11PM
Speaking of TextEdit -- which is my most used app after Safari, and does a lot of clever tricks (for instance, paste a URL into TextEdit, select it, control-click and you can make it into a clickable live link, very useful in the myriad help documents I create for clients) -- it's not widely known that it amounts to open source; the source code comes with Xcode Tools (in /Developer/Examples/AppKit/TextEdit) with, I gather, permission to do whatever you like with it. One neat example of what can be done is iText Express, which adds a few neat features (page numbers, columns, header/footer, footnotes, adjustable margins, etc.) to make TextEdit into a pretty good slim & fast word processor -- sorta like WriteNow for OS X (except its file format is non-proprietary RTF, just like TextEdit).

Convert formatted text to valid HTML using TextEdit
Authored by: Creative-i on Oct 18, '07 01:29:19AM
Hmmm I've been reading through this trying to find an easy Text to HTML conversion programme such as the one I used under OS9 and no luck! I used a small utility called Cyrk Text Converter which is fantastic but doesn't run under OSX and I can no longer run the Classic environment (guess why).

Basically all I want to do is convert the URL http://domain/filename to the standard link <a href="http://domain/filename"> Filename </a> and ditto for mailto: which is what Cyrk did, but I search in vain for an OSX equivalent.

Cyrk also added simple classes such as class="www", which I could convert to target="_blank" class="www" with a global search and replace in Dreamweaver. Obviously it also inserts the HTML head, body tags etc., giving me a simple unadorned HTML page which I could open in DW and format accordingly (font, size et al).

Anybody got any suggestions?
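
One possible approach, sketched in Python and not a reconstruction of what Cyrk actually did: find bare http(s) and mailto: URLs in plain text and wrap each in an anchor tag. The regular expression, the linkify name, and the default class="www" are assumptions for illustration; the link text here is the URL itself rather than a derived filename.

```python
import re
from html import escape

# Deliberately simple pattern: a scheme followed by anything up to whitespace,
# an angle bracket, or a double quote.
URL_RE = re.compile(r"(?:https?://|mailto:)[^\s<>\"]+")


def linkify(text: str, css_class: str = "www") -> str:
    """Escape plain text for HTML and wrap each URL in an <a> tag."""
    parts = []
    last = 0
    for m in URL_RE.finditer(text):
        parts.append(escape(text[last:m.start()]))  # text before the URL
        url = escape(m.group(0))
        parts.append(f'<a href="{url}" class="{css_class}">{url}</a>')
        last = m.end()
    parts.append(escape(text[last:]))  # trailing text
    return "".join(parts)


if __name__ == "__main__":
    print(linkify("See http://domain/filename now"))
```

The pattern will swallow trailing punctuation such as a full stop directly after a URL, so real text may need a stricter expression.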

Convert formatted text to valid HTML using TextEdit
Authored by: alec kinnear on Nov 25, '07 06:13:56AM
According to handymac above, TextEdit will do the conversion you want.

I tested it and indeed it works.

  1. control click the URL text and turn it into a live link
  2. export document as html (after changing html preferences)
bingo you have live links.

albeit one at a time.

---
WordPress SEO Secrets - foliovision.com/weblog

Convert formatted text to valid HTML using TextEdit
Authored by: Creative-i on Oct 05, '09 09:21:07AM
I tried your suggestion but no go. When you Control-click on a valid HTML link, e.g. http://globalcomment.com/2009/mr-demint-goes-to-honduras/, in TextEdit, the link does not get converted. Instead I get a bunch of options to add or take away styles of one kind or another. I have all the prefs set correctly to save as HTML (actually XHTML). Another writer asserted that if I save as HTML from TextEdit I would find a valid HTML doc at the other end. No! Worse, TextEdit adds in para breaks as well as the existing BRs, so I end up with double-spaced docs. Why is such a simple thing beyond all programmers (except the guy who wrote Cyrk)? Bill

Convert formatted text to valid HTML using TextEdit
Authored by: craig_scratchley on May 21, '10 02:11:45PM

"Easy Text To HTML Converter " is a Windows Solution (it seems to be useful for me). I'd rather not have to save a file, etc. to make a conversion when all I want to do is paste into a webpage.

http://www.easyhtools.com/ethdescription.html


