Unix command files, UTF-8, and the byte order mark

Apr 06, '09 07:30:00AM

Contributed by: MJCube

"A little knowledge is a dangerous thing," as they say. This is a long story about a problem people may rarely, if ever, encounter, but here goes:

I love TextWrangler for editing all kinds of text files. I have it set to save in UTF-8 with the initial byte order mark (BOM) by default. I discovered that the BOM makes Safari read HTML as Unicode automatically, without the need for a charset declaration or messy entity codes for special characters. So now I can just type HTML freely in any languages and scripts I want.

Now over to Terminal: On my old Mac, I had a few default aliases set up for tcsh. I learned that now in Leopard the default shell is bash, which I am happy to note supports Unicode in pathnames seamlessly, but which uses a very different structure for keeping default aliases. I found my old ~/Library » init » tcsh » aliases.mine file and did my research: I copied the file, saved it as ~/.bash_alias, and created ~/.bash_profile to source it.

But nothing would work. I got the strangest errors, like -bash: source: command not found. Say what?! The command is right there in /usr/bin/ where it belongs! I dug for answers on the net for hours, and kept trying things. Eventually I noticed that when I executed ~/.bash_alias myself on the command line, all but the first of my aliases loaded. When I changed the file to start with a blank line, all aliases loaded, with one error about an empty command. Aha! So the problem turned out to be the file format: the BOM, three invisible bytes (EF BB BF) at the very start of the file, was being read as part of the first word of the first line, turning it into nonsense. So I resaved both of my dot-files in "UTF-8, no BOM" mode, and all is well.
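For anyone who wants to check a file without trusting the editor's save dialog, here is a minimal sketch in Python of what went wrong and how to undo it. The function names (has_utf8_bom, strip_utf8_bom, first_word) and the sample line are made up for illustration; only codecs.BOM_UTF8 comes from the standard library. The splitting on whitespace is a rough stand-in for how a shell picks out the command name, not a model of bash's actual parser.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes begin with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


def first_word(data: bytes) -> str:
    """Return the first whitespace-delimited word of the decoded text.

    This approximates what a shell would treat as the command name.
    """
    return data.decode("utf-8").split()[0]


# Hypothetical first line of a dot-file saved as "UTF-8 with BOM".
sample = codecs.BOM_UTF8 + b"source ~/.bash_alias\n"

# With the BOM, the first word is U+FEFF glued to "source", not "source".
word_with_bom = first_word(sample)

# After stripping the BOM, the first word is plain "source" again.
word_without_bom = first_word(strip_utf8_bom(sample))
```

The BOM decodes to the invisible character U+FEFF, which is not whitespace, so it sticks to the first word. That would explain why the error message looks like it names a perfectly good command. To fix a real file with this sketch, read it in binary mode, pass the bytes through strip_utf8_bom, and write the result back.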

Moral of the story: Though we know "There ain't no such thing as plain text," Unix requires command files to be as close to it as possible.


Mac OS X Hints
http://hints.macworld.com/article.php?story=20090403105318367