
Python and UTF-8 text encoding on OSX
I'm using a MacBook Air with Snow Leopard (10.6.2) as my development platform, running Python 2.6.4 with Unicode strings.

I received the following error when trying to run a script: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9'. This indicates that Python was encoding its output as ASCII, so it couldn't output a word like appliqué correctly.
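A minimal sketch of how this error arises, assuming the failure comes from encoding a Unicode string with the ASCII codec (which is what the message reports). The helper name try_ascii is made up for illustration, and the exact wording of the message differs slightly between Python 2 and Python 3.

```python
# "appliqué", written with an escape so this file itself stays pure ASCII.
word = u"appliqu\xe9"


def try_ascii(text):
    """Return the error message if text cannot be encoded as ASCII, else None.

    try_ascii is a hypothetical helper, not part of any library.
    """
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


message = try_ascii(word)
print(message)
```

Printing a Unicode string to a stream whose encoding is ASCII triggers the same conversion implicitly, which is why the error can appear on a plain print statement.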

I tried adding # -*- coding: utf-8 -*- at the head of my Python script, but I still got the same error.

To fix this, I found that the text encoding used for standard input, output, and standard error can be specified by setting the PYTHONIOENCODING environment variable before running the interpreter.

The value should be a string in the form <encoding> or <encoding>:<errorhandler>. The encoding part specifies the encoding's name, e.g. utf-8 or latin-1; the optional errorhandler part specifies what to do with characters that can't be handled by the encoding, and has the same meaning as the errors argument of str.encode(), e.g. 'strict', 'ignore', or 'replace'.
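To get a feel for what the encoding and error handler parts do, here is a small sketch that applies the same choices directly with the encode() method. It assumes the handlers behave for standard output as they do for encode(); 'strict' is the default handler name used by encode().

```python
word = u"appliqu\xe9"  # "appliqué"

# 'strict' raises an error when a character does not fit the encoding.
strict_failed = False
try:
    word.encode("ascii", "strict")
except UnicodeEncodeError:
    strict_failed = True

# 'ignore' silently drops the character; 'replace' substitutes a "?".
ignored = word.encode("ascii", "ignore")
replaced = word.encode("ascii", "replace")

# Encodings that can represent the character need no error handler at all.
as_utf8 = word.encode("utf-8")
as_latin1 = word.encode("latin-1")

print(strict_failed, ignored, replaced, as_utf8, as_latin1)
```

Note that utf-8 and latin-1 produce different bytes for the same character, so the encoding should match what the terminal expects.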

So typing export PYTHONIOENCODING=utf-8 prior to invoking the Python interpreter does the trick, or you could just add this setting to your environment file: ~/.MacOSX/environment.plist.
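The effect of the variable can be checked from Python itself. The following is a rough sketch, not a definitive test: it starts a child interpreter with PYTHONIOENCODING set and captures the raw bytes the child writes. The helper name run_child is hypothetical, and subprocess.check_output needs Python 2.7 or later.

```python
import os
import subprocess
import sys


def run_child(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set; return its raw output.

    run_child is a hypothetical helper written for this illustration.
    """
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding
    # The child writes "appliqué" to its standard output.
    code = "import sys; sys.stdout.write(u'appliqu\\xe9')"
    return subprocess.check_output([sys.executable, "-c", code], env=env)


print(repr(run_child("utf-8")))
print(repr(run_child("ascii:replace")))
```

With utf-8 the accented character comes out as a two-byte sequence; with ascii:replace it is replaced by a question mark instead of raising an error.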

[crarko adds: I haven't tested this one.]

Authored by: asmeurer on Jul 23, '10 10:56:36AM

Does the problem persist in Python 2.7? It sounds like a bug that might have been fixed.

Also, if you port your code to Python 3, all your unicode problems should disappear.



Authored by: JohnRoth1 on Jul 23, '10 04:50:35PM

The line at the top of the script only affects the encoding for unicode literals in the source - it has no effect whatever at run time.

This isn't a bug, and it isn't "fixed" in any release. The issue is that if you read something from a file and want to convert it to unicode, the Python run-time needs to know the encoding of the input, because there are literally dozens of possibilities. There are a lot of ways of doing that, from the poster's suggestion through a parameter on the open function and methods on the str and unicode objects. However, it does have to be done explicitly; the run-time takes the safe choice and defaults to 7-bit ASCII.

John Roth
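As a sketch of the explicit approach described in this comment, the example below writes and reads a file while naming the encoding each time. It uses io.open, which accepts an encoding parameter in Python 2.6 and later (the built-in open in Python 2 does not); the helper name roundtrip is made up for illustration.

```python
import io
import os
import tempfile


def roundtrip(text, encoding="utf-8"):
    """Write text to a temporary file, then return (raw bytes, decoded text).

    roundtrip is a hypothetical helper, not part of any library.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # Encode on the way out by naming the encoding explicitly.
        with io.open(path, "w", encoding=encoding) as handle:
            handle.write(text)
        # Read the bytes exactly as stored on disk.
        with io.open(path, "rb") as handle:
            raw = handle.read()
        # Decode on the way in, again naming the encoding.
        with io.open(path, "r", encoding=encoding) as handle:
            decoded = handle.read()
    finally:
        os.remove(path)
    return raw, decoded


raw, decoded = roundtrip(u"appliqu\xe9")
print(repr(raw), repr(decoded))
```

Calling raw.decode("utf-8") on the bytes gives the same Unicode string, which is the method-based route mentioned above.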



Authored by: boxcarl on Jul 23, '10 11:42:55PM
As the other commenters have noted, in Python 2 you can print objects of class str (which are bytestrings) directly to the terminal, but objects of class unicode need to be converted to bytestrings (str) before they can be printed. The easiest way to do this is myunicodestring.encode("utf-8"), but if you want to be able to say “print myunicodestring” in the interactive shell without encoding first, you can try putting export LC_CTYPE=en_US.utf-8 in your .bash_profile, so that Python knows that Terminal.app wants its input (Python’s output) to be in UTF-8. I think newer versions of Terminal (10.6) do this automatically, but older ones did not.
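To see what Python currently believes about the terminal, something like the following sketch may help. It reports the encoding attached to standard output and the locale's preferred encoding, and encodes a string by hand as a fallback. The helper name to_terminal_bytes is hypothetical, and the reported values depend entirely on the environment it runs in.

```python
import locale
import sys


def to_terminal_bytes(text, encoding=None):
    """Encode text for output, falling back to UTF-8 if no encoding is known.

    to_terminal_bytes is a hypothetical helper written for this illustration.
    """
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    # 'replace' avoids a UnicodeEncodeError on characters that do not fit.
    return text.encode(encoding, "replace")


# What Python thinks the output stream and the locale want.
print("stdout encoding:", getattr(sys.stdout, "encoding", None))
print("locale encoding:", locale.getpreferredencoding(False))

utf8_bytes = to_terminal_bytes(u"appliqu\xe9", "utf-8")
ascii_bytes = to_terminal_bytes(u"appliqu\xe9", "ascii")
```

If the reported encodings are ASCII-based (for example US-ASCII or ANSI_X3.4-1968) rather than UTF-8, that points to the locale settings discussed in this comment.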

Authored by: wolfy on Nov 15, '10 07:58:35PM

Thank you, thank you, thank you. I've been all over trying to make Python 3 do Unicode output to the stdio, and this was the missing piece.

---
Wolfy


