
Set pbcopy to use UTF-8 by default UNIX
After much frustration, I realized that the shell command pbcopy was destroying non-ASCII characters in an AppleScript that I was trying to write. Although pbcopy is an extremely convenient way to take the standard input and place it on the clipboard, I also needed access to accented characters.

Luckily, I found this blog post that explained everything. The important part was a comment there that explained how to set a default encoding by adding the following line to my .profile file:
export __CF_USER_TEXT_ENCODING=0x1F5:0x8000100:0x8000100
Although this fixed the problem in the Terminal, it did not fix it within AppleScript, since do shell script does not read .profile. I did find that I could put the same export command inside the do shell script string, and everything seemed to work as expected. An example is shown below.
do shell script "export __CF_USER_TEXT_ENCODING=0x1F5:0x8000100:0x8000100; cat -s " & quoted form of (fileToCopy) & " | pbcopy"
This has been working great for me, but one of the comments on the blog does report a problem. The blog author also wrote a replacement for pbcopy and pbpaste that is supposed to avoid these issues. I have not yet tried it, however.
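Because some of the comments on this hint report side effects when the variable is set globally, a narrower option is to set it for a single pbcopy invocation only. The following is a minimal sketch of that idea, not something taken from the blog post: copy_utf8 is a hypothetical helper name, and the value is simply the one quoted in the hint above.

```shell
# Hypothetical helper: copy a file to the clipboard with the UTF-8 setting
# applied to this one pbcopy call, instead of exporting it for the session.
copy_utf8() {
    # An assignment placed directly before a command affects only that command.
    # The value is the one from the hint; the first field may differ per user.
    cat -s "$1" | __CF_USER_TEXT_ENCODING=0x1F5:0x8000100:0x8000100 pbcopy
}
```

It would be called as copy_utf8 /path/to/file, and the same one-line form should also work inside a do shell script string without the export.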

Set pbcopy to use UTF-8 by default
Authored by: batmanppc on Jan 05, '09 08:12:54AM

Does this allow you to do what you need without adding it to the shell script?

In Terminal.app:
defaults write ~/.MacOSX/environment '__CF_USER_TEXT_ENCODING' '0x1F5:0x8000100:0x8000100'

You'll need to logout/login for it to take effect.

---
=====================================================================
Mohammad A. Haque
http://www.haque.net/
mhaque()haque.net



Set pbcopy to use UTF-8 by default
Authored by: edvakf on Jan 05, '09 08:21:58AM
The first argument "0x1F5" is actually your user ID (UID). It's the hexadecimal version of 'id -u'.
So if your UID is 502 instead of 501, then you must make it "0x1F6".

http://listserv.dartmouth.edu/scripts/wa.exe?A2=ind0607&L=NISUS&D=0&P=54229
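Rather than hard-coding 0x1F5, the first field can be derived in the shell. This is a minimal sketch under one assumption: that the field is checked against the current user's numeric user ID as reported by 'id -u' (on systems where each user has a private group with the same number, 'id -g' prints the same value). The function name cf_user_text_encoding is hypothetical, and the two 0x8000100 fields are copied unchanged from the hint.

```shell
# Hypothetical helper: build the variable's value for the current user.
cf_user_text_encoding() {
    # %X prints the numeric ID in upper-case hex, e.g. 501 becomes 1F5.
    printf '0x%X:0x8000100:0x8000100\n' "$(id -u)"
}

# Possible use in .profile (left commented out so nothing is exported here):
# export __CF_USER_TEXT_ENCODING="$(cf_user_text_encoding)"
```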

Set pbcopy to use UTF-8 by default
Authored by: batmanppc on Jan 08, '09 09:18:21AM

Actually, don't do this. Weird things happen as not everything likes UTF-8 and/or can't handle the transition.

---
=====================================================================
Mohammad A. Haque
http://www.haque.net/
mhaque()haque.net



Set pbcopy to use UTF-8 by default
Authored by: derrickbass on Jan 19, '09 10:34:13PM

Indeed, don't do this. I couldn't launch DVD Player after doing this.



Set pbcopy to use UTF-8 by default
Authored by: S on Jan 05, '09 08:15:58AM
Surely an even more convenient way would be set the clipboard to my_string?

Set pbcopy to use UTF-8 by default
Authored by: darick on Jan 05, '09 01:26:10PM

That may very well be true. In my case I was doing some additional processing through sed, and it would have taken much more time to do that processing within AppleScript. However, I still wanted to have an app in the end, so AppleScript as a wrapper made sense.

In addition, the same problem occurs: you have to convince AppleScript to work with UTF-8. An example is included below.

set theFileReference to open for access fileToCopy
set theFileContents to read theFileReference as «class utf8»
close access theFileReference
set the clipboard to theFileContents

That also seems to work quite well. Is there any advantage to one method over the other? I just chose the way that made the most sense at the time.


Set pbcopy to use UTF-8 by default
Authored by: gshenaut on Jan 05, '09 10:01:35AM
I've never seen this problem and I use pbpaste/pbcopy all the time. One clue as to what's going on is this statement I found: "__CF_USER_TEXT_ENCODING [is] a representation of your default text encoding, which is determined by the first language in the Languages list in the International preference panel."
http://lists.apple.com/archives/applescript-users/2002/Aug/msg01523.html

On my system, I have English selected, and I assume that the original poster does as well, so I think that it may really have more to do with the fact that I have "US Extended" (which is Unicode, as opposed to "US", which is Roman) selected as my (only) Input Menu.

I personally recommend US Extended anyway, it is very convenient for typing languages other than English that use the basic Roman alphabet with diacritics. Even if that's not an issue for you, if you are going to use Unicode in the Terminal, then you should consider it anyway.

I'd be interested to hear if setting US Extended makes the problem of down-converted Unicode chars go away.

Cheers,
Greg Shenaut

Set pbcopy to use UTF-8 by default
Authored by: frank nospam on Jan 05, '09 10:34:03AM

Wow, I never looked that far down the languages list to see US Extended. Very good idea, thank you.



Set pbcopy to use UTF-8 by default
Authored by: darick on Jan 05, '09 01:17:05PM

As far as I can tell, changing from US to US Extended in the Input Menu only changed the way the keyboard acts and didn't affect any of my file handling. I am now extremely confused; have you done anything else to your system to make it extra Unicode compatible?

However, changing the Input Menu is a great solution to some other problems; I wish I had known about US Extended earlier.



Set pbcopy to use UTF-8 by default
Authored by: gshenaut on Jan 07, '09 08:19:09AM

The fact that my login shell is /bin/ksh might be a factor, I don't know. I can't think of anything else.



Set pbcopy to use UTF-8 by default
Authored by: Anonymous on Jan 07, '09 11:28:40AM

Just curious: what does ksh give you that bash doesn't?



Set pbcopy to use UTF-8 by default
Authored by: rdm on Jan 06, '09 10:04:42AM

Although I would agree that using Unicode throughout would be a good idea, I'm not sure most people are aware of the issues involved with 'switching'. That is, trying to use a Unicode input method on existing documents already started with non-Unicode encoding. [Documents can't switch all the 'other' text automatically, even if everything is displayed 'correctly'.]

Particularly if users have the 'Use one input source in all documents' checked in the Input Method selection Preference Pane, as this may force Unicode input in previously non-Unicode documents.

Until all internal text objects on the Mac are Unicode with simply different input/display methods, mixing text representations within a single document can be very problematic.

So just be cautious about switching to Unicode mid-stream, and be aware of potential issues.



Set pbcopy to use UTF-8 by default
Authored by: patashnik on Sep 14, '09 12:13:13PM

I ran into an interesting issue with Snow Leopard regarding this hint: applications started from Terminal with the "open" command wouldn't respond to the usual shortcuts (Cmd-Q -> Quit, etc). A cryptic message would appear in the log: "Failed to get CharCodes from EventRef (-9870)".

I narrowed this down to the __CF_USER_TEXT_ENCODING environment variable, which I set according to this hint. When the variable is unset, the issues are gone.



Set pbcopy to use UTF-8 by default
Authored by: Lri on Sep 01, '12 11:38:54PM

You can also set LC_CTYPE to UTF-8.

do shell script "echo  | LC_CTYPE=UTF-8 pbcopy"

