
Print Unicode characters to the Terminal with Java UNIX
Anyone with experience in Java knows that the way to print a string to a console is like this:
System.out.println("Hello, world.");
However, this method does not work well with non-ASCII characters. The following is some test code to demonstrate this:
import java.io.PrintStream;

public class Test {
  public static void main (String[] argv) {
    String unicodeMessage =
    "\u7686\u3055\u3093\u3001\u3053\u3093\u306b\u3061\u306f";

    PrintStream out = System.out;
    out.println(unicodeMessage);
  }
}
When compiled and executed, it looks like this:
iboook:~/Documents/codes tokek$ java Test 
?????????


By looking at this, one might be misled into thinking that the OS X console does not support Unicode. In fact, System.out encodes strings using the platform default charset, which on OS X is MacRoman. This causes two problems: MacRoman cannot represent Japanese characters, so each one is replaced with a question mark, and the console expects UTF-8, not MacRoman. To get around this, we create a new PrintStream object that uses UTF-8.
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Test {
  public static void main (String[] argv) throws UnsupportedEncodingException {
    String unicodeMessage =
    "\u7686\u3055\u3093\u3001\u3053\u3093\u306b\u3061\u306f";

    PrintStream out = new PrintStream(System.out, true, "UTF-8");
    out.println(unicodeMessage);
  }
}
(The PrintStream constructor is the reason why we need a throws UnsupportedEncodingException.)

The corrected code produces the proper output:
皆さん、こんにちは
Here, the constructor options mean:
  1. Use System.out as the underlying OutputStream
  2. Flush the OutputStream automatically whenever println is called, a byte array is written, or a newline is written
  3. Write UTF-8 byte sequences to the underlying OutputStream. If the character encoding is not specified, the PrintStream constructor uses the platform default (MacRoman on OS X) as the object's character encoding for the print and println methods.
While the print methods of PrintStream work like a Writer, the class can also be used like an OutputStream through its inherited write methods. The write methods output bytes directly to the underlying OutputStream, without any character encoding. Beyond this Writer-OutputStream hybrid nature, PrintStream also exists as a convenience for printing String representations of primitive values, and for suppressing pesky IOExceptions (failures are recorded internally and can be queried with checkError()).
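The difference between print and write, and the swallowed IOException, can be sketched as follows. This is only an illustration: the class name PrintStreamHybridDemo, its two helper methods, and the deliberately failing OutputStream are made up for the example and are not part of the original hint.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical demo class, not from the original hint.
class PrintStreamHybridDemo {

  // print() encodes characters with the stream's charset;
  // write() passes the given bytes through untouched.
  public static byte[] mixedOutput() {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      PrintStream out = new PrintStream(buffer, true, "UTF-8");
      out.print("\u3042");                       // encoded by PrintStream as E3 81 82
      byte[] raw = { (byte) 0xE3, (byte) 0x81, (byte) 0x84 };
      out.write(raw, 0, raw.length);             // already-encoded UTF-8 for \u3044
      out.flush();
      return buffer.toByteArray();
    } catch (UnsupportedEncodingException e) {
      // Every Java platform is required to support UTF-8.
      throw new IllegalStateException(e);
    }
  }

  // PrintStream never throws IOException; it only sets an error flag.
  public static boolean errorIsSwallowed() {
    OutputStream failing = new OutputStream() {
      public void write(int b) throws IOException {
        throw new IOException("simulated failure");
      }
    };
    PrintStream out = new PrintStream(failing, true);
    out.println("test");                         // no exception reaches the caller
    return out.checkError();                     // reports the hidden failure
  }

  public static void main(String[] argv) {
    System.out.println(mixedOutput().length + " bytes written");
    System.out.println("error flag set: " + errorIsSwallowed());
  }
}
```

Because write bypasses the encoder, any bytes handed to it must already be in the encoding the terminal expects.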

To clarify the Java 1.4.2 API documentation for this class: the print methods don't always use the "platform's default character encoding", because a different one can be specified in the constructor's arguments. They use whatever encoding is associated with the PrintStream object.
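To make the effect of the per-object encoding concrete, here is a minimal sketch that captures the bytes a PrintStream produces for the same string under two encodings. The class name EncodingDemo and the encode helper are invented for this example, and US-ASCII is used only as a stand-in for a charset that cannot represent Japanese (MacRoman may not be available on every JVM).

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical demo class, not from the original hint.
class EncodingDemo {

  // Returns the bytes that a PrintStream using the named charset
  // writes to its underlying stream for the given text.
  public static byte[] encode(String text, String charsetName) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      PrintStream out = new PrintStream(buffer, true, charsetName);
      out.print(text);
      out.flush();
      return buffer.toByteArray();
    } catch (UnsupportedEncodingException e) {
      throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
    }
  }

  public static void main(String[] argv) {
    String unicodeMessage =
        "\u7686\u3055\u3093\u3001\u3053\u3093\u306b\u3061\u306f";

    byte[] ascii = encode(unicodeMessage, "US-ASCII");
    byte[] utf8 = encode(unicodeMessage, "UTF-8");

    // Unmappable characters become a single '?' byte each,
    // while UTF-8 needs three bytes for each of these characters.
    System.out.println("US-ASCII: " + ascii.length + " bytes");
    System.out.println("UTF-8:    " + utf8.length + " bytes");
  }
}
```

The nine question marks in the first test run correspond to the nine characters that the default charset could not encode.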

Print Unicode characters to the Terminal with Java
Authored by: adrianm on Feb 11, '05 01:50:10PM
Alternatively, just run the original program like this:

java -Dfile.encoding=UTF8 Test
and it'll work fine.

Substitute UTF8 with whatever is appropriate for your current terminal/needs.
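To check which encoding a particular run actually picked up, something like the following may help. It is a minimal sketch and not part of the original program; the class name ShowDefaultEncoding and the describe helper are made up for illustration.

```java
import java.nio.charset.Charset;

// Hypothetical helper class for inspecting the JVM's encoding settings.
class ShowDefaultEncoding {

  // Reports the file.encoding property alongside the charset
  // the JVM is really using as its default.
  public static String describe() {
    return "file.encoding=" + System.getProperty("file.encoding")
        + ", defaultCharset=" + Charset.defaultCharset().name();
  }

  public static void main(String[] argv) {
    System.out.println(describe());
  }
}
```

Running it once as plain java ShowDefaultEncoding and once with the -Dfile.encoding option shows whether the JVM in use honors the property.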


Print Unicode characters to the Terminal with Java
Authored by: gidds on Feb 11, '05 08:56:18PM

Yep, just what I was going to say.

Actually, there are several related issues here. (I've been through them coz I keep all my files in CP1252, aka Windows Latin-1.)

The main one is the encoding that the JVM uses for file I/O; as you say, you set this using the -D option to set the 'file.encoding' system property when running Java programs. This will affect all character-based streams (FileReaders, FileWriters, &c) unless they specify an alternate encoding (which isn't a good idea in general). Not FileInputStream &c, though, as they're byte-based.

Second is the encoding of your source code; you set this when compiling, with javac's -encoding option (a plain -D option is not passed through to the compiler). This will affect extended characters in string literals &c.

Third is the encoding used by whatever you edit files with. TextEdit can set this in its preferences, for example. Most command-line tools will use the environment variable LC_CTYPE, but there's a problem with 'vi' (at least in 10.2.8) which ignores it. (I've compiled my own version which sets it properly, by calling setlocale(LC_CTYPE, "") at the start.) If no locale is defined for the encoding you want, it's not hard to define your own using mklocale.

And fourth is the encoding used by the terminal itself. Terminal can set this in its Display preferences; iTerm in its Terminal Profiles. (I have a suspicion that extended characters used to cause occasional crashes in Terminal, but that may not apply to recent versions.)

If any of these differ, it'll cause problems at some point... But if they're all set up correctly, it'll handle extended characters just like 7-bit ones, and you'll never need to worry about the difference!

---
Andy/



Print Unicode characters to the Terminal with Java
Authored by: pjt33 on Feb 16, '05 03:24:20PM
Bad idea. Sun state that file.encoding is supposed to be read-only, so that gives non-portable code. In particular, it doesn't work with Sun's Linux VM.

Print Unicode characters to the Terminal with Java
Authored by: dc_rees on Aug 25, '05 05:22:30AM

Well, Sun is just plain wrong on this one.

Apple's Java is not quite right either, perhaps. Java should get its environment from the terminal when invoked from the terminal.

But wrapping your readers and writers to force them to read and write shift-JIS is putting that sort of functionality in the wrong place. When the terminal is not quite correctly interfaced with Java is exactly the time you don't want to hard code the work-around.



---
Say yes to CPU multiculture



Print Unicode characters to the Terminal with Java
Authored by: cultureulterior on Apr 05, '05 07:57:57AM
How to do this in jython:

import sys
from java.lang import System
from java.io import PrintStream
System.setOut(PrintStream(System.out, 1, "UTF-8"));
class encoder:
    def write(self, text):
        System.out.write(text.encode("utf-8", "replace"))
sys.stdout = encoder()


Problems printing Unicode characters to the Terminal with Java
Authored by: gustavobap on Nov 16, '05 06:21:27AM

It didn't work for me (Windows 2000); the stream printed this:
皆ã�•ã‚"ã€�ã�"ã‚"ã�«ã�¡ã�¯
I need to print the Unicode characters. Does anyone know how to do this on Windows?


