Using UTF-8 (Unicode) in Gentoo

From Elvanör's Technical Wiki
Revision as of 17:07, 11 December 2006 by Elvanor (talk | contribs)

Using UTF-8 on your Gentoo system is strongly recommended, for many reasons. This short guide links to the official Gentoo UTF-8 documentation and discusses some potential issues with UTF-8.

Most important stuff: Gentoo documentation, and setting your locale

Several steps are needed to get a full UTF-8 system. Essentially, however, there are three things to do:

  • Build UTF-8 support into your kernel;
  • Generate and use a UTF-8 locale (with a UTF-8 enabled glibc);
  • Add the "unicode" flag to your USE flags in /etc/make.conf.
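
Once a locale has been generated and selected, it can be useful to check that the running environment really picks up a UTF-8 codeset. The following is a minimal Python sketch of such a check, not part of the Gentoo documentation. The helper names locale_is_utf8 and effective_ctype_locale are hypothetical, and the sketch assumes the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) and locale names of the form language_TERRITORY.codeset@modifier.

```python
import os


def locale_is_utf8(name):
    """Return True if a locale name such as 'en_US.UTF-8' names a UTF-8 codeset."""
    # Assumed layout: language[_territory][.codeset][@modifier]
    base = name.split("@", 1)[0]
    if "." not in base:
        return False  # e.g. "C" or "POSIX": no explicit codeset given
    codeset = base.split(".", 1)[1]
    # Accept both spellings seen in practice: "UTF-8" and "utf8"
    return codeset.lower().replace("-", "") == "utf8"


def effective_ctype_locale(env=None):
    """Return the locale governing character encoding: LC_ALL, then LC_CTYPE, then LANG."""
    env = os.environ if env is None else env
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"  # fallback when no locale variable is set


if __name__ == "__main__":
    current = effective_ctype_locale()
    status = "UTF-8" if locale_is_utf8(current) else "not UTF-8"
    print(current + ": " + status)
```

Run from a shell after logging in again, the script prints the locale in effect and whether its codeset is UTF-8. It only inspects environment variables; it does not verify that the locale was actually generated for glibc.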

The following two links explain all of this in more detail.

UTF-8, ISO 9660 and Joliet extensions

One problem I currently have is that I have not managed to burn a CD/DVD with UTF-8 filenames. In K3b, checking the option "Generate Rock Ridge extensions" produces a DVD that works under Linux (i.e., the filenames appear correctly). Under Windows and Mac OS X, however, the same DVD does not work: the filenames appear with garbage characters.

This is because Linux uses the Rock Ridge extensions, whereas Windows and OS X must use the poorly designed Joliet extensions. It seems that mkisofs (part of cdrtools) currently cannot deal with an input-charset of UTF-8, or at least the stable version in Gentoo cannot. I don't know whether support could be added, or whether some Joliet limitation makes it impossible. Joliet seems to use UTF-16 encoding for filenames.
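
To make the encoding mismatch concrete, here is a small Python sketch. It is only an illustration of the suspected mechanism, not a description of what mkisofs does internally: it assumes that Joliet names are stored as big-endian UTF-16, and that the garbage appears because the UTF-8 bytes of a filename are treated as if they were in a single-byte charset such as ISO-8859-1. The function names and the sample filename are hypothetical.

```python
def joliet_name_bytes(filename):
    """Encode a filename the way a Joliet directory entry is assumed to store it."""
    # Assumption: Joliet uses 16-bit big-endian code units (UTF-16BE here)
    return filename.encode("utf-16-be")


def misread_as_latin1(filename):
    """Show what results when UTF-8 bytes are wrongly interpreted as ISO-8859-1."""
    return filename.encode("utf-8").decode("iso-8859-1")


if __name__ == "__main__":
    name = "caf\u00e9.txt"  # "café.txt", written with an escape to stay ASCII-safe
    print(joliet_name_bytes(name))   # two bytes per character
    print(misread_as_latin1(name))   # the accented letter becomes two wrong characters
```

With the sample name, the single character "é" (two bytes in UTF-8) turns into the two characters "Ã©" when misread, which matches the kind of garbage seen under Windows and Mac OS X. A correct conversion would instead decode the UTF-8 name first and then write it out in the 16-bit form.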

Anyway, the current situation is that I can have DVDs with UTF-8 filenames, but they only work correctly under Linux. Slightly annoying.