Using UTF-8 (Unicode) in Gentoo

From Elvanör's Technical Wiki
Revision as of 12:30, 6 March 2007 by Elvanor (talk | contribs)

Using UTF-8 on your Gentoo system is practically mandatory, for many reasons. This short guide links to the official Gentoo UTF-8 documentation and discusses some potential issues with UTF-8.

Most important stuff: Gentoo documentation, and setting your locale

Several steps are needed to get a full UTF-8 system. Basically, however, there are three things to do:

  • Build UTF-8 support into your kernel;
  • Generate and use a UTF-8 locale (with a UTF-8 enabled glibc);
  • Add the "unicode" flag to your USE flags in /etc/make.conf.

The two following links explain all of this in more detail.
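As a quick sanity check of the three points above, here is a minimal sketch of a shell script. The function names are hypothetical, and the kernel configuration path /usr/src/linux/.config and the CONFIG_NLS_UTF8 option name are assumptions about a typical setup, not something taken from the official guide. The USE check only handles a single-line USE= assignment and does not distinguish "unicode" from "-unicode".

```shell
#!/bin/sh
# Sketch only: helper names and file paths are assumptions.

# True if the charmap name given as $1 denotes UTF-8.
is_utf8_charmap() {
    case $1 in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# True if file $1 has a single-line USE= assignment mentioning "unicode".
has_unicode_use() {
    grep -Eq '^[[:space:]]*USE=.*unicode' "$1" 2>/dev/null
}

# True if kernel config file $1 enables the UTF-8 NLS option (built in or module).
kernel_has_utf8_nls() {
    grep -Eq '^CONFIG_NLS_UTF8=[ym]' "$1" 2>/dev/null
}

charmap=$(locale charmap 2>/dev/null)
if is_utf8_charmap "$charmap"; then
    echo "locale: charmap is UTF-8"
else
    echo "locale: charmap is '$charmap', not UTF-8"
fi

if has_unicode_use /etc/make.conf; then
    echo "make.conf: USE mentions unicode"
else
    echo "make.conf: no unicode flag found in USE"
fi

if kernel_has_utf8_nls /usr/src/linux/.config; then
    echo "kernel: CONFIG_NLS_UTF8 is enabled"
else
    echo "kernel: CONFIG_NLS_UTF8 not found in /usr/src/linux/.config"
fi
```

The script only reads files and prints a report; it changes nothing on the system.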

UTF-8, ISO 9660 and Joliet extensions

One problem I currently have is that I have not managed to burn a CD/DVD with filenames in UTF-8. In K3b, checking the option "Generate Rock Ridge extensions" creates a DVD that works under Linux (i.e., the filenames appear correctly). However, under Windows and Mac OS X the same DVD does not work (filenames appear as garbage characters).

This is because Linux uses the Rock Ridge extensions, whereas Windows/OS X must use the poorly designed Joliet extensions. It seems that mkisofs (part of cdrtools) currently cannot deal with an input-charset of UTF-8, or at least the stable version in Gentoo cannot. I don't know whether support could be added, or whether it is impossible because of some Joliet limitation. Joliet stores filenames in a 16-bit Unicode encoding (UCS-2, big-endian), so UTF-8 names have to be converted when the image is built.
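To make the encoding difference concrete, the following sketch compares the size of a filename in UTF-8 with its size in UTF-16BE, which is used here as a stand-in for Joliet's 16-bit encoding. The helper names and the sample filename are hypothetical, and the sketch assumes that iconv is installed.

```shell
#!/bin/sh
# Sketch only: helper names and the sample filename are made up.

# Number of bytes the name $1 occupies as UTF-8 (the shell's own bytes).
utf8_bytes() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# Number of bytes the name $1 occupies once converted to UTF-16BE.
utf16_bytes() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-16BE | wc -c | tr -d ' '
}

if command -v iconv >/dev/null 2>&1; then
    name='français.txt'
    echo "UTF-8 bytes:    $(utf8_bytes "$name")"
    echo "UTF-16BE bytes: $(utf16_bytes "$name")"
fi
```

The two byte sequences differ for every non-ASCII character, which is why a tool that copies UTF-8 names into the Joliet tree without converting them produces garbage on Windows and OS X.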

Anyway, the current situation is that I can have DVDs with UTF-8 filenames, but they work correctly only with Linux. Slightly annoying. Maybe with more recent versions of cdrtools or cdrkit (the Debian fork of cdrtools), this issue will go away.
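For testing newer versions outside K3b, a command line along the following lines could be tried. This is only a sketch: the function name and the paths are hypothetical, and whether the installed mkisofs accepts utf-8 as an input charset is exactly the open question described above. The function prints the command instead of running it, so it can be inspected first.

```shell
#!/bin/sh
# Sketch only: prints a candidate mkisofs command line (dry run).
# -R                 generate Rock Ridge extensions (read by Linux)
# -J                 generate Joliet extensions (read by Windows)
# -input-charset     encoding of the filenames on the local disk
# -o                 output image file
iso_command() {
    src_dir=$1
    out_iso=$2
    printf '%s\n' "mkisofs -R -J -input-charset utf-8 -o $out_iso $src_dir"
}

# Hypothetical source directory and image name.
iso_command ./files ./disc.iso
```

If the printed command looks right, it can be pasted into a shell and the resulting image tested on each operating system before burning anything.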

Unicode and LaTeX (tetex)

  • It seems hard to make Unicode work correctly with LaTeX. There are two ways to use Unicode in LaTeX: \usepackage[utf8]{inputenc} and \usepackage[utf8x]{inputenc}. For utf8x, you must emerge the package latex-unicode.
  • After that, here is some LaTeX code showing how to set up UTF-8 in a .tex file:
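\documentclass{article}
% Assumption: T2A provides the Cyrillic glyphs, T1 the accented Latin ones.
\usepackage[T2A,T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[russian,french]{babel}

\begin{document}

\selectlanguage{russian}

Here is some Russian: русский.

\selectlanguage{french}

Voilà du français.

\end{document}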
  • The problem is that if you select the Russian language, the PDF produced by pdflatex contains bitmap text (i.e., the text is not rendered from vector outlines). I did not have this problem under Mac OS X. I am currently searching for a solution that produces good-looking PDFs using several languages.
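One way to confirm the bitmap problem without zooming into the PDF is to list the embedded fonts. The sketch below assumes the pdffonts utility (shipped with xpdf or poppler) is installed; the function name and the default file name document.pdf are hypothetical. Bitmap fonts generated by Metafont normally show up as "Type 3" in the pdffonts listing.

```shell
#!/bin/sh
# Sketch only: reads a pdffonts listing on stdin and succeeds
# if at least one font is reported as Type 3 (usually a bitmap font).
has_type3_fonts() {
    grep -q 'Type 3'
}

pdf=${1:-document.pdf}   # hypothetical default file name
if command -v pdffonts >/dev/null 2>&1 && [ -f "$pdf" ]; then
    if pdffonts "$pdf" | has_type3_fonts; then
        echo "$pdf embeds Type 3 (bitmap) fonts"
    else
        echo "$pdf embeds no Type 3 fonts"
    fi
fi
```

Running it on the PDF produced from the Russian example and on one produced under Mac OS X should show whether the two systems embed different font types.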