Using UTF-8 (Unicode) in Gentoo

From Elvanör's Technical Wiki
Revision as of 08:55, 29 March 2007 by Elvanor (talk | contribs)

Using UTF-8 on your Gentoo system is all but mandatory, for many reasons. This short guide links to the official Gentoo UTF-8 documentation and discusses some potential issues with UTF-8.

Most important stuff: Gentoo documentation, and setting your locale

Several things need to be done in order to have a full UTF-8 system. Basically, however, there are five steps:

  • Build UTF-8 support into your kernel.
  • Generate and use a UTF-8 locale (with a UTF-8 enabled glibc).
  • Add the "unicode" flag to your USE flags in /etc/make.conf.
  • Add Unicode support to the console (emerge a Unicode font and edit some configuration files).
  • Add a script to your default runlevel:
#!/sbin/runscript
conf=/etc/env.d/02locale

# Using devfs?
if [ -e /dev/.devfsd ] || [ -e /dev/.udev -a -d /dev/vc ]; then
  device=/dev/vc/
else
  device=/dev/tty
fi

depend() {
        need localmount
        after keymaps
        before consolefont
}

checkconfig() {

  if [ -r ${conf} ]; then
          . ${conf}
          encoding=
          [ -n "${LC_ALL}" ]      && encoding=${LC_ALL#*.}   && return 0
          [ -n "${LC_MESSAGES}" ] && encoding=${LC_MESSAGES#*.} && return 0
          [ -n "${LANG}" ]        && encoding=${LANG#*.}   && return 0
  fi
  eend 1 "Locale is not configured, Please fix ${conf}"
  return 1
}

start() {
        ebegin "Setting consoles to UTF-8"
        # Stop here if no locale is configured (checkconfig has already reported it)
        checkconfig || return 1
        # The regex must stay unquoted: bash 3.2 and later match a quoted pattern literally
        if [[ "${encoding}" =~ [uU][tT][fF]-?8 ]]; then
                dumpkeys | loadkeys --unicode
                for ((i=1; i <= "${RC_TTY_NUMBER}"; i++)); do
                        echo -ne "\033%G" > ${device}${i}
                done
                eend 0
        else
                eend 1 "UTF-8 is not required"
        fi
}
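
The locale check in the script above can also be tried on its own, outside the init system. The following is only a minimal sketch of that logic: the function names locale_encoding and is_utf8_locale are made up for this example, and unlike the script it also strips an "@modifier" suffix and treats values without a dot (such as "C") as having no encoding.

```shell
#!/bin/bash
# Sketch of the locale check used by the init script above.
# locale_encoding and is_utf8_locale are hypothetical helper names.

# Print the encoding part of the first non-empty locale value given.
# Pass the values in order of precedence: LC_ALL, LC_MESSAGES, LANG.
locale_encoding() {
    local value
    for value in "$@"; do
        [ -n "${value}" ] || continue
        case "${value}" in
            *.*)
                value="${value#*.}"           # drop "language_TERRITORY."
                printf '%s\n' "${value%%@*}"  # drop a trailing "@modifier"
                ;;
            *)
                printf '\n'                   # e.g. "C" or "POSIX": no encoding part
                ;;
        esac
        return 0
    done
    return 1                                  # every value was empty
}

# Succeed only if the first non-empty value names a UTF-8 encoding.
is_utf8_locale() {
    local enc
    enc="$(locale_encoding "$@")" || return 1
    [[ "${enc}" =~ ^[uU][tT][fF]-?8$ ]]
}
```

To test the current environment, call it as: is_utf8_locale "${LC_ALL}" "${LC_MESSAGES}" "${LANG}" && echo "UTF-8 locale". Passing the values as arguments, rather than reading the environment inside the function, makes it easy to try other settings without changing the shell's own locale.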


The following two links explain all of this in more detail.

UTF-8, ISO 9660 and Joliet extensions

A problem I currently have is that I have not managed to burn a CD/DVD with filenames in UTF-8. In K3b, checking the option "Generate Rock Ridge extensions" creates a DVD that works under Linux (i.e., the filenames appear correctly). However, under Windows and Mac OS X the same DVD does not work (the filenames appear as garbage characters).

This is because Linux uses the Rock Ridge extensions, whereas Windows/OS X must use the poorly designed Joliet extensions. It seems that mkisofs (part of cdrtools) currently cannot deal with an input-charset of UTF-8, or at least the stable version in Gentoo cannot. I don't know whether support could be added, or whether it is impossible because of some Joliet limitation. Joliet stores filenames in UCS-2, a 16-bit Unicode encoding closely related to UTF-16.

Anyway, the current situation is that I can have DVDs with UTF-8 filenames, but they work correctly only with Linux. Slightly annoying. Maybe with more recent versions of cdrtools or cdrkit (the Debian fork of cdrtools), this issue will go away.

Update, March 2007: with cdrkit and K3b 1.0, the problems go away. cdrkit produces perfectly readable CDs (at least for ISO-8859-1) under both Windows and Linux. Currently cdrkit is not the default in Portage (it should really be), so you must manually unmerge cdrtools and emerge cdrkit.
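
For reference, here is a rough sketch of how an image carrying both sets of extensions might be requested from the command line with genisoimage, the mkisofs replacement shipped with cdrkit. The helper name build_iso_command and the paths in the usage note are placeholders invented for this example. The function only prints the command line instead of running it, and I have not verified that this cures the problem described above on every setup.

```shell
#!/bin/bash
# Sketch: print (not run) a genisoimage command line that writes both
# Rock Ridge and Joliet names from UTF-8 input filenames.
# build_iso_command is a hypothetical helper name.

build_iso_command() {
    local source_dir="$1" image="$2"
    local -a cmd=(
        genisoimage
        -R                      # Rock Ridge extensions (used by Linux)
        -J                      # Joliet extensions (used by Windows)
        -input-charset utf-8    # encoding of the filenames on disk
        -o "${image}"           # output image file
        "${source_dir}"
    )
    # %q quotes each word so the printed line can be pasted into a shell.
    printf '%q ' "${cmd[@]}"
    printf '\n'
}
```

For example, build_iso_command ./photos photos.iso prints the full command line for inspection; once it looks right, it can be pasted into a shell or passed to eval.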

Unicode and LaTeX (tetex)

  • It seems hard to make Unicode work correctly with LaTeX. There are two ways to use Unicode in LaTeX: \usepackage[utf8]{inputenc} and \usepackage[utf8x]{inputenc}. For utf8x, you must emerge the package latex-unicode. I think it is preferable to use the utf8 option.
  • After that, here is some LaTeX code showing how to set up UTF-8 in a .tex file:
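\documentclass{article}
% T2A is the Cyrillic font encoding needed for Russian; T1 stays the default
\usepackage[T2A,T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[russian,french]{babel}

\begin{document}

\selectlanguage{russian}

Here is some Russian: русский.

\selectlanguage{french}

Voilà du français.

\end{document}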
  • The problem is that if you select the Russian language, the PDF produced by pdflatex will contain bitmap text (i.e., the text will not be rendered from scalable outlines). A likely cause is that no outline (Type 1) version of the Cyrillic fonts is installed, so bitmap fonts are used instead. I did not have this problem under Mac OS X. I am currently searching for a solution that would create beautiful PDFs with several languages used.
  • Update: Gentoo, as of March 2007, still uses the tetex TeX distribution. This distribution is outdated. The future of TeX on UNIX is apparently TeX Live. I hope that Gentoo makes the transition soon; in the meantime, some problems will be hard to solve with tetex, which is unmaintained.
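
One way to check whether a PDF really contains bitmap text is to list its fonts with pdffonts (from Xpdf/Poppler) and look for entries of type "Type 3", which is how pdflatex embeds bitmap fonts. The following is only a rough sketch: count_type3_fonts is a name made up for this example, it reads the pdffonts listing from standard input, and it simply matches the text "Type 3" on each line, so a font whose name happens to contain that string would be counted too.

```shell
#!/bin/bash
# Sketch: count Type 3 (bitmap) fonts in pdffonts-style output read from stdin.
# count_type3_fonts is a hypothetical helper name.

count_type3_fonts() {
    # Skip the two header lines, then count rows that mention "Type 3".
    awk 'NR > 2 && /Type 3/ { n++ } END { print n + 0 }'
}
```

It would be used as: pdffonts document.pdf | count_type3_fonts. A result of 0 suggests that all embedded fonts are outline fonts.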