Python, wxPython and internationalization (i18n)

Author:	Pierre Rouleau
Version:	1.1
Date:	2003-10-15

The goal of this document is to describe how to internationalize a Python application that uses wxPython for the User Interface.

Contents

1 Background
2 How to create gettext catalog files
3 How to write an internationalized wxPython application
4 Python and wxPython modules, classes and functions
5 Controlling the presentation language from the environment variables
6 Tools to manage translation dictionary files
7 Creating .mo files with the mki18n.py script

1 Background

1.1 I18N under Python and wxPython

wxPython and Python support the gettext system for I18N (internationalization ¹).

1.2 GNU gettext system

Internationalization of software is supported by the GNU Translation Project. Although the project focuses on the goal of translating software user interfaces in as many natural languages as possible and does not impose a tool set, it recommends the use of gettext (and you should use gettext with Python and wxPython).

1.2.1 Idea behind gettext

The general idea behind gettext is that you write your source code in English, and all the natural language strings are also written in English. The strings are written directly in the source code; you do not use string resource identifiers. However, every string that must be translatable is surrounded by a small macro call: _(). The gettext system supports a large set of programming languages including C, C++ and Python.

In C or C++, _() is a macro that the preprocessor replaces by a call to a gettext function. That function searches for the proper string at run time, using the original English string as the key into the language dictionary that is currently active. If the active language is English, no translation is performed; if the active language is something else, the translation is performed if a matching string is found. In Python, the _() function is either mapped explicitly to gettext.gettext() by your application or installed by the gettext class API.
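
The two ways of obtaining _() in Python mentioned above can be sketched as follows. This is only a minimal illustration, assuming Python 3 and no compiled catalog, so the strings come back untranslated. NullTranslations is the standard-library class that performs no translation; it stands in here for a real catalog.

```python
import gettext

# Way 1: map _() explicitly to gettext.gettext() in the current module.
_ = gettext.gettext

# Way 2: let the class-based API install _() as a builtin for the whole
# application. NullTranslations performs no lookup, so it stands in here
# for a catalog loaded from a .mo file.
null_catalog = gettext.NullTranslations()
null_catalog.install()

# With no catalog loaded, the English string is returned unchanged.
greeting = null_catalog.gettext("Testing internationalization")
```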

The dictionaries are compiled files (files with the .mo extension ² ). To create the .mo files, you first parse all of your source code with the gettext tools, which generate a .pot file (POT stands for Portable Object Template): a simply formatted text file that contains all of the English strings that must be translated. Each English string acts as the message identifier for that string. Below each English string is a spot for the translated version of that string. You copy the .pot file into a .po ³ file and give it to a human translator. Then you compile the translated .po file into a .mo file that you place inside one of the LC_MESSAGES directories of your system. These directories are named after the natural language they refer to, using the two-character ISO 639 language codes.

Applications normally place the various .mo files inside language-specific sub-directories of the directory ./locale. The directory tree for English (en), French (fr) and Spanish (written sp throughout this document, although the ISO 639 code for Spanish is es) would look like the following.

./locale/en/LC_MESSAGES
./locale/fr/LC_MESSAGES
./locale/sp/LC_MESSAGES

The .mo file for each language is stored inside the LC_MESSAGES sub-directory under the language code directory. For example, the Spanish .mo dictionary file is located inside ./locale/sp/LC_MESSAGES.

1.2.2 gettext tools

The GNU gettext tools are available for all operating systems supported by GNU. The tools described here are the console tools for the Win32 platform. The GNU tool package includes the console programs listed in the following table.

Program	Description
gettext.exe	Display native language translation of a textual message.
iconv.exe	Converts text from one character encoding to another.
msgattrib.exe	Filters the messages of a translation catalog according to their attributes, and manipulates the attributes.
msgcat.exe	Concatenates and merges the specified PO files.
msgcmp.exe	Compare two Uniforum style .po files to check that both contain the same set of msgid strings.
msgcomm.exe	Find messages which are common to two or more of the specified PO files.
msgconv.exe	Converts a translation catalog to a different character encoding.
msgen.exe	Creates an English translation catalog. The input file is the last created English PO file, or a PO Template file (generally created by xgettext). Untranslated entries are assigned a translation that is identical to the msgid.
msgexec.exe	Applies a command to all translations of a translation catalog. The COMMAND can be any program that reads a translation from standard input. It is invoked once for each translation. Its output becomes msgexec's output. msgexec's return code is the maximum return code across all invocations.
msgfilter.exe	Applies a filter to all translations of a translation catalog.
msgfmt.exe	Generate binary message catalog from textual translation description.
msggrep.exe	Extracts all messages of a translation catalog that match a given pattern or belong to some given source files.
msginit.exe	Creates a new PO file, initializing the meta information with values from the user's environment.
msgmerge.exe	Merges two Uniforum style .po files together.
msgunfmt.exe	Convert binary message catalog to Uniforum style .po file.
msguniq.exe	Unifies duplicate translations in a translation catalog.
ngettext.exe	Display native language translation of a textual message whose grammatical form depends on a number.
xgettext.exe	Extract translatable strings from given input files.

1.2.2.1 How to get gettext tools for Win32

To install GNU gettext on your Win32 system, follow these instructions:

Download the following files:
Create a directory called \gnu on one of your system drives (let's say D:).
Extract the 3 ZIP files inside the \gnu directory. This way the executable files of the 3 packages will be stored inside \gnu\bin.
Add \gnu\bin to your PATH.

The files listed above were taken from the following sites:

GNU FTP site for gettext, where several versions (0.10.40, 0.11.2, 0.11.5 and 0.12.1) are available.

The GNU libiconv ftp site. This is the ftp site for the GNU libiconv library. The iconv library is required by gettext. Get version 1.9.1 or later.

There are other packages maintained by other individuals. I recommend you use the ones above. However, the following sites helped me get started.

SourceForge page of gettext for Win32

GNU gettext for WIN32 is a little distribution of the GNU gettext for Win32

Some cautionary notes:

You should never use a version of gettext older than 0.10.39 (because it produces .po files that cannot be used by the Translation Project without human editing).

Version 0.11 is considered stable according to the Translation Project.

gettext is a set of command line tools and code libraries that have been developed under the GNU umbrella.

1.2.3 gettext file formats

To be written.

1.3 Python gettext

The Python gettext module provides internationalization (I18N) and localization (L10N) services for Python modules and applications. It is based on the GNU gettext system.

2 How to create gettext catalog files

To help create the gettext binary catalog files, I wrote a Python console program called mki18n.py that uses the GNU gettext utilities to parse Python source code and create the .po and .mo files. The mki18n.py script is used to perform several tasks:

parse all Python source code files of an application and create the .pot file for each target language.

parse all Python source code files of an application and merge the generated .pot with the existing .po files (files that already contain text translated by a human translator).

create the binary .mo files from the .po files.

The mki18n.py module can also be used as an imported module inside other Python programs.

3 How to write an internationalized wxPython application

The application must contain the following code:

the main module must import the gettext module.
the global code of the main module must call gettext.install() to install the function _() in Python's builtins namespace, which makes it available to every module of the application. This call identifies the translation domain ⁴ (which normally is the application name), the location of the LC_MESSAGES directories and whether Unicode is used. The following is an example of the call where the dictionary files are under ./locale/xx/LC_MESSAGES (with xx being the language code) and testApp is the name of the application (and the name of the .mo file):
```
gettext.install('testApp', './locale', unicode=False)  # the unicode argument exists in Python 2 only; omit it on Python 3
```
For each supported presentation language, the program must create a translation instance by calling gettext.translation(). The arguments to the translation function are the application domain (the name of the .mo file, which is often the name of the application), the parent directory of the languages' LC_MESSAGES directories and the list of languages. For example, if your application is called ivcm and the files are stored under a directory called locale, the support for English, French and Spanish would be set up with the following calls:
```
self.presLan_en = gettext.translation("ivcm", "./locale", languages=['en'])
self.presLan_fr = gettext.translation("ivcm", "./locale", languages=['fr'])
self.presLan_sp = gettext.translation("ivcm", "./locale", languages=['sp'])
```
To activate the translation of your application for a specified language, call the install() method of the corresponding translation instance. For example, to activate the French presentation, you would call:
```
self.presLan_fr.install()
```
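Set the wxWindows locale by creating a wx.Locale object, passing the wx.LANGUAGE_XX code corresponding to the selected language. The following code snippet shows you how:
```
self.locale = wx.Locale(wx.LANGUAGE_FRENCH)  # keep a reference so the wx.Locale object stays alive
locale.setlocale(locale.LC_ALL, 'FR')        # Python's standard locale module; accepted locale names are platform-dependent
```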
In your code, every translatable string must be enclosed in a _() call like this:
```
aTitle = _("Testing internationalization")
```
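
The gettext part of the steps above can be put together as in the following sketch. It assumes Python 3 and leaves out the wxPython calls so that it runs without a GUI. The helper names load_translations and activate are hypothetical, and fallback=True is an addition of mine so that the example still runs when no .mo file has been compiled yet.

```python
import gettext

DOMAIN = "ivcm"          # name of the .mo file, as in the example above
LOCALE_DIR = "./locale"  # parent of the xx/LC_MESSAGES directories


def load_translations(domain, localedir, languages):
    """Return a dict mapping each language code to its translation object.

    fallback=True makes gettext.translation() return a NullTranslations
    object (which leaves strings untranslated) instead of raising an
    error when no .mo file is found for a language.
    """
    return {
        lang: gettext.translation(domain, localedir, languages=[lang],
                                  fallback=True)
        for lang in languages
    }


def activate(translations, lang):
    """Install _() as a builtin bound to the catalog of the given language."""
    translations[lang].install()


translations = load_translations(DOMAIN, LOCALE_DIR, ["en", "fr"])
activate(translations, "fr")

# French if locale/fr/LC_MESSAGES/ivcm.mo exists, otherwise the English text.
title = _("Testing internationalization")
```

Creating all translation objects once and calling install() on the selected one keeps a language switch down to a single call, which is convenient when the user changes the language from a menu.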

The gettext.install(domain, localedir, unicode) call instructs the gettext system to look for the dictionary file name built from the components:

localedir/language/LC_MESSAGES/domain.mo
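
To check which file this rule selects, the path can be built by hand or obtained from gettext.find(), which is part of the standard gettext module. The following is a small sketch assuming Python 3; the function names expected_mo_path and find_catalog are hypothetical.

```python
import gettext
import os


def expected_mo_path(domain, localedir, language):
    """Build localedir/language/LC_MESSAGES/domain.mo for one language."""
    return os.path.join(localedir, language, "LC_MESSAGES", domain + ".mo")


def find_catalog(domain, localedir, language):
    """Return the path of the catalog gettext would load, or None if absent."""
    # gettext.find() applies the same path rule and checks that the file exists.
    return gettext.find(domain, localedir, languages=[language])
```

Calling find_catalog('testApp', './locale', 'fr') is a quick way to diagnose an application that stays in English: a result of None means that the .mo file is not where gettext looks for it.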

4 Python and wxPython modules, classes and functions

Python gettext
wxPython wx.Locale class
wxPython wx.GetTranslation()
wxWindows wxLocale

5 Controlling the presentation language from the environment variables

The following environment variables control the selection of the translation language. The system uses the language code found in the first variable of the following list that is set:

LANGUAGE
LC_ALL
LC_MESSAGES
LANG
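
The lookup order can be sketched in a few lines of Python. The function name languages_from_environment is hypothetical; the logic mirrors the order listed above, and the colon-separated form of the value (for example fr:en) is the one Python's gettext module accepts.

```python
import os

ENV_VARS = ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG")


def languages_from_environment(environ):
    """Return the language list from the first variable that is set and non-empty."""
    for name in ENV_VARS:
        value = environ.get(name)
        if value:
            # A value may list several languages separated by colons,
            # for example "fr:en".
            return value.split(":")
    return []


current_languages = languages_from_environment(os.environ)
```

Passing the mapping as an argument, instead of reading os.environ inside the function, makes the order easy to try out with a plain dictionary.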

6 Tools to manage translation dictionary files

I normally use the CRiSP editor to edit .po files and compare several versions of the .po files. There are, however, specialized tools that simplify managing gettext catalog files. These tools are listed here.

poEdit is a cross-platform gettext catalogs (.po files) editor.

7 Creating .mo files with the mki18n.py script

The mki18n.py script helps you create .po and .mo files from your source code files for an application. I describe the process of internationalizing the ivcm application here.

All strings that must be internationalized inside ivcm.py (and its companion files) have the form _("Hello"). All strings inside the source code must be in English as this is the convention used by the gettext system.

To use my mki18n.py script, I write a file called app.fil that contains the names of all files inside the application (one file per line, with full or relative path). For example:

images.py
ivcm.py
ivcm_about.py
ivcm_ie.py
ivcm_usermanual.py
ivcm_wxFrame1.py
../ptz.py
../action.py
../utprint.py

Then I run mki18n -p from the directory where ivcm.py is located to parse all source files and create a 'messages.pot' file. The .pot is the original template. You keep this file untouched. If I want to support French then I copy the messages.pot into a .po file named after the domain name (in this case the application name: 'ivcm') and the target language code (in this case: 'fr'). So for French I use the file name: ivcm_fr.po. If I need to support Spanish, I copy messages.pot into ivcm_sp.po and so on.
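
The following sketch illustrates this step in Python 3. It is not the code of mki18n.py, whose internals are not shown in this document; all function names are hypothetical. The xgettext options used (--language, --keyword, --output) are standard GNU xgettext options.

```python
import os
import shutil


def read_file_list(path):
    """Return the source file names listed in app.fil, one per line."""
    with open(path, encoding="utf-8") as listing:
        return [line.strip() for line in listing if line.strip()]


def xgettext_command(sources, pot_file="messages.pot"):
    """Build an xgettext command line that extracts _() strings from Python files."""
    return ["xgettext", "--language=Python", "--keyword=_",
            "--output=" + pot_file] + list(sources)


def po_file_name(domain, language):
    """Name a .po file after the domain and the target language code."""
    return "%s_%s.po" % (domain, language)


def start_translation(pot_file, domain, language, directory="."):
    """Copy the template to domain_language.po unless that file already exists."""
    po_path = os.path.join(directory, po_file_name(domain, language))
    if not os.path.exists(po_path):
        # Never overwrite a .po file that may already contain translations.
        shutil.copyfile(pot_file, po_path)
    return po_path
```

The command list returned by xgettext_command() can be passed to subprocess.run() on a system where the GNU gettext tools are on the PATH.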

The following lines show a couple of entries inside the non translated ivcm_fr.po:

#: ivcm.py:168
#, python-format
msgid ""
"\n"
"   ERROR: %s"
msgstr ""

#: ivcm_wxFrame1.py:638 ivcm_wxFrame1.py:1742
msgid "&About..."
msgstr ""

The next step is to perform the translation. You can use a normal editor to append the French string inside the ivcm_fr.po or use poEdit or any other .po editor. The result of the translation would look like:

#: ivcm.py:168
#, python-format
msgid ""
"\n"
"   ERROR: %s"
msgstr ""
"\n"
"   ERREUR: %s"

#: ivcm_wxFrame1.py:638 ivcm_wxFrame1.py:1742
msgid "&About..."
msgstr "À propos de iVCM..."

Note that every line with a '#' in the first column is a comment or flag. In the example above, the python-format flag marks the string as a Python format string (it contains the %s directive), so the translation must keep the same directive. The #: lines show the source file and line number where the string occurs.

Some of the flags are set when you re-synchronize the translations with the source. This re-synchronization is required if the source changes after you have created the translated .po file(s).

My mki18n.py script automatically performs the synchronization if it finds .po files that have the domain_language.po name layout. After a re-synchronization, 'mki18n -p' creates a .new file for every .po file found. In my example, it would create an ivcm_fr.po.new and an ivcm_sp.po.new.

If the source has not changed, the .new files are equal to the .po files. Otherwise, the .new file contains the new strings to translate, keeps the strings that were removed from the source as comments, and may also flag some strings as 'fuzzy'. A fuzzy flag indicates that the translation should probably change because the original string changed. So I compare the .po and .po.new files and edit whatever is required, leaving the finished work inside the .po file.
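
When a .po.new file is large, a few lines of Python can list the entries that need attention. The following is a simplified sketch, not a full .po parser: it only reports the first msgid line of each entry whose flag line contains fuzzy. The function name fuzzy_msgids and the SAMPLE text are made up for the illustration.

```python
SAMPLE = '''\
#: ivcm.py:168
#, fuzzy, python-format
msgid "ERROR: %s"
msgstr "ERREUR: %s"

#: ivcm_wxFrame1.py:638
msgid "&About..."
msgstr "A propos..."
'''


def fuzzy_msgids(po_text):
    """Return the msgid text of every entry flagged fuzzy (simplified parser)."""
    flagged = []
    is_fuzzy = False
    for line in po_text.splitlines():
        if line.startswith("#,"):
            # Flags are comma-separated on a line that starts with "#,".
            flags = [flag.strip() for flag in line[2:].split(",")]
            is_fuzzy = "fuzzy" in flags
        elif line.startswith("msgid "):
            if is_fuzzy:
                flagged.append(line[len("msgid "):])
            is_fuzzy = False
    return flagged


needs_review = fuzzy_msgids(SAMPLE)
```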

The final step is to compile the finished .po file into the .mo file.

The .mo files normally reside inside a 'locale' sub-directory, with an xx/LC_MESSAGES directory for each supported language:

du locale
         3 locale/en/LC_MESSAGES
      8803 locale/en
       172 locale/fr/LC_MESSAGES
      7997 locale/fr
         4 locale/sp/LC_MESSAGES
        54 locale/sp
     20413 locale

I create the .mo file using mki18n -m which uses the msgfmt.exe tool that comes with the GNU gettext utility.
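
For readers who do not have mki18n.py at hand, the compilation step can be approximated as in the sketch below. It assumes Python 3 and that msgfmt is on the PATH; the function names are hypothetical and this is not the code of mki18n.py. The --output-file option is a standard msgfmt option.

```python
import os
import shutil
import subprocess


def mo_target(localedir, domain, language):
    """Return localedir/language/LC_MESSAGES/domain.mo."""
    return os.path.join(localedir, language, "LC_MESSAGES", domain + ".mo")


def msgfmt_command(po_file, mo_file):
    """Build the msgfmt command line that compiles po_file into mo_file."""
    return ["msgfmt", "--output-file=" + mo_file, po_file]


def compile_catalog(po_file, localedir, domain, language):
    """Compile one .po file into the matching LC_MESSAGES directory."""
    if shutil.which("msgfmt") is None:
        raise RuntimeError("msgfmt was not found on the PATH")
    mo_file = mo_target(localedir, domain, language)
    os.makedirs(os.path.dirname(mo_file), exist_ok=True)
    subprocess.run(msgfmt_command(po_file, mo_file), check=True)
    return mo_file
```

For the French catalog of the example, the call would be compile_catalog('ivcm_fr.po', 'locale', 'ivcm', 'fr').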

And then the application uses the .mo file!

[1]	The word internationalization is often abbreviated as I18N: take the first letter of the word (i) followed by the number of letters in the word (18) and terminate it with the last letter of the word (n)

[2]	MO stands for Machine Object file.

[3]	PO stands for Portable Object file.

[4]	The translation domain corresponds to the .mo dictionary file that is searched by the _() gettext translation function. It is selected by gettext.install().