[PATCH] Support of PO files for translations

Jens Seidel jensseidel at users.sf.net
Thu Aug 21 06:46:24 CDT 2008


Hi,
 
As announced a long time ago, I have worked on proper PO file support for
handling translations of the book. PO files are the standard way (see e.g.
http://www.debian.org/intl/l10n/po/) in the Open Source community to work on
translations and are used, among others, by Subversion.
 
Currently all translations are based on XML files. This has many
disadvantages:
 * Keeping the files in sync with the English ones is *very* hard
 * There is no standard way to specify the sync state against the English files
   (it needs to be extracted from svn log or maybe a comment, and is often wrong!)
 * From time to time there are build failures because tags were used wrongly
   in translations or not all XML files are in sync with the English ones
 * There are many long XML tags which need to be used in translations as well
 * If a translation that was 100% complete in the past is not yet fully
   updated, old and new texts are mixed
 * Many (small) strings are repeated over and over again and need to be
   translated multiple times
 * It's not possible to add common translations for smaller phrases
   from an external message catalog
 * There is no way to get translation statistics

Nearly all these disadvantages are solved by using PO files.

There are also a great many tools for working with these files
(e.g. the GNU gettext package contains a lot of programs called msg*,
there are graphical editors such as kbabel, gtranslator, ... and plugins
for Emacs or vim (http://www.vim.org/scripts/script.php?script_id=695)
allow jumping to the next untranslated or fuzzy message, ...).

A PO file contains many pairs of strings of the following form:

# type: Content of: <book><title>
#: ../en/book/book.xml:25
msgid "Version Control with Subversion"
msgstr "Versionskontrolle mit Subversion"

msgid is the original English text, msgstr contains the translation.

The XML source contains: "<title>Version Control with Subversion</title>".
As you can see, translators do not have to handle the <title> tag.
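
Because every entry is just such a msgid/msgstr pair, simple checks can be
scripted. The following is only a rough sketch, not part of the patch: the
function name count_untranslated is hypothetical, and it ignores plural
forms, msgctxt and obsolete (#~) entries. It counts the entries whose
msgstr is still empty and skips the header entry (the one with an empty
msgid). For real statistics, msgfmt --statistics from GNU gettext is the
tool to rely on.

```shell
#!/bin/sh
# Sketch: count untranslated entries (empty msgstr) in a PO file.
# Assumption: simple entries only (no plural forms, msgctxt or #~ lines).
count_untranslated() {
  awk '
    # Return the text between the first and the last double quote of a line.
    function quoted(line) {
      sub(/^[^"]*"/, "", line)
      sub(/"[^"]*$/, "", line)
      return line
    }
    # Close the current entry: count it if it has an id but no translation.
    function finish() {
      if (seen && id != "" && str == "") n++
      seen = 0; id = ""; str = ""; field = ""
    }
    /^msgid "/  { finish(); seen = 1; field = "id"; id = quoted($0); next }
    /^msgstr "/ { field = "str"; str = quoted($0); next }
    # Continuation lines extend whichever field was started last.
    /^"/        { if (field == "id") id = id quoted($0)
                  else if (field == "str") str = str quoted($0)
                  next }
    END         { finish(); print n + 0 }
  ' "$1"
}
```

Called as count_untranslated de/de.po, it prints a single number.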

The package po4a (http://po4a.alioth.debian.org/) is used to convert between
PO and XML files. It is not a very widespread tool but is nevertheless
used for many non-gettext-based translations, such as those of manual pages,
text files, DocBook XML, ...

The DocBook support in po4a is not yet optimal. The main problem is
the bug http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=494607
which has a trivial patch. In the following I assume that this patch
is applied (this can be done once the package is installed in the filesystem).

Without this patch applied, msgid strings do not contain full sentences,
which is very bad for translations:

# type: Content of: <chapter><sect1><para><footnote><para>
#: ../en/book/ch09-reference.xml:21
msgid "Well, you don't need a subcommand to use the"
msgstr ""

# type: Content of: <chapter><sect1><para><footnote><para>
#: ../en/book/ch09-reference.xml:22
msgid "option, but we'll get to that in just a minute."
msgstr ""

With the patch applied, a single larger string is used:

# type: Content of: <chapter><sect1><para><footnote><para>
#: ../en/book/ch09-reference.xml:21
msgid ""
"Well, you don't need a subcommand to use the <option>--version</option> "
"option, but we'll get to that in just a minute."
msgstr ""


There is another minor problem with po4a: It is a little bit slow :-)
On the other hand, translating is much more time-consuming than
po4a's updating of the PO file. Building output files from PO is
sufficiently fast (2 min for German HTML).

I attached a patch which adds support for po4a to the Makefiles.
Please apply.

The following targets will be added by it:

xml-to-po: Creates an initial PO file from pre-existing XML files
update-po: Updates/creates a translation file in PO format.

The following works only if a PO file <LANG>.po is found in the directory
<LANG>/:
stats: Prints statistics, e.g.
       de.po: 429 translated messages, 451 fuzzy translations, 3306 untranslated messages.
clean: Removes generated *.xml files

Attention: Calling make clean will remove all XML files if a PO file
(even an empty one) exists!

Workflow:
There is no need to change individual existing Makefiles of translations.

* The PO file has to be created by extracting text/translation pairs from
  existing XML files. This is not always easy but needs to be done only once
  and only if translations already exist. New translations are not
  affected. Make target: xml-to-po (see my other mail).
* Once the PO file exists it needs to be translated :-)
* From time to time the PO file needs to be synchronized with the English
  text via: update-po. That's a simple automatic step without user interaction
  (there could also be a cronjob somewhere which calls this and commits updated
  PO files).
* Creating XML from PO happens automatically via Makefile dependencies:
  e.g. make html
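
The recurring part of this workflow can be sketched as a small shell
function. This is only an illustration under assumptions: the names po_sync
and run and the DRY_RUN switch are hypothetical and not part of the patch,
and it assumes the script is called from the directory that contains the
language directories (de/, ...). The one-time xml-to-po conversion and the
translation work itself are deliberately left out.

```shell
#!/bin/sh
# Sketch: periodic synchronization of one translation (e.g. "de").
# With DRY_RUN=1 the commands are only printed, not executed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

po_sync() {
  lang=$1
  run make -C "$lang" update-po   # merge changes of the English text into the PO file
  run make -C "$lang" stats       # print translated/fuzzy/untranslated counts
  run make -C "$lang" html        # XML files are regenerated from the PO file as needed
}
```

For example, DRY_RUN=1 po_sync de only lists the three make calls.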

Attention: PO files are a "merge conflict attractor" :-)) By default
a PO file contains comments referring to the source files, such as
#: ../en/book/ch02-basic-usage.xml:427

Once "make update-po" is called it is very likely that the source
line numbers change, which may result in Subversion conflicts. It is
possible to remove these comments with a standard msg* command. Nevertheless
it is common to let the PO files contain source line comments and to
adapt the comments in one's own modified PO file by calling:
$ msgmerge --update manually_revised_translation_with_old_references.po \
  an_arbitrary_PO_file_(e.g._from_SVN)_with_up-to-date_references.po
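
A minimal sketch of removing the "#:" comments, assuming that msgcat from
GNU gettext with its --no-location option is a suitable msg* command for
this (the mail does not name one). The function name strip_locations is
hypothetical. The fallback branch is only a crude stand-in for systems
without gettext and simply drops every line starting with "#:".

```shell
#!/bin/sh
# Sketch: write a copy of a PO file without "#: file:line" comments.
# Usage: strip_locations input.po output.po
strip_locations() {
  if command -v msgcat >/dev/null 2>&1; then
    # Preferred: let gettext rewrite the file without location comments.
    msgcat --no-location --output-file="$2" "$1"
  else
    # Assumption: no gettext available, so filter the comment lines directly.
    grep -v '^#:' "$1" > "$2"
  fi
}
```

Such a stripped file no longer changes just because line numbers in the
English XML files moved, at the price of losing the hint where a string
comes from.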


Supporting PO files does not mean that all teams have to switch. It is
optional. Nevertheless I strongly suggest it, especially for new
translations.

I tested it myself carefully with the German files. While doing so I also
noticed some problems in this translation (some files are not properly in
sync!). I will post instructions on how to convert XML files to PO in
another mail, where I will describe it for de/.

PS: The current (unchanged) Makefiles have a minor flaw: Calling
make html repeatedly doesn't stabilize. Only the third call does the same as
the first one (just try make html;make html;make html to see what I mean).

Jens
-------------- next part --------------
Index: src/tools/Makefile.base-vars
===================================================================
--- src/tools/Makefile.base-vars	(Revision 3279)
+++ src/tools/Makefile.base-vars	(Arbeitskopie)
@@ -56,6 +56,29 @@
 XML_SOURCE = $(DIR)/$(NAME).xml
 VERSION_SOURCE = $(DIR)/version.xml
 ALL_SOURCE = $(DIR)/*.xml
+
+# PO translation file specific variables
+POFILE := $(notdir $(CURDIR)).po
+
+# See man Locale::Po4a::Xml for further xml and docbook specific options.
+PO4AOPTIONS := --format=docbook --master-charset=utf-8 -o doctype="book"
+# See: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=494607
+#	-o nodefault="<option>" -o inline="<option>"
+#	-o untranslated="<command>" -o inline="<command>"
+
+# English source files to be used with PO files
+ALL_EN_SOURCE = $(wildcard $(addprefix ../en/,$(ALL_SOURCE)))
+# Files to create from PO file
+ALL_NON_TRIVIAL_EN_SOURCE = $(filter-out ../en/book/version.xml,$(ALL_EN_SOURCE))
+
+# A list of xml files to create: book/foreword.xml ...
+# (ALL_SOURCE matches only already existing book/*.xml files and cannot be used!)
+ALL_XML_TARGETS_FROM_PO = $(patsubst ../en/%,%,$(ALL_NON_TRIVIAL_EN_SOURCE))
+
+ALL_SOURCE_PO_MODE = $(patsubst ../en/%,%,$(ALL_EN_SOURCE))
+
+# expands to nothing if no PO file exists
+CREATED_XML_SOURCE_FROM_PO = $(patsubst %,$(ALL_XML_TARGETS_FROM_PO),$(wildcard $(POFILE)))
 STYLESHEET = $(DIR)/styles.css
 INSTALL_SUBDIR = $(INSTALL_DIR)/$(NAME)
 
Index: src/tools/Makefile.base-rules
===================================================================
--- src/tools/Makefile.base-rules	(Revision 3279)
+++ src/tools/Makefile.base-rules	(Arbeitskopie)
@@ -24,6 +24,39 @@
 	  mv $(VERSION_SOURCE).tmp $(VERSION_SOURCE); \
 	fi
 
+# Update/create a translation file in PO format.
+update-po: $(ALL_NON_TRIVIAL_EN_SOURCE)
+	po4a-updatepo --po=$(POFILE) $(PO4AOPTIONS) $(addprefix --master=,$(ALL_NON_TRIVIAL_EN_SOURCE)) --previous
+
+# Create PO file from xml files
+# This needs to be called only the first time when PO files are introduced.
+xml-to-po:
+	po4a-gettextize $(PO4AOPTIONS) --localized-charset=utf-8 \
+	$(addprefix --master=,$(ALL_NON_TRIVIAL_EN_SOURCE)) \
+	$(addprefix --localized=,$(ALL_XML_TARGETS_FROM_PO)) | \
+	msgattrib --clear-fuzzy > $(POFILE)
+
+# Some rules only apply if a PO file exists (as it cannot be otherwise a
+# dependency, ...)
+ifeq ($(wildcard $(POFILE)),$(POFILE))
+
+# Prints statistics for the current translation
+stats:
+	@echo -n "$(POFILE): " 1>&2; \
+		po4a-gettextize $(PO4AOPTIONS) $(addprefix --master=,$(ALL_NON_TRIVIAL_EN_SOURCE)) \
+		| sed -e "s/charset=CHARSET/charset=ascii/" \
+		| msgmerge --quiet $(POFILE) - \
+		| msgfmt --output-file=/dev/null --statistics -
+
+# Create xml files (only a single file can be created per time)
+$(ALL_XML_TARGETS_FROM_PO): %.xml: ../en/%.xml $(POFILE) $(wildcard addenda/*.add)
+	po4a-translate --po=$(POFILE) --keep=0 $(PO4AOPTIONS) --master=../en/$*.xml --localized=$*.xml
+
+# we need to redefine ALL_SOURCE as *.xml expands to "*.xml" if no xml files exist
+ALL_SOURCE := $(ALL_SOURCE_PO_MODE)
+
+endif # PO file exists
+
 html: valid $(HTML_TARGET)
 $(HTML_TARGET): $(ALL_SOURCE) $(VERSION_SOURCE) $(STYLESHEET) $(IMAGES)
 	$(ENSURE_XSL)
@@ -102,12 +135,12 @@
 
 # Clean targets
 clean:
-	rm -f $(VERSION_SOURCE) $(HTML_TARGET)
+	rm -f $(VERSION_SOURCE) $(HTML_TARGET) $(CREATED_XML_SOURCE_FROM_PO)
 	rm -f $(HTML_ARCH_TARGET) $(HTML_CHUNK_ARCH_TARGET)
 	rm -f $(FO_TARGET) $(PDF_TARGET) $(PS_TARGET)
 	rm -rf $(HTML_CHUNK_DIR)
 
 # Utility targets
-valid: $(VERSION_SOURCE)
+valid: $(ALL_SOURCE) $(VERSION_SOURCE)
 	$(XMLLINT) --noout --nonet --valid $(XML_SOURCE)
 

