replacing '--' to facilitate translation (was Re: translation HowTo - is there such an animal)

Øyvind A. Holm sunny at sunbase.org
Tue Jul 19 11:48:25 CDT 2005


On 2005-07-18 09:41:30 Lorenz wrote:
> Lorenz wrote:
> > in src/nb/METHOD you write about working around problems with '--' 
> > within xml comments.
>
> I still can't find any mention of this problem in any xml source I 
> could find, but libxslt complains about it for sure.

Yes, xmllint(1) too. I suppose it’s because it’s legal to include "<" 
and ">" in the comments, so a bad combination of those would terminate 
the comment and mess up the XML.
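
For reference, the XML 1.0 specification states that the string "--" 
must not occur within comments, so any conforming parser should reject 
it. A quick way to see this without libxslt or xmllint is the small 
sketch below. It uses Python's standard xml.etree.ElementTree parser; 
the helper name is_well_formed and the sample strings are made up for 
illustration and are not taken from the book sources.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A plain comment is accepted.
ok = "<para>text <!-- svn help --> more</para>"
# The same comment with a double dash inside it is rejected.
bad = "<para>text <!-- svn --help --> more</para>"

print(is_well_formed(ok))
print(is_well_formed(bad))
```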

> So we need a solution, though I would like to eliminate the script 
> solution (replacing/reinstating '--').
>
> I thought about replacing '--' by '&ndash;&ndash;', and convincing the 
> authors of the English version to do so too.

Heh… In fact, that has been suggested already. :)

  http://red-bean.com/pipermail/svnbook-dev/2005-June/000755.html

It might work, but this method has a couple of drawbacks. Firstly, it 
would burden the main authors, who would have to remember to write those 
entities every time, and it would clutter the original English text; see 
for example line 345 in src/en/book/ch04.xml. And if the English svnbook 
poets choose to comment out something in the middle of the text, the 
dashes in those <!-- --> markers would also have to be “entitised” this 
way. Either the comment would have to be removed, or the escaping would 
have to be done in the translated files only, leaving the comment in the 
English files intact. This could involve some manual work, or might mean 
still having the script solution around. Of course, this is only a 
problem if the comment is inside a paragraph; otherwise you can avoid 
commenting out their comment.

Anyway, it would need to be something other than &ndash;, as that is the 
same as the "–" (U+2013) character, which would lead to syntactical 
errors. A special double-dash entity could be defined in the book, 
though.
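
To illustrate what such an entity could look like, here is a rough 
sketch. The entity name "dd" and the sample paragraph are purely 
hypothetical; nothing like it exists in the book as far as this thread 
shows. The entity is declared in the internal DTD subset and expands to 
two plain hyphens in running text, while a commented-out copy of the 
text contains no literal "--" at all. Python's standard 
xml.etree.ElementTree is used only to check that the document parses.

```python
import xml.etree.ElementTree as ET

# "dd" is a made-up entity name, declared in the internal DTD subset.
doc = """<!DOCTYPE para [
  <!ENTITY dd "--">
]>
<para>Run svn log &dd;verbose <!-- old text: svn log &dd;quiet --></para>"""

# The parser expands &dd; in the text; inside the comment it stays a
# literal "&dd;", so the comment contains no double dash.
para = ET.fromstring(doc)
print(para.text)
```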

> That would not eliminate the merging/compiling problems at once, at 
> least not for translations already running.
>
> Already running translations would have to catch up with this change 
> in the English version before they could eliminate the 
> replacing/reinstating steps.
> New translations could start with the corresponding revision of the 
> English text, or handle the change on the first update/merge.
>
> What do you think?

Personally I don’t think this double-dash thing is any annoyance at all, 
because the conversion is fully automatic. Both operations, "make 
commitmode" and "make editmode", take 1.5 seconds each to run, and then 
the files are ready. Because the "ﳢ" (U+FCE2) character is unique, the 
Makefile can do whatever it needs to those characters. For example, the 
"make sync" operation first automatically removes the characters before 
the merge, which makes the commented-out English blocks identical to the 
English files and avoids conflicts in those lines. After the merge is 
done, the characters are automatically put back into place, making the 
XML files valid again. The biggest problem with merging is when conflicts 
occur (r1362 and r1465, say no more). :) But revolutions like that are 
pretty rare, and most conflicts are pretty easy to solve.
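
The round trip can be sketched roughly as follows. This is not the 
actual Makefile code, only an illustration under some assumptions: that 
each "--" inside a comment maps to exactly one placeholder character, 
that the placeholder never occurs elsewhere in the files, and that 
comments do not end in an extra dash. The function names and the sample 
string are made up.

```python
import re
import xml.etree.ElementTree as ET

PLACEHOLDER = "\uFCE2"  # the unique character used in place of "--"
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def escape_comment_dashes(text):
    """Replace every "--" inside XML comment bodies with the placeholder."""
    def fix(match):
        body = match.group(1).replace("--", PLACEHOLDER)
        return "<!--" + body + "-->"
    return COMMENT_RE.sub(fix, text)


def restore_comment_dashes(text):
    """Undo escape_comment_dashes; assumes the placeholder is unique."""
    return text.replace(PLACEHOLDER, "--")


source = "<para>svn log --verbose <!-- <para>svn log --quiet</para> --></para>"
escaped = escape_comment_dashes(source)

ET.fromstring(escaped)  # parses; the unescaped source would raise ParseError
assert restore_comment_dashes(escaped) == source
```

Dashes outside comments are left alone, so only the commented-out 
English text differs between the two states.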

So in the end I believe replacing all the "--" in the English text would 
lead to more work than just leaving them in place: all 2511 occurrences 
of double dashes would first have to be replaced, and then the authors 
and translators would have to stick to these entities every time a "--" 
comes around.

Cheers,
Øyvind A. Holm
-- 
#!/bin/bash
for f in 1 2 3; do
  PREF=http://musthave.sunbase.org/Stallman/stallman${f}c
  wget $PREF.sub ; mplayer -cache 8192 -sub stallman${f}c.sub $PREF.mpeg
done


