Website page character sets (was Re: Japanese Translation of SVNBook Top page)

Max Bowsher maxb1 at ukf.net
Thu Feb 23 10:29:35 CST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Grzegorz Adam Hankiewicz wrote:
> On 2006-02-22, "C. Michael Pilato" <cmpilato at red-bean.com> wrote:
>> index.en.html claims in a <META> tag that its contents are
>> iso-8859-1.  But of course, that character set don' say nuttin'
>> 'bout no Unicode CJK characters.  (This apparently hasn't prevented
>> the file from growing some Chinese Unicode characters already,
>> though.)  At any rate, I think that UTF-8 should be the claimed
>> character set.  Right?

Numeric character references of the form &#nnnnn; are defined to refer
to Unicode code points, regardless of the character set used to
interpret the raw bytes of the document.
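
To illustrate, here is a minimal Python sketch (not taken from the
site's tooling). The helper name decode_page and the sample markup are
made up for the example; the only assumption is that the page bytes are
plain ASCII apart from the references themselves.

```python
import html


def decode_page(raw: bytes, charset: str) -> str:
    """Decode raw page bytes with the declared charset, then expand references."""
    return html.unescape(raw.decode(charset))


# Hypothetical sample markup: ASCII only, with two decimal references
# (U+4E2D and U+6587, the two characters of the word "Chinese").
raw = b"<p>&#20013;&#25991;</p>"

as_latin1 = decode_page(raw, "iso-8859-1")
as_utf8 = decode_page(raw, "utf-8")

# The declared charset changes how the bytes are read, not what the
# references mean, so both results are the same string.
print(as_latin1 == as_utf8)
```

Either declared charset yields the same two characters, because the
references are resolved after the bytes have been decoded.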

> You are confusing file encoding (meta tag) with content (<html
> lang="xx">). Please don't mix them.

No, he is not. Language codes are not being discussed here, only
character sets are.

>> When I got to the index.it.html file, it didn't have the same
>> Chinese glyphs in it that the English page did, but instead
>> had the English word "Chinese".  I'm guessing that's because
>> someone realized there that the Chinese glyphs wouldn't "fit"
>> in the iso-8859-1 page.
> 
> No, it's because whoever added the link committed a terrible
> mistake. In fact, dmitry managed to break the spelling for the
> Spanish language in r1567, even if he didn't mean to.

Relax! Simply not realizing that there were translated versions
available to copy is not a 'terrible' mistake. Nor is a typo.

>> I'm going to switch the English page to UTF-8 now.  I'd encourage
>> other translation owners to do the same for their sites so that all
>> the site pages have full access to the glyphs needed to describe
>> the other translation languages.

I believe that &#nnnnn; references give full access to every Unicode
character, so the change isn't necessary for that reason. However, I'd
still recommend the change, since it makes copying and pasting between
the files easier.
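
As a rough sketch of the trade-off, in Python (the helper name
to_char_refs and the sample label are invented for the example): text
outside iso-8859-1 cannot be stored directly in such a page, so it has
to be rewritten as references first, whereas in a UTF-8 page it can be
pasted as-is.

```python
def to_char_refs(text: str) -> str:
    """Rewrite every non-ASCII character as a decimal &#nnnnn; reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Hypothetical link label: U+4E2D U+6587, written with escapes so this
# source file itself stays ASCII.
label = "\u4e2d\u6587"

# An iso-8859-1 page cannot hold these characters as raw bytes.
try:
    label.encode("iso-8859-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

# It needs the reference form instead, which is pure ASCII.
escaped = to_char_refs(label)

# A UTF-8 page can store the characters directly.
utf8_bytes = label.encode("utf-8")

print(fits_latin1, escaped, len(utf8_bytes))
```

Both forms render identically in a browser; the difference is only in
how much massaging the text needs when it moves from one file to another.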

> And here a third different thing: a glyph is a term related
> to graphical output, which has nothing to do with HTML or its
> encoding. A glyph is something important only to a browser rendering
> a page. You can change HTML encoding as much as you want, the glyph
> will stay the same as long as you don't change the 7bit safe HTML
> escape sequence &#decimal used for say Russian or Japanese.
> 
> I would rather encourage the respective translators to learn a
> little bit about web page internationalisation, and at least apply
> the following patch, which corrects the glaring mistakes (though
> it doesn't update the content of the outdated pages).

We're all friends here (I hope); please try to moderate your excessively
confrontational stance.

Max.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Cygwin)

iD8DBQFD/eLvfFNSmcDyxYARAn6XAJ0SnxuG3yIH37NSkbIIXF3Z5Q8GOQCcDibT
vY6N/7lru700wk7ADvEgaZs=
=mid/
-----END PGP SIGNATURE-----




More information about the svnbook-dev mailing list