Website page character sets (was Re: Japanese Translation of SVNBook Top page)
maxb1 at ukf.net
Thu Feb 23 10:29:35 CST 2006
Grzegorz Adam Hankiewicz wrote:
> On 2006-02-22, "C. Michael Pilato" <cmpilato at red-bean.com> wrote:
>> index.en.html claims in a <META> tag that its contents are
>> iso-8859-1. But of course, that character set don' say nuttin'
>> 'bout no Unicode CJK characters. (This apparently hasn't prevented
>> the file from growing some Chinese Unicode characters already,
>> though.) At any rate, I think that UTF-8 should be the claimed
>> character set. Right?
&#nnnnn; entities are defined to refer to Unicode code points, regardless
of the character set used to interpret the raw bytes of the document.
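
To illustrate, here is a minimal Python sketch, not taken from the site
itself. The link fragment and the helper name decode_fragment are made
up for the example; html.unescape is the standard-library function that
resolves the references. The same ASCII bytes are decoded under both
character sets and yield identical text:

```python
import html

# Hypothetical page fragment: plain ASCII bytes that contain two
# numeric character references (the link target is illustrative).
raw = b"<a href='index.zh.html'>&#20013;&#25991;</a>"

def decode_fragment(raw_bytes, charset):
    """Decode the raw bytes with the claimed charset, then resolve references."""
    text = raw_bytes.decode(charset)
    return html.unescape(text)

as_latin1 = decode_fragment(raw, "iso-8859-1")
as_utf8 = decode_fragment(raw, "utf-8")

# The claimed charset only affects how bytes become characters; the
# references resolve to the same Unicode code points either way.
print(as_latin1 == as_utf8)
```

This only holds because the fragment is pure ASCII; bytes above 0x7F
would be interpreted differently under the two character sets.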
> You are confusing file encoding (meta tag) with content (<html
> lang="xx">). Please don't mix them.
No, he is not. Language codes are not being discussed here, only
character sets are.
>> When I got to the index.it.html file, it didn't have the same
>> Chinese glyphs in it that the English page did, but instead
>> had the English word "Chinese". I'm guessing that's because
>> someone realized there that the Chinese glyphs wouldn't "fit"
>> in the iso-8859-1 page.
> No, it's because whoever added the link committed a terrible
> mistake. In fact, dmitry managed to break the spelling for the
> Spanish language in r1567, even if he didn't mean to.
Relax! Simply not realizing that there were translated versions
available to copy is not a 'terrible' mistake. Nor is a typo.
>> I'm going to switch the English page to UTF-8 now. I'd encourage
>> other translation owners to do the same for their sites so that all
>> the site pages have full access to the glyphs needed to describe
>> the other translation languages.
I believe that &#nnnnn; allows full access to all Unicode glyphs, so the
change isn't necessary for that reason. However, I'd still recommend the
change, since it eases copy/paste between the files.
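
As a rough sketch of the trade-off (the label string is just an example,
not copied from any of the pages), Python's standard "xmlcharrefreplace"
error handler shows what the same text looks like when stored in each
kind of file:

```python
# Illustrative label: the word "Chinese" written in Chinese.
label = "\u4e2d\u6587"

# In an iso-8859-1 file the characters cannot be stored directly, so
# they have to be written as numeric character references.
as_latin1 = label.encode("iso-8859-1", errors="xmlcharrefreplace")

# In a UTF-8 file the characters are stored as raw bytes.
as_utf8 = label.encode("utf-8")

print(as_latin1)  # b'&#20013;&#25991;'
print(as_utf8)    # b'\xe4\xb8\xad\xe6\x96\x87'
```

Both forms render the same in a browser, but raw UTF-8 bytes pasted into
a file that claims iso-8859-1 will not, which is why keeping all the
files in one character set makes copying between them less error-prone.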
> And here a third different thing: a glyph is a term related
> to graphical output, which has nothing to do with HTML or its
> encoding. A glyph is something important only to a browser rendering
> a page. You can change HTML encoding as much as you want, the glyph
> will stay the same as long as you don't change the 7-bit-safe HTML
> escape sequence &#decimal; used for, say, Russian or Japanese.
> I would rather encourage the respective translators to learn a
> little bit about web page internationalisation, and at least apply
> the following patch, which corrects the glaring mistakes (though
> it doesn't update the content of the outdated pages).
We're all friends here (I hope), please try to moderate your excessively