Re: Arabic Unicode fonts



[Since this is my first post, I'll introduce myself. I'm a Debian maintainer
and something of an amateur i18n expert. I'm on many i18n lists:
unicode at unicode dot org, i18n at xfree86 dot org, and
linux-utf8 at mail dot nl dot linux dot org. I actually know little of
Arabic; I'm just here as a general i18n know-it-all, hopefully to help the
Arabization of Unix be done in such a way that all the wheels don't have to
be rebuilt for the Mongolization of Unix or the Zuluization of Unix.]

> I see the statement above as a major problem area (unless I'm completely
> misunderstanding something here).  The 0600-06FF arabic code-table is by
> all means NOT complete (there are no shaped letters -- only one form of
> 'seen' for example).

Unicode does not encode glyphs; it encodes characters. According to the
rules of Unicode, Arabic Presentation Forms-A and Arabic Presentation
Forms-B shouldn't be part of Unicode at all. (The main reason they exist is
to simplify using Unicode on primitive systems.) Presentation Forms-A and -B
shouldn't be used in files; store characters from the 0600-06FF block, and
let the application take responsibility for using glyphs from Presentation
Forms-A and -B if necessary.
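
As a rough illustration of keeping presentation forms out of files, here is a
small Python sketch that folds any presentation-form characters back to the
0600-06FF block before text is stored. It relies on the compatibility
decompositions that ship with Python's unicodedata module; the function names
are my own, and applying NFKC one character at a time is a simplification, not
a complete normalization pass.

```python
import unicodedata

def is_presentation_form(ch):
    """True for code points in Arabic Presentation Forms-A or -B."""
    cp = ord(ch)
    return 0xFB50 <= cp <= 0xFDFF or 0xFE70 <= cp <= 0xFEFF

def to_base_arabic(text):
    """Replace presentation forms with their base characters.

    Only characters inside the two presentation-form blocks are touched,
    so the rest of the text is left exactly as it was.
    """
    return "".join(
        unicodedata.normalize("NFKC", ch) if is_presentation_form(ch) else ch
        for ch in text
    )

# SEEN initial, LAM medial, ALEF final, MEEM isolated
shaped = "\uFEB3\uFEE0\uFE8E\uFEE1"
plain = to_base_arabic(shaped)
print([f"U+{ord(c):04X}" for c in plain])
```

Ligatures such as U+FEFB (LAM WITH ALEF) expand to their component letters,
so the output can be longer than the input.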

To fully support Unicode, a font format like OpenType is needed. With an
OpenType font, the renderer can take a character like U+062A, work out from
its neighbors that it is in the medial form, and display the appropriate
glyph, without that glyph needing a Unicode code point of its own.
Under Unix, OpenType is supported by FreeType 2. Since OpenType fonts are
currently almost impossible to make under Unix, what about BDF fonts? Arabic
Presentation Forms-A and -B are made for things like BDF fonts, and an
argument can be made that every Arabic BDF font should include them. (Or
rather, certain parts of those blocks; some of the ligatures in Presentation
Forms-A are almost impossible to draw legibly in a small fixed character
cell. Also, several of those characters aren't used in the Arabic language.)
The GNU Unifont includes most of Arabic, as well as Arabic Presentation
Forms-A and -B. (You can find a recent version at
http://people.debian.org/~dvdeug/unifont-dvdeug-1.0.tar.gz. The next
version, due in September, will have much improved Arabic glyphs.)
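
To make the "application picks the glyph" idea concrete for a BDF-style font,
here is a deliberately naive Python sketch that maps base characters to
Presentation Forms-B code points according to their neighbors. The table is
derived from the decompositions in Python's unicodedata module, and I am
assuming that a letter with an initial form can join to the following letter
and a letter with a final form can join to the preceding one. It ignores
combining marks, tatweel, ZWJ/ZWNJ and the lam-alef ligatures, so treat it as
an outline of the approach rather than a usable shaper; all names are mine.

```python
import unicodedata

SINGLE_FORMS = ("<isolated>", "<final>", "<initial>", "<medial>")

# {base character: {form name: presentation-form character}}
FORMS = {}
for cp in range(0xFE70, 0xFF00):
    parts = unicodedata.decomposition(chr(cp)).split()
    # Keep single letters such as "<medial> 062A"; skip ligatures.
    if len(parts) == 2 and parts[0] in SINGLE_FORMS:
        base = chr(int(parts[1], 16))
        FORMS.setdefault(base, {})[parts[0].strip("<>")] = chr(cp)

def joins_forward(ch):
    """Assumed: a letter with an initial form connects to the next letter."""
    return "initial" in FORMS.get(ch, {})

def joins_backward(ch):
    """Assumed: a letter with a final form connects to the previous letter."""
    return "final" in FORMS.get(ch, {})

def shape(text):
    """Return text (logical order) with letters replaced by contextual forms."""
    out = []
    for i, ch in enumerate(text):
        forms = FORMS.get(ch)
        if not forms:
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        to_prev = joins_backward(ch) and joins_forward(prev)
        to_next = joins_forward(ch) and joins_backward(nxt)
        if to_prev and to_next:
            name = "medial"
        elif to_prev:
            name = "final"
        elif to_next:
            name = "initial"
        else:
            name = "isolated"
        out.append(forms.get(name, ch))
    return "".join(out)

# Three TEHs in a row: initial, medial, final.
print([f"U+{ord(c):04X}" for c in shape("\u062A\u062A\u062A")])
```

The file on disk still contains only U+062A; the presentation-form code
points exist only in the string handed to the font.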

You had another question about how you were going to encode that many glyphs
in an 8-bit font. UTF-8 is irrelevant here. If you have XFree86 4.0, it
includes fixed fonts encoded in ISO10646-1, which is the encoding for
Unicode fonts under X. Hence you put the glyph for U+FE70 at index 0xFE70 in
the font. When you use UTF-8 (Unix's normal encoding for Unicode), U+FE70 is
stored in the text as the three bytes 0xEF 0xB9 0xB0, but the application
decodes those back to U+FE70 and uses that value to select the glyph under X.
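
The split between the bytes in the file and the index into the font can be
checked in a few lines of Python. This only shows the UTF-8 round trip; the
variable names are mine, and the actual font lookup under X is not shown.

```python
ch = "\uFE70"

# What ends up in a UTF-8 text file: three bytes.
utf8_bytes = ch.encode("utf-8")
print(" ".join(f"0x{b:02X}" for b in utf8_bytes))

# What an ISO10646-1 font is indexed by: the code point itself,
# recovered by decoding the UTF-8 bytes.
font_index = ord(utf8_bytes.decode("utf-8"))
print(f"0x{font_index:04X}")
```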

--
David Starner - dstarner98 at aasaa dot ofe dot org
"The pig -- belongs -- to _all_ mankind!" - Invader Zim