Arabic font encodings
- To: general at arabeyes dot org
- Subject: Arabic font encodings
- From: Nadim Shaikli <shaikli at yahoo dot com>
- Date: Tue, 7 Aug 2001 16:25:59 -0700 (PDT)
Along with my adventures into font-land, I've accumulated these
semi-related questions:
1. Encoding - is there a standard list of how glyphs are encoded for both
ISO 8859-6 "arabic" fonts and Arabic Presentation Forms-B? I've figured
that, for whatever reason, the encoding cannot exceed one byte (two hex
digits). ASCII occupies 0x20-0x7F (that's hex, of course), leaving us
with 0x80-0xFF plus 0x00-0x1F: 160 encoding positions to muck with.
With my rough calculations from the Unicode code charts,
"Arabic (0600..06FF)" requires 222 encodings (excluding empties)
"Forms-B (FE70..FEFF)" requires 140 encodings (excluding empties)
I think you know where I'm going with this: we have 160 positions
to encode 362 characters?! I take it applications will have to accept
encodings wider than one byte (Forms-B alone goes up to 0xFEFF)?
Or am I missing something?
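The count above is easy to double-check. A throwaway Python sketch; the 222 and 140 figures are copied from the text, not recomputed from the Unicode tables:

```python
# Single-byte positions left once ASCII 0x20-0x7F is taken.
ASCII_RANGE = range(0x20, 0x80)               # 0x20-0x7F, as described above
free_positions = [b for b in range(0x100) if b not in ASCII_RANGE]

# Character counts taken from the text (assumed, not recomputed here).
ARABIC_NEEDED = 222    # Arabic, 0600..06FF
FORMS_B_NEEDED = 140   # Forms-B, FE70..FEFF
needed = ARABIC_NEEDED + FORMS_B_NEEDED

print(len(free_positions), needed)            # 160 362
print(len(free_positions) >= needed)          # False: one byte can't hold both blocks
```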
If I'm not hallucinating, let me answer my own question above and
note that this is exactly what UTF-8 is for, right?
So, looking into UTF-8 in more detail at
http://www.cl.cam.ac.uk/~mgk25/unicode.html
I see:
    Unicode range               | UTF-8 encoding
    ----------------------------+-----------------------------
 a. U-00000000 - U-0000007F     | 0xxxxxxx
 b. U-00000080 - U-000007FF     | 110xxxxx 10xxxxxx
 c. U-00000800 - U-0000FFFF     | 1110xxxx 10xxxxxx 10xxxxxx
Which means, for example (I'm just picking one here), that glyph U+FE70
(from Forms-B) falls under line 'c' above, so its 16 bits are spread
over three bytes: 0xEF 0xB9 0xB0.
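For what it's worth, the templates in the table leave no room for choice: the x bits are simply the code point's bits, filled in from the right, and the shortest template that fits must be used (longer "overlong" forms such as E0 80 80 are invalid). A minimal Python sketch of lines a-c, checked against Python's built-in UTF-8 codec; `utf8_encode` is a made-up helper name, and code points above U+FFFF are deliberately not handled:

```python
def utf8_encode(cp: int) -> bytes:
    """Hypothetical helper: encode one code point (U+0000..U+FFFF) via lines a-c."""
    if cp < 0:
        raise ValueError("negative code point")
    if cp <= 0x7F:                          # line a: 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                         # line b: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),         # top 5 of 11 bits
                      0x80 | (cp & 0x3F)])      # low 6 bits
    if cp <= 0xFFFF:                        # line c: 1110xxxx 10xxxxxx 10xxxxxx
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogate code points are not encodable")
        return bytes([0xE0 | (cp >> 12),          # top 4 of 16 bits
                      0x80 | ((cp >> 6) & 0x3F),  # middle 6 bits
                      0x80 | (cp & 0x3F)])        # low 6 bits
    raise ValueError("only U+0000..U+FFFF handled in this sketch")

for cp in (0x0627, 0xFE70):                 # ARABIC LETTER ALEF, first Forms-B glyph
    mine = utf8_encode(cp)
    assert mine == chr(cp).encode("utf-8")  # agrees with Python's own codec
    print(f"U+{cp:04X} -> {mine.hex(' ').upper()}")
# U+0627 -> D8 A7
# U+FE70 -> EF B9 B0
```

Note that the whole Arabic block (U+0600..U+06FF) falls under line 'b' (two bytes per character), while Forms-B falls under line 'c' (three bytes).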
So my main question is whether there is a standard that specifies these
encodings (i.e. the rules for how one encodes). If everyone were to take
an arbitrary guess at what he/she thinks are good encodings, we'd end up
with chaos (files couldn't be shared)! I just haven't been able to find
where the rules are spelled out.
2. Windows, of course, uses a different code table (or code page), better
known as "CP-1256". I've had a terrible time finding CP-1256 fonts and
encoding tables; could someone shed some light on this for me in terms
of links/docs/whatever? I don't need links to micro$oft; I need links to
where I can actually download BDF fonts (since they are ASCII text); PCF
would be OK as well, since I can convert them to BDF.
Again, keep in mind that I'm after what's STANDARD (if such a thing exists)
in terms of Arabic encoding maps.
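On the encoding-table side (not the fonts), Python's standard codecs include both ISO 8859-6 (`iso8859_6`) and CP-1256 (`cp1256`), so the two maps can be compared, or text converted between them, by going through Unicode. A minimal sketch; `iso8859_6_to_cp1256` is a hypothetical helper name, and strict encoding will raise `UnicodeEncodeError` for any character the target code page lacks:

```python
def iso8859_6_to_cp1256(data: bytes) -> bytes:
    """Hypothetical helper: re-encode ISO 8859-6 bytes as CP-1256 via Unicode."""
    text = data.decode("iso8859_6")   # ISO 8859-6 bytes -> Unicode code points
    return text.encode("cp1256")      # Unicode code points -> Windows bytes

sample = "\u0633\u0644\u0627\u0645"   # "salaam": seen, lam, alef, meem
iso_bytes = sample.encode("iso8859_6")
win_bytes = iso8859_6_to_cp1256(iso_bytes)

print("ISO 8859-6:", iso_bytes.hex(" "))
print("CP-1256:   ", win_bytes.hex(" "))
assert win_bytes.decode("cp1256") == sample   # round trip preserves the text
```

Comparing the two printed byte strings is a quick way to see where the two tables disagree.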
Thanks..
- Nadim