
Re: Arabic ISO characters



Salam,
I'm answering just one point as I have a heavy load of work, so I'll write
more later when I have more time.

Nadim Shaikli wrote:

>  >The point is, I've created a very cool (I hope:)) Arabic font for the
>  >text console that has position-independent shapes. It is delivered with 
>  >Akka, and should exempt anyone working on Arabization for the moment
>  >from taking care of the contextual formatting.
>
> You see, this I don't understand -- I don't think what you describe
> is very portable in the sense that no one will be able to comprehend what you
> might write using Akka unless they have those same fonts (which if you came
> up with them -- is very unlikely).  In other words, I would think that the
> more generalized a font is, the better since that would note that other 
> people using whatever other tools and as long as the bytes are stored in a 
> particular manner (your Bidi spec. :-) they would be able to share documents.

(snip)

Nadim, it seems you are confusing fonts and encodings.

Put simply, an encoding is the set of codes that represent each character,
whereas a font is the set of drawings that correspond to those codes.  Every
Arabic encoding I am aware of maps one character to one code, independently of
the shape, and that is in particular the case for ISO-8859-6 and CP-1256.  The
software is what has the responsibility of doing the shaping.  To illustrate
this, consider the ISO-8859-1 encoding, in which 'A' has the code 65.  When
you store 'A', the byte 65 is stored.  At viewing time it doesn't matter
whether you use a Times font or a Courier font: as long as those fonts follow
an ISO-8859-1 compliant mapping, you will see an A, because the code 65 will
be visualized by a drawing that reads A.  With contextual formatting in
Arabic, there's one more step.  3ayn, for example, is stored as 217 in
ISO-8859-6, whatever its position in a word.  But before directly mapping one
code to one drawing, a program doing contextual formatting analyses it, knows
it's a 3ayn thanks to its code, and replaces it with the correct shape (using
an internally coded font that contains every possible shape, for example).
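
To make that shaping step concrete, here is a minimal sketch in Python.  The
tiny joining table and the form names are only illustrative (they are not
Akka's data); 0xC8, 0xD1 and 0xD9 are the ISO-8859-6 codes of beh, reh and
3ayn:

    # Minimal sketch of a contextual formatting pass over ISO-8859-6 codes.
    # Which letters connect to the letter that follows them?
    # (dual-joining letters do; right-joining ones like alef, reh, waw do not)
    JOINS_NEXT = {
        0xC8: True,   # beh
        0xD9: True,   # 3ayn (stored as 217 whatever its position)
        0xC7: False,  # alef
        0xD1: False,  # reh
        0xE8: False,  # waw
    }

    def shape(codes):
        """Return (code, form) pairs, form in isolated/initial/medial/final."""
        out = []
        for i, c in enumerate(codes):
            if c not in JOINS_NEXT:              # not a letter we know: keep as is
                out.append((c, "isolated"))
                continue
            prev_joins = i > 0 and JOINS_NEXT.get(codes[i - 1], False)
            next_is_letter = i + 1 < len(codes) and codes[i + 1] in JOINS_NEXT
            curr_joins = JOINS_NEXT[c] and next_is_letter
            if prev_joins and curr_joins:
                form = "medial"
            elif prev_joins:
                form = "final"
            elif curr_joins:
                form = "initial"
            else:
                form = "isolated"
            out.append((c, form))
        return out

    # "3arab" (3ayn, reh, beh): the 3ayn keeps code 0xD9 in storage,
    # but the shaper selects its initial form for display.
    print(shape([0xD9, 0xD1, 0xC8]))

The point is that the stored bytes never change; only the drawing chosen at
display time does.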

The font I was talking about in Akka is an ISO-8859-6 backward-compatible font
that can be processed the Latin way, i.e. one code corresponds to one drawing.
(I say backward-compatible because I have cheated and added Farsi letters in
empty codes, at the same places Mac-Arabic does.  Those letters should actually
be part of the standard, both to unify all Arabic-based script encodings and
because they are de facto part of Arabic - who says fideo instead of video?)
The font delivered with Akka has been drawn in such a way that text which is
not processed for contextual formatting by the software displaying it is
nonetheless perfectly readable, and elegant, for anyone who reads Arabic: a
kind of Arabic "square" script, if you know some Hebrew basics.  That relieves
people working on Arabization of one heavy task, contextual formatting.  Since
the font is compatible with every text using the same encoding so far, shaping
can be left to more complex applications, or to those who are actually
motivated to work on it, and Arabic processing is brought one step closer to
simplicity.  It even makes any Arabic text stored with an LTR assumption
readable by any existing English software.  IOW, by eliminating contextual
formatting and bidi-style storage, you simply make Arabic as simple as English
on computers.
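
For comparison, with a position-independent font the display side needs no
shaping pass at all: one code selects one drawing directly, exactly as with
Latin text.  A hedged sketch (the placeholder glyph table below is invented,
it is not the Akka font):

    # With a position-independent console font there is exactly one drawing
    # per ISO-8859-6 code, stored at the slot equal to the code itself, so
    # displaying is just indexing into the glyph table -- no shaping pass.
    FONT_GLYPHS = {c: "glyph-%02X" % c for c in range(256)}

    def display(codes):
        return [FONT_GLYPHS[c] for c in codes]

    print(display([0xD9, 0xD1, 0xC8]))   # same stored bytes as above, no shape()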

On encodings:
ISO-8859-6 is the best encoding IMO because it respects the terminal control
character codes, which CP-1256, for example, does not.  Mac-Arabic is a
superset of ISO-8859-6 and includes the Farsi letters, which is great, but it
violates those control character codes just like CP-1256.
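
The control-code point is easy to check mechanically: ISO-8859-6 leaves the C1
range 0x80-0x9F to control codes, while CP-1256 fills it with graphic
characters.  A quick check, assuming Python's bundled codecs for both charsets:

    # Which bytes in the C1 control range 0x80-0x9F decode to something
    # other than a control character?
    import unicodedata

    def non_control_c1(charset):
        hits = []
        for b in range(0x80, 0xA0):
            ch = bytes([b]).decode(charset, errors="ignore")
            if ch and unicodedata.category(ch) != "Cc":   # Cc = control
                hits.append(hex(b))
        return hits

    print("iso-8859-6:", non_control_c1("iso8859_6"))   # [] -- controls respected
    print("cp1256:", non_control_c1("cp1256"))          # the whole range is taken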

Copying and pasting an old post of mine:
On the need to extend the existing ISO 8-bit standard for Arabic encoding
(ISO-8859-6).

One could wonder what needs would push us to extend the standard, when it is
not widely used anyway and when we have the choice between many already
existing character encodings.

The answer is simple: none of the existing encodings except the ISO standard
respects both the terminal control code positions and the ASCII character set.
On the other hand, the existing ISO standard is very incomplete.  It doesn't
represent the sounds that exist in spoken Arabic, which are needed to
transcribe terms of popular art, folklore songs, stories and poetry that do
not already exist in Standard Arabic.  Even in Standard Arabic, many newly
introduced (borrowed) technical terms are transcribed differently from the way
they are pronounced (e.g. video).  Thus the ISO standard would at least need
the introduction of the de facto Arabic letters ve, pe and gaf.  In addition,
surveys would be needed to assess the need for the French-specific Latin
characters widely used in the Maghreb; if their absence from the ISO standard
clearly turns out to be a handicap, in the way the absence of English
characters would be, they should be added.

Finally, though this is optional, it would be of great benefit if all the
languages written in Arabic-based scripts, Farsi or Urdu for example, used the
same encoding, since encoding-dependent work on one language would benefit
the others.

Later,
Chahine