
Re: Arabic Unicode fonts



David Starner wrote:

> > OK, here are a couple more questions :-)
> >
> > Why do it this way :-D ?  Is there some hidden advantage that I'm not
> > thinking of (besides saving font space) ?
>
> Because Unicode is a character standard, not a glyph standard. For a
> system that lets you use any Unicode script, you're going to have to do much
> more complex shaping for the Indic scripts, so supporting Arabic shaping
> shouldn't be a problem. It makes it possible to search for part of a word,
> without getting the forms all right. It corresponds closer to what's on the
> keyboard, doesn't it?

That, and some other facts: at input time you can't know what the next
character will be, so storing unshaped characters makes things easier; and an
8-bit encoding wouldn't have room for all the shapes.
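
To illustrate the search argument, here is a minimal sketch. It assumes
Python 3 and its standard unicodedata module, neither of which comes from this
thread, and the helper name to_logical is made up for the example. It compares
a substring search on text stored as characters with the same search on text
stored as presentation forms. Note that NFKC folds many other compatibility
characters too, so it is a blunt tool, not a dedicated "unshaper".

```python
import unicodedata

def to_logical(text: str) -> str:
    """Fold presentation forms back to base characters via NFKC (hypothetical helper)."""
    return unicodedata.normalize("NFKC", text)

# The word "bab" stored as characters, in logical order: BEH, ALEF, BEH.
logical = "\u0628\u0627\u0628"
# The same word stored as glyphs: BEH initial, ALEF final, BEH isolated.
shaped = "\ufe91\ufe8e\ufe8f"

needle = "\u0628"  # the letter BEH as a user would type it

print(needle in logical)             # True: the character is there as typed
print(needle in shaped)              # False: only positional forms are stored
print(needle in to_logical(shaped))  # True: forms folded back before searching
```

With glyph storage, every search has to either try all four forms of each
letter or fold the text back first; with character storage it just works.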

> > Why not preserve all these conversions so
> > that if someone wanted to read my 15MB :-) file they wouldn't have to wait
> > for any more conversions to take place (it's a waste of time and processor
> > throughput) ?
>
> Are time and processor throughput really much of an issue? I'll see how fast
> Roman Czyborra's Perl script to turn the characters in 0600-06FF into the
> forms in Forms-A & B runs, but I strongly suspect the time will be negligible
> compared to everything else.

100% right, plus the fact that a non-shaped glyph can be very readable anyway.
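
For a feel of how little work the conversion is, here is a rough sketch of a
shaper. It is not Roman Czyborra's script (which is in Perl and not reproduced
here); it is a hypothetical Python stand-in, and the names build_table and
shape are invented for it. It reads the presentation forms out of Python's
unicodedata tables and picks a form from the neighbouring letters. Assumptions
and limits: combining marks are not skipped (they break joining here),
lam-alef ligatures are not formed, and letters without a presentation form are
passed through untouched.

```python
import unicodedata

# Presentation Forms-A (U+FB50..U+FDFF) and Forms-B (U+FE70..U+FEFF).
FORM_RANGES = (range(0xFB50, 0xFE00), range(0xFE70, 0xFF00))
TAGS = {"<isolated>", "<final>", "<initial>", "<medial>"}

def build_table():
    """Map base letter -> {tag: presentation form}, read from the Unicode data."""
    table = {}
    for block in FORM_RANGES:
        for cp in block:
            parts = unicodedata.decomposition(chr(cp)).split()
            # Keep single-letter forms only; skip ligatures and unassigned points.
            if len(parts) == 2 and parts[0] in TAGS:
                base = chr(int(parts[1], 16))
                table.setdefault(base, {})[parts[0]] = chr(cp)
    return table

TABLE = build_table()

def shape(text):
    """Replace each letter by its contextual form (logical order in and out)."""
    out = []
    for i, ch in enumerate(text):
        forms = TABLE.get(ch)
        if forms is None:
            out.append(ch)  # not a letter we know how to shape
            continue
        prev = TABLE.get(text[i - 1], {}) if i > 0 else {}
        nxt = TABLE.get(text[i + 1], {}) if i + 1 < len(text) else {}
        # A letter connects forward only if it has an initial form, and
        # backward only if it has a final form.
        joins_prev = "<initial>" in prev and "<final>" in forms
        joins_next = "<initial>" in forms and "<final>" in nxt
        if joins_prev and joins_next and "<medial>" in forms:
            tag = "<medial>"
        elif joins_prev:
            tag = "<final>"
        elif joins_next:
            tag = "<initial>"
        else:
            tag = "<isolated>"
        out.append(forms.get(tag, ch))
    return "".join(out)

for glyph in shape("\u0628\u0627\u0628"):  # BEH ALEF BEH
    print(f"U+{ord(glyph):04X} {unicodedata.name(glyph)}")
```

This is one table lookup per neighbour and a few comparisons per character,
and the form of each letter depends on the letter after it, which is exactly
why it can't be fixed at the moment a key is pressed.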

> > It just seems odd to go this way - it's certainly cleaner to include all the
> > characters and their various permutations and give the user the ability to
> > decide what he wants to type and how he wants it to look;
>
> Zero width spaces, zero width non-joiner and zero width joiner characters
> should let you decide how you want it to look; it's going to take some work
> to either get everyone familiar with how they work (I believe Roozbeh said
> that ZWJ and ZWNJ are standard on Persian keyboards) or get a nice UI to
> hide the ugly details.

Except that zero width spaces are not usable on a terminal. That said, the
advantages of storing characters independently of their shapes outweigh the
inconveniences.
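
For anyone who hasn't met the joiner characters David mentions, a small
sketch of what they look like in a character stream. It assumes Python 3 and
its unicodedata module; strip_joiners is a hypothetical helper, and whether
dropping joiners before a comparison is appropriate depends on the
application.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER: asks neighbours not to connect
ZWJ = "\u200d"   # ZERO WIDTH JOINER: asks for a connecting form

for ch in (ZWNJ, ZWJ):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} category={unicodedata.category(ch)}")

# BEH BEH: a shaper would normally connect the two letters.
joined = "\u0628\u0628"
# BEH ZWNJ BEH: same letters, but the writer asks for them to stay apart.
apart = "\u0628" + ZWNJ + "\u0628"
# BEH ZWJ: asks for the connecting form of BEH with no letter after it.
dangling = "\u0628" + ZWJ

def strip_joiners(text):
    """Drop ZWJ/ZWNJ, e.g. before a loose comparison (hypothetical helper)."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")

print(len(joined), len(apart))         # the joiner is a character of its own
print(strip_joiners(apart) == joined)  # same letters once joiners are dropped
```

The shape the user wants is recorded in the text itself as an ordinary format
character, so no separate glyph code is needed to express it.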

> > If this were to happen, it would give any application, given the
> > right set of fonts, the ability to display Arabic characters, no ?

Nope, you can already do that with a position-independent glyph ;)

> > The person would be able to display (or read) a document, but wouldn't be
> > able to modify it unless he had Bidi support and shaping.

Actually, one should be able to modify it even without bidi support (shaping
is never needed, as mentioned above). It just makes your life harder, and a lot
harder if you do it according to that last arabeyesation "bidi proposal".

> If you can't support something as simple as Arabic shaping, then there's a
> lot of stuff in the Unicode standard you aren't supporting. They probably
> found it more important to encode Arabic "right" than to try and make it as
> simple as possible.

Note that shaping support doesn't require following any standard shaping
scheme, since it's very much an application-internal matter. Acon, for example
(and akka, which btw now has complete glyph shaping support, not yet uploaded
to CVS), stores all shapes in a proprietary-encoded font that it uses for glyph
rendering. For all other operations, normal encodings are used.