
[marco.cimarosti@essetre.it: RE: Why Arabic Shaping?]



One minor comment added under ISO-8859-6 fonts.

----- Forwarded message from Marco Cimarosti <marco dot cimarosti at essetre dot it> -----

From: Marco Cimarosti <marco dot cimarosti at essetre dot it>
To: 'David Starner' <dstarner98 at aasaa dot ofe dot org>, unicode at unicode dot org
Subject: RE: Why Arabic Shaping?
Date: Mon, 27 Aug 2001 14:46:28 +0200
Lines: 148

David Starner wrote:
> I got a reply on the "Why Arabic Shaping" thread.
> Since unicode at unicode dot org bounces mail from non-subscribers,
> I'm sending this for him. I didn't mention last time that
> it's a list for the Arabization of _Unix_, so the questions
> are slanted in that direction.
> 
> ----- Forwarded message from Nadim Shaikli -----

I don't have Nadim's e-mail; would you forward this too, or ask him to join
unicode at unicode dot org?

>  David Starner wrote:
> >
> > From: "Nadim Shaikli"
> > >  1. All Arabic fonts _must_ include forms-B (or equivalent) for
> > >     them to be properly called Arabic fonts (i.e. ISO8859-6) since
> > >     without those
> > 
> > No can do. An ISO8859-6 font has a fixed set of 190 glyphs, 94 ASCII
> > and 96 Arabic. It can't include Forms-B. If you want Forms-B, you're
> > going to have to use a Unicode font, not an ISO8859-6 font.

No. ISO-8859-6 is based on the same concept as the Unicode Arabic block (or,
rather, Unicode Arabic is based on ISO-8859-6), so an ISO-8859-6 font should
have all the necessary glyphs. Like for Unicode, ISO-8859-6 code points
represent abstract characters, not glyphs.

Now, this leaves open the problem of *how* ISO-8859-6 or Unicode fonts
may store more glyphs than there are encoded characters, and how an
application is going to use these extra glyphs.
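To make the point concrete: a font can hold several glyphs for one abstract character, one per positional form, and the renderer picks among them. Below is a minimal Python sketch of such a table. The names `GLYPHS` and `glyph_for` are hypothetical, and the glyph ids simply borrow the code points of the Unicode Arabic Presentation Forms-B block as convenient labels; a real font would use its own internal glyph ids.

```python
# Positional forms a letter can take in joined Arabic text.
ISOLATED, FINAL, INITIAL, MEDIAL = "isol", "fina", "init", "medi"

# Hypothetical font-side table: abstract character -> {form: glyph id}.
# Glyph ids reuse Presentation Forms-B code points purely as labels.
GLYPHS = {
    0x0627: {ISOLATED: 0xFE8D, FINAL: 0xFE8E},                                   # ALEF
    0x0628: {ISOLATED: 0xFE8F, FINAL: 0xFE90, INITIAL: 0xFE91, MEDIAL: 0xFE92},  # BEH
    0x0644: {ISOLATED: 0xFEDD, FINAL: 0xFEDE, INITIAL: 0xFEDF, MEDIAL: 0xFEE0},  # LAM
    0x0645: {ISOLATED: 0xFEE1, FINAL: 0xFEE2, INITIAL: 0xFEE3, MEDIAL: 0xFEE4},  # MEEM
}


def glyph_for(code_point: int, form: str) -> int:
    """Return the glyph id for a character in a positional form.

    Falls back to the isolated glyph if the form is missing, and to the
    code point itself for characters the table does not cover.
    """
    forms = GLYPHS.get(code_point)
    if forms is None:
        return code_point
    return forms.get(form, forms.get(ISOLATED, code_point))


print(hex(glyph_for(0x0628, INITIAL)))  # 0xfe91
```

Here four encoded characters already need fourteen glyphs; ALEF has only two because it never joins to the letter that follows it.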

[David's comment: Speaking in Unix X font terms, a *-iso8859-6 font has at
most 256 glyphs, numbered 0-255, that are generally expected to correspond
to the code points of ISO-8859-6. A font standard that treated an
*-iso8859-6 font differently could be accepted, but would probably be
pointless; what would you gain over a full *-iso10646-1 font with the same
capabilities?]
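David's point can be checked with Python's standard `iso8859_6` codec: the abstract letter BEH has a slot (0xC8), but its initial-form presentation character has none, so an 8-bit font indexed by ISO-8859-6 code points has nowhere to put it. A small sketch; `fits_iso8859_6` is a hypothetical helper name.

```python
def fits_iso8859_6(text: str) -> bool:
    """True if every character of `text` has a code point in ISO-8859-6."""
    try:
        text.encode("iso8859_6")
        return True
    except UnicodeEncodeError:
        return False


beh = "\u0628"          # ARABIC LETTER BEH (abstract character)
beh_initial = "\ufe91"  # ARABIC LETTER BEH INITIAL FORM (presentation form)

print(beh.encode("iso8859_6"))      # b'\xc8'
print(fits_iso8859_6(beh))          # True
print(fits_iso8859_6(beh_initial))  # False
```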

> Well that simply means that ISO8859-6 stand-alone is useless.

Same for Unicode. The scope of both standards is just to encode the text,
not to render it.

> OK - so we agree on the point that the various glyphs have to be
> included for a font to be usable; now the question is how does one
> accomplish this (sans OpenType/FreeType).

Right. There currently exists *no* standard for displaying text in logical
encodings such as Unicode or ISO-8859-6.

Technologies like OpenType, ATSUI, or Graphite are *not* standards, although
one of them could eventually become so widespread that it will be called a
"de facto standard".

Many people think that there is no need for such a standard, because there
is a high degree of typographic variation in displaying some scripts.
(E.g., Arabic display can be as simple as selecting 2 glyphs per character,
and as complicated as having one glyph for each word in the dictionary.)

Personally, I think that it could be feasible and useful to have a standard
repertoire of glyphs and a standard set of rules for a *minimal* readability
of all Unicode "complex scripts".

I also think that there should be a sort of "intermediate Unicode" to be
used internally by rendering engines and editors. I imagine an encoding
halfway between the purely logical external encoding and a glyph encoding.
This intermediate code would be generated on the fly just before the
rendering process proper. It could help standardize the display and
editing processes, while still leaving the rendering engine quite free to
choose its graphical style.
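One way to picture such an intermediate code: the logical string is annotated, just before rendering, with each letter's positional form, and the renderer remains free to choose the actual glyphs. Below is a minimal sketch, assuming the joining classes used in current versions of Unicode's ArabicShaping.txt (R right-joining, D dual-joining, U non-joining, C join-causing, T transparent) and a hypothetical four-letter table; ligatures such as lam-alef are not attempted. It also shows where ZWJ (treated as join-causing) and ZWNJ (non-joining) fit in once shape variants are unified.

```python
from typing import List, Optional, Tuple

# Hypothetical subset of joining types; a real engine would load the full table.
JOINING_TYPE = {
    "\u0627": "R",  # ALEF
    "\u0628": "D",  # BEH
    "\u0644": "D",  # LAM
    "\u0645": "D",  # MEEM
    "\u064E": "T",  # FATHA (vowel mark, transparent to joining)
    "\u200D": "C",  # ZERO WIDTH JOINER
    "\u200C": "U",  # ZERO WIDTH NON-JOINER
}

# (joins to preceding letter, joins to following letter) -> positional form
FORMS = {(False, False): "isol", (False, True): "init",
         (True, True): "medi", (True, False): "fina"}


def joining_type(ch: str) -> str:
    """Joining type of a character; unlisted characters count as non-joining."""
    return JOINING_TYPE.get(ch, "U")


def neighbour(text: str, i: int, step: int) -> Optional[str]:
    """Nearest non-transparent character before (step=-1) or after (step=+1) i."""
    j = i + step
    while 0 <= j < len(text):
        if joining_type(text[j]) != "T":
            return text[j]
        j += step
    return None


def to_intermediate(text: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each character with its positional form ('isol', 'init', 'medi',
    'fina'), or None for marks, controls and non-joining characters."""
    out: List[Tuple[str, Optional[str]]] = []
    for i, ch in enumerate(text):
        t = joining_type(ch)
        if t not in ("R", "D"):
            out.append((ch, None))
            continue
        prev, nxt = neighbour(text, i, -1), neighbour(text, i, +1)
        # A letter joins backwards if the preceding letter can join forwards.
        joins_prev = prev is not None and joining_type(prev) in ("D", "C")
        # Only dual-joining letters join forwards, to a letter that can join back.
        joins_next = t == "D" and nxt is not None and joining_type(nxt) in ("R", "D", "C")
        out.append((ch, FORMS[(joins_prev, joins_next)]))
    return out


print(to_intermediate("\u0628\u0644\u0627"))  # BEH init, LAM medi, ALEF fina
```

A renderer would then map each (character, form) pair to a glyph in whatever style it likes.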

However, many people disagree with this kind of opinion, and they are
largely right.

The main risk of standardizing a "minimal rendering" is that all digital
applications might limit themselves to that simplified form, thus blocking
all investment and research aimed at typographical excellence. That would
have the disastrous effect of letting the digital era kill all sorts of
beautiful things, such as the exquisite Pakistani typography.

My view, however, is that if the technology takes too much time or money to
accommodate some scripts, those scripts could fall out of use altogether.
Or, even worse, the very languages that use those scripts could become
endangered because they are unusable on computers.

So, I'd rather go for a simplified and *standard* display, considering two
things:

1) In the West, two centuries of typewriters and monospaced fonts have not
killed fine typography. Maybe in the coming decades the average Pakistani
will use an ugly, simplified font for e-mail, but professional publishers
will still keep the market alive for finer typographical solutions.

2) In any case, technology affects the appearance of writing. This has
always happened and will keep happening, regardless of our nostalgia for
bygone graphical forms. See China: the introduction of the soft brush and
liquid ink killed all curved strokes in hanzi. See Europe: the introduction
of lead type killed many of the ligatures that were used in handwriting.
However, Chinese and European scripts turned these technological
limitations into new kinds of graphical beauty.

> Without getting into specifics of OpenType and how it functions
> (my reading indicates that it will require a font input file anyways),
> let's agree that OpenType is not a solution I can go and download
> __today__ for my Arabization development effort -- as such, I
> continue to look for a means to generate these fonts/tables/glyphs
> and I'm trying to understand the "standard" way of how these things
> are supposed to fit together (specifically the tables).

As I said, there is *no* standard way to fit these things together.

The nearest thing to a standard minimal set of Arabic glyphs is the Unicode
Arabic Presentation Forms-A and -B blocks. The rules for mapping the logical
characters to glyphs are explained in the Unicode book and summarized in:

	http://www.unicode.org/Public/UNIDATA/ArabicShaping.txt
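The data file is easy to consume programmatically. Below is a minimal sketch of reading each character's joining type, assuming the semicolon-separated layout of current versions of ArabicShaping.txt (code point; short name; joining type; joining group) with '#' comments. The embedded lines are an illustrative excerpt in that format, not a specific release, and `parse_arabic_shaping` is a hypothetical name.

```python
from typing import Dict, Iterable


def parse_arabic_shaping(lines: Iterable[str]) -> Dict[int, str]:
    """Map code point -> joining type from ArabicShaping.txt-style lines."""
    table: Dict[int, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        table[int(fields[0], 16)] = fields[2]  # field 3 is the joining type
    return table


SAMPLE = """\
# Illustrative excerpt in the ArabicShaping.txt format
0627; ALEF; R; ALEF
0628; BEH; D; BEH
0644; LAM; D; LAM
0645; MEEM; D; MEEM
"""

table = parse_arabic_shaping(SAMPLE.splitlines())
print(table[0x0628])  # D
print(table[0x0627])  # R
```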

> > - Searching and comparison are easier because if you want to search
> > or compare the letters XYZ, you only have to look for the letters,
> > possibly sans vowelization symbols. If one uses presentation forms
> > for encoding, you have to search/compare X + YZ, XY + Z, X + Y + Z
> > and XYZ separately, which is a real pain, extremely complicated to
> > implement and prone to errors of all sorts.
> 
> It's not that complicated if you ignore ligatures (I've posted a simple
> algorithm on the Arabeyes mailing list in case anyone wants a follow-up).

It is more complicated than necessary, however.
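For comparison, here is roughly what searching text stored as presentation forms requires: every string must first be folded back to abstract letters. Below is a minimal Python sketch using the standard `unicodedata.normalize` with NFKC (Unicode compatibility normalization, swapped in here for the ad hoc algorithm mentioned above). NFKC maps presentation forms to their base letters and splits ligatures such as lam-alef. The vowel-mark range U+064B..U+0652 and the names `search_key` and `contains` are assumptions for illustration.

```python
import unicodedata

# Arabic vowel marks (fathatan .. sukun), assumed here as the marks to ignore.
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def search_key(text: str) -> str:
    """Fold text to plain abstract letters for searching and comparison."""
    # NFKC: U+FE91 BEH INITIAL FORM -> U+0628; U+FEFB LAM WITH ALEF -> U+0644 U+0627.
    folded = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in folded if ch not in HARAKAT)


def contains(haystack: str, needle: str) -> bool:
    """Substring search that ignores positional forms and vowel marks."""
    return search_key(needle) in search_key(haystack)


stored = "\ufe91\ufee0\ufe8e"            # BEH init, LAM medial, ALEF final
print(contains(stored, "\u0628\u0644"))  # True
```

With logical storage, only the optional vowel stripping is needed; with presentation-form storage, this extra folding pass runs on every comparison unless the keys are cached.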

Keep in mind that display operations are relatively infrequent, so it makes
sense to pay some overhead there for looking up glyphs inside a font.

On the other hand, text compare operations occur all the time, e.g. hidden
inside database engines, and they need to be extraordinarily quick.

> > Similarly: disunifying Arabic shape variants makes search and sort
> > slightly more complicated; unifying them makes search and sort
> > simpler but complicates the display process, and requires the
> > introduction of "zero width joiner" and "zero width non-joiner"
> > controls.
> 
> Thank you Marco -- that's exactly what I was getting at.  Not for
> the sake of argument, but to understand and progress on a foundational
> element.

Remember that I was still acting as the devil's advocate. As I said above,
search and sort should be given a higher priority than display. The human
eye is slow, so it can afford to wait while the computer redraws the
screen.

Ciao.
_ Marco

----- End forwarded message -----

-- 
David Starner - dstarner98 at aasaa dot ofe dot org
Pointless website: http://dvdeug.dhis.org
"I don't care if Bill personally has my name and reads my email and 
laughs at me. In fact, I'd be rather honored." - Joseph_Greg