
Re: Fribidi and joining



Well, your message is a little hard to reply to, and it was very
well-written, as a way to force me to provide more details. ;)

Everybody: Please read these carefully if you're interested in taking this
ship somewhere. Neither Behdad nor I know when we're going to leave Open
Source i18n and do something else, but that may be soon. People will need
to lead the ship somewhere after that. Don't count on Europeans or
Americans doing this for you if you want consistency.

On Wed, 21 Aug 2002, Nadim Shaikli wrote:

> To put more beef to this discussion, fribidi after much wrangling and
> convincing is slated to add shaping/joining support to their library but
> what has happened is that Behdad/Roozbeh have found instances in which
> it's not quite that simple to do

Well, it is not quite *clear* how to do it. It's easy as long as someone
specifies exactly what should be done, but that specification is *very*
hard.

> (how common are those cases btw ?).

These cases are not that common, but the point is that you need to clarify
them to get anywhere. We can't afford the price of implementing something
brain-damaged and finding out it's bad only after we've implemented it. We
also can't afford having an Arabic file saved in GNOME's gedit display
something different when opened in KDE's kate (or yudit, or vim, or MS
Notepad). We need a very clear specification in Unicode.

> It's been noted that it might look as though fribidi will have to do some
> initial Bidi'ing, then shape then complete the Bidi.

Well, that's what the bidi algorithm already says (although ambiguously).
Our point is that it should be a little more complex. The Right Way should
be something like: do some shaping, then some bidi, then finalize shaping,
then break the paragraph into lines, and finally do some more bidi. But
even that has some details we don't know what to do with.

> What are those scenarios and could we see an example or two of 'em ?

Well, a short introduction: take <LRO, Meem, Noon, PDF>. If you do shaping
first and bidi second, it becomes (visually):

	Initial-Meem, Final-Noon

but vice-versa, it becomes:

	Final-Meem, Initial-Noon
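To make the ordering problem concrete, here is a toy Python model of the
example above. It is a sketch under loud assumptions, not what FriBidi,
Pango or the Unicode algorithm actually do: every letter is treated as
dual-joining (true of Meem and Noon), the LRO is assumed to simply keep the
run in logical order on screen, and the "bidi first" shaper is assumed to
read the visual run from its right end, as if Arabic were always
right-to-left. All function names are hypothetical.

```python
def shape(letters):
    """Assign a positional form to each letter, reading `letters` first to last.

    Toy assumption: every letter is dual-joining, so it joins both of its
    neighbours inside the run.
    """
    last = len(letters) - 1
    forms = []
    for i, letter in enumerate(letters):
        joins_prev = i > 0
        joins_next = i < last
        if joins_prev and joins_next:
            form = "Medial"
        elif joins_next:
            form = "Initial"
        elif joins_prev:
            form = "Final"
        else:
            form = "Isolated"
        forms.append(f"{form}-{letter}")
    return forms


def reorder_lro(letters):
    """Toy bidi step: under LRO the run is shown left to right, in logical order."""
    return list(letters)


def shape_then_bidi(logical):
    # Shape in logical order, then lay the shaped letters out on screen.
    return reorder_lro(shape(logical))


def bidi_then_shape(logical):
    # Lay the letters out on screen first; then a shaper that assumes Arabic
    # runs right to left reads the visual run starting from its right end.
    visual = reorder_lro(logical)
    return list(reversed(shape(list(reversed(visual)))))


logical = ["Meem", "Noon"]  # the letters between LRO and PDF
print(shape_then_bidi(logical))  # ['Initial-Meem', 'Final-Noon']
print(bidi_then_shape(logical))  # ['Final-Meem', 'Initial-Noon']
```

Both result lists are in left-to-right screen order and match the two
visual results quoted above.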

A basic question is: which should be done first? An easy answer (based on
the above results) is: the second ordering (bidi first), of course! But
let's look at another example: <Heh, ZWJ>, which is a clean way to get an
initial Heh (which you need frequently). If you do bidi first, instead of
an initial Heh you will get a final Heh!

Add line breaking to the mix (which needs to know the widths of the
letters to find the appropriate line breaks), and you'll be completely
confused.
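A small sketch of why line breaking depends on shaping: a greedy first-fit
breaker has to measure each word, and an Arabic word's width depends on
the positional forms its letters take. The advance widths below are
made-up numbers for illustration only (real widths come from the font),
every letter is again assumed to be dual-joining, and all names are
hypothetical.

```python
# Hypothetical advance widths (arbitrary units) per positional form.
WIDTHS = {"Isolated": 10, "Initial": 6, "Medial": 5, "Final": 7}


def forms(word):
    """Positional forms of a word whose letters are all assumed dual-joining."""
    last = len(word) - 1
    result = []
    for i in range(len(word)):
        if 0 < i < last:
            result.append("Medial")
        elif i == 0 and last > 0:
            result.append("Initial")
        elif i == last and last > 0:
            result.append("Final")
        else:
            result.append("Isolated")
    return result


def shaped_width(word):
    """Width of a word measured after shaping."""
    return sum(WIDTHS[f] for f in forms(word))


def unshaped_width(word):
    """Width if every letter were (wrongly) measured in its isolated form."""
    return WIDTHS["Isolated"] * len(word)


def break_lines(words, max_width, width_of, space=3):
    """Greedy first-fit line breaking: put as many words on a line as fit."""
    lines, current, used = [], [], 0
    for word in words:
        w = width_of(word)
        needed = w if not current else used + space + w
        if current and needed > max_width:
            lines.append(current)
            current, used = [word], w
        else:
            current.append(word)
            used = needed
    if current:
        lines.append(current)
    return lines


words = ["ab", "cd", "ef"]  # placeholder two-letter words, in logical order
print(break_lines(words, 30, shaped_width))    # [['ab', 'cd'], ['ef']]
print(break_lines(words, 30, unshaped_width))  # [['ab'], ['cd'], ['ef']]
```

Measuring the letters in their isolated forms gives different breaks,
which is why breaking has to happen after (at least some) shaping.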

> (this question was posed, but it seems as though unicode bidi gurus
> were stumped and that it's rather complicated and thus the need and
> requirement, as noted above, to re-read all the specs and get __very__
> familiar with the requirements).

Well, they raise many points, such as backward compatibility, and also
allowing applications to use a bidi black box that, given any string,
outputs the same string as the reference bidi implementation. Adding these
will require surgery in many applications that implement bidi in optimized
ways (for time or space), and the Unicode people want to minimize the
surgery needed. And since bidi implementations are done differently, our
opinion may differ from theirs. So we need to specify every choice clearly
so they can decide.

>  + Microsoft has solved this problem somehow in their own proprietary way
>    (typical)

Well, not really their proprietary way, but what they thought was the most
logical thing. We may end up doing exactly what they're doing (but I
prefer not to investigate that first, to keep our heads clear).

> but had Pango

Stops joining Arabic letters if they are in a left-to-right override (like
the first example above). It is like remaining silent until one's lawyer
comes!

> QT 

Well, it just does some random thing. Qt is still far from being Unicode
compliant. Qt is usually bug-driven or feature-request-driven. So it will
do what it is currently doing (based on Lars Knoll's original work) unless
someone asks otherwise.

> and/or IBM's ICU ?

That I have not tested yet. But almost all its authors are Unicode
insiders, and since they did not have a clue about the case, it also just
does some random thing.

>    you guys run your test-cases on those libraries?

No, but we've read the sources ;)

>   Are their authors aware of these issues 

Owen (Pango) should know that, since he's the one who turned shaping off
in LTR overrides. Lars (Qt) knows that, and even knows that there may be
bidi compliance problems in his code. ICU authors are usually Unicode

> (Pango used fribidi, don't know if that's still the case).

It uses FriBidi code, but not FriBidi itself. Owen regularly takes some
code from FriBidi and puts it in a module named mini-fribidi in Pango.

>  + While working on a solution (don't know how much of an overhaul will
>    be required of the spec), can reversibility be sneaked in ?

No. Reversibility was just a nightmare of Gaspar Sinai's when he found out
that bidirectional scripts are really hard to do. You know what
reversibility means? It means that for each Arabic visual string there
should be only one Arabic logical string. That means none of LRM, RLM,
LRO, RLO or PDF could be allowed (and they are really needed, believe me).
It may also mean that bidi would have to be simplified considerably.
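To see why explicit marks break reversibility, here is a toy sketch (not
the real bidi algorithm) that handles a single run of right-to-left
letters and ignores shaping; it uses Hebrew letter names so that joining
does not enter the picture. It reverses the run for display unless the
run is wrapped in LRO ... PDF. The function name and letter placeholders
are hypothetical. Two different logical strings come out as the same
visual string, so the visual string alone cannot say which logical string
produced it.

```python
LRO, PDF = "\u202d", "\u202c"  # LEFT-TO-RIGHT OVERRIDE, POP DIRECTIONAL FORMATTING


def toy_visual(logical):
    """Display order (left to right) of one run of right-to-left letters.

    Toy model only: a run wrapped in LRO ... PDF keeps its logical order;
    otherwise the run is reversed, as right-to-left text is. The invisible
    format characters themselves are dropped from the output.
    """
    if logical and logical[0] == LRO and logical[-1] == PDF:
        return list(logical[1:-1])
    return list(reversed(logical))


a = [LRO, "Alef", "Bet", PDF]  # overridden to left-to-right
b = ["Bet", "Alef"]            # plain right-to-left text
print(toy_visual(a))  # ['Alef', 'Bet']
print(toy_visual(b))  # ['Alef', 'Bet']
print(toy_visual(a) == toy_visual(b))  # True
```

Making the mapping one-to-one would mean ruling out inputs like `a`, which
is exactly the objection in the paragraph above.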

> In terms of getting feedback on the new Fribidi API, I think the fribidi
> list would certainly have a more informed, immediate and critical view of
> life since the internals of fribidi are a mystery to many here, I'd guess.

Well, we already sent it there. You told me to send it here too :)

The exact situation is: we don't just need critical comments on the API,
but constructive ones too. That API design is incomplete, and we need
someone to join us in designing it.

> If memory serves, none of the 'fribidi-discuss' subscribers has issues
> with Behdad's initial API which leads me to believe that it's most likely
> OK given whatever problem you guys have recently discovered is taken care
> of.

The exact situation: since the interaction between bidi and shaping will
not be finalized until November at the earliest, the FriBidi 1.0 API has
to be done before we know their decision. Thus it should be designed
without shaping, but with some consideration for it. Keeping both in mind
at once is hard, so it would be good if people here helped us with the new
design, so that we can then clear our minds and switch full throttle to
getting the bidi vs. shaping problem cleared up; then we can work on
implementing shaping in FriBidi. That's the shortest and cleanest route,
IMHO.

> Salam.

Well, in Persian, Salam is used only at the beginning, so that's a little
ironic... ;)

roozbeh