Re: Fribidi and joining

To: developer at arabeyes dot org
Subject: Re: Fribidi and joining
From: Mohammed Elzubeir <elzubeir at arabeyes dot org>
Date: Thu, 22 Aug 2002 18:56:46 -0500
User-agent: Mutt/1.3.28i

On Thu, Aug 22, 2002 at 10:32:10PM +0430, Roozbeh Pournader wrote:
> 
> > (how common are those cases btw ?).
> 
> These cases are not that common, but the point is that you need to clarify
> them to get somewhere. We can't afford the price of implementing something
> that's brain-damaged, and find that it's bad after we implemented it. We
> also can't afford saving any Arabic file in GNOME's gedit and opening it
> with KDE's kate (or yudit, or vim, or MS Notepad) to find something else
> displayed. We need a very clear specification in Unicode.

This is already the case (non-uniform displayed text). 

> > It's been noted that it might look as though fribidi will have to do some
> > initial Bidi'ing, then shape then complete the Bidi.
> 
> Well, that's what the bidi algorithm already says (although ambiguously).
> Our point is that it should be a little more complex. The Right Way should
> be something like doing some shaping then doing some bidi, then finalizing
> shaping, then breaking lines of a paragraph, and finally do some more
> bidi. But even that has some details that we don't know what to do with.

> 
> > What are those scenarios and could we see a example or two of 'em ?
> 
> Well, a short introduction is: <LRO, Meem, Noon, PDF>. If you do shaping
> first and then bidi, it becomes (visually):
> 
> 	Initial-Meem, Final-Noon
> 
> but vice-versa, it becomes:
> 
> 	Final-Meem, Initial-Noon
> 
> A basic question will be: which should be done first? An easy answer
> (based on the above result) is: second of course! But let's get to another
> example: <Heh, ZWJ>, which is a clean way to get an initial Heh (which you 
> need frequently). If you do bidi first, instead of initial Heh, you will 
> have a Final Heh!
> 
> Put these in a can with line breaking (which needs to be aware of the width
> of the letters to find the appropriate line breaks), and you'll be
> completely confused.

Okay, so why don't we simply ignore control characters, bidi formatting
codes, etc. when doing the shaping (post-bidi)?
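
To make the two orderings in the <LRO, Meem, Noon, PDF> example above
concrete, here is a toy sketch in Python. It is not fribidi or ICU code.
The joining model (three dual-joining letters plus ZWJ), the one-rule "bidi"
step, and all function names are assumptions made only for illustration.

```python
# Toy model: three dual-joining letters plus the controls used in the thread.
MEEM, NOON, HEH = "\u0645", "\u0646", "\u0647"
ZWJ = "\u200d"
LRO, PDF = "\u202d", "\u202c"

NAMES = {MEEM: "Meem", NOON: "Noon", HEH: "Heh"}
DUAL_JOINING = set(NAMES)            # assumption: only these letters are modelled
JOIN_CAUSING = DUAL_JOINING | {ZWJ}  # ZWJ makes its neighbours join
BIDI_CONTROLS = {LRO, PDF}


def strip_controls(seq):
    """Drop the bidi formatting codes; they take no part in joining."""
    return [c for c in seq if c not in BIDI_CONTROLS]


def forms_in_reading_order(chars):
    """Label each letter, assuming `chars` is in reading (logical) order."""
    forms = []
    for i, c in enumerate(chars):
        if c not in DUAL_JOINING:
            continue  # ZWJ affects its neighbours but has no glyph of its own
        joins_prev = i > 0 and chars[i - 1] in JOIN_CAUSING
        joins_next = i + 1 < len(chars) and chars[i + 1] in JOIN_CAUSING
        if joins_prev and joins_next:
            form = "Medial"
        elif joins_next:
            form = "Initial"
        elif joins_prev:
            form = "Final"
        else:
            form = "Isolated"
        forms.append(f"{form}-{NAMES[c]}")
    return forms


def visual_order(logical, items):
    """Toy bidi: keep the order under LRO, otherwise reverse (all-RTL text)."""
    overridden = bool(logical) and logical[0] == LRO
    return list(items) if overridden else list(reversed(items))


def shape_then_bidi(logical):
    labels = forms_in_reading_order(strip_controls(logical))
    return visual_order(logical, labels)  # left-to-right display order


def bidi_then_shape(logical):
    visual = visual_order(logical, strip_controls(logical))
    # A shaper fed visual text assumes Arabic reads right to left,
    # so it walks the displayed string starting from its right end.
    labels = forms_in_reading_order(list(reversed(visual)))
    return list(reversed(labels))  # back to left-to-right display order


if __name__ == "__main__":
    text = [LRO, MEEM, NOON, PDF]
    print(shape_then_bidi(text))  # ['Initial-Meem', 'Final-Noon']
    print(bidi_then_shape(text))  # ['Final-Meem', 'Initial-Noon']
    print(forms_in_reading_order([HEH, ZWJ]))  # ['Initial-Heh']
```

In this toy model the two orders agree for plain <Meem, Noon> and diverge
only once the override is present, which is the ambiguity described above.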

> > and/or IBM's ICU ?
> 
> That I have not tested yet. But almost all its authors are Unicode
> insiders, and since they did not have a clue about the case, it also just
> does some random thing.

Looking through the ICU pages I can see they have _something_ about shaping,
but it's not clear to me whether that is tied to ubidi at all.

> >  + While working on a solution (don't know how much of an overhaul will
> >    be required of the spec), can reversibility be sneaked in ?
> 
> No. Reversibility was just a nightmare of Gaspar Sinai's when he found out
> that bidirectional scripts are really hard to do. You know what
> reversibility means? It means that for each Arabic visual string there
> should only be one Arabic logical string. That means neither of LRM, RLM,
> LRO, RLO, PDF should be allowed (which are really needed, believe me). It
> may also mean that bidi should be simplified considerably.

The ICU library seems to have a reverse RTL function:

U_CAPI int32_t U_EXPORT2 ubidi_writeReverse ( const UChar * src,
                                              int32_t srcLength, 
                                              UChar *  dest,
                                              int32_t destSize,
                                              uint16_t options,  
                                              UErrorCode * pErrorCode )   

"This function preserves the integrity of characters with multiple code units
and (optionally) modifier letters. Characters can be replaced by mirror-image
characters in the destination buffer. Note that "real" mirroring has to be done
in a rendering engine by glyph selection and that for many "mirrored"
characters there are no Unicode characters as mirror-image equivalents. There
are also options to insert or remove BiDi control characters."

[Reference: http://oss.software.ibm.com/icu/apiref/ubidi_8h.html#a39]
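
As a rough illustration of what that description amounts to, here is a small
Python sketch of a reversal that keeps combining marks with their base
characters. It is not a binding to ICU. The function name, its keyword
parameters (loosely modelled on ICU's UBIDI_KEEP_BASE_COMBINING,
UBIDI_DO_MIRRORING and UBIDI_REMOVE_BIDI_CONTROLS options) and the tiny
mirror table are assumptions made for illustration.

```python
import unicodedata

# Hypothetical, deliberately tiny mirror table; a real implementation
# would use the full Unicode mirroring data.
MIRROR = {"(": ")", ")": "(", "[": "]", "]": "[", "<": ">", ">": "<"}

# LRM, RLM, LRE, RLE, PDF, LRO, RLO
BIDI_CONTROLS = {"\u200e", "\u200f", "\u202a", "\u202b",
                 "\u202c", "\u202d", "\u202e"}


def reverse_text(src, keep_base_combining=True, do_mirroring=False,
                 remove_bidi_controls=False):
    """Reverse `src`, optionally keeping each combining mark after its base."""
    clusters = []
    for ch in src:
        if remove_bidi_controls and ch in BIDI_CONTROLS:
            continue
        if do_mirroring:
            ch = MIRROR.get(ch, ch)
        # A nonzero combining class means the mark belongs to the previous base.
        if keep_base_combining and clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


if __name__ == "__main__":
    meem_fatha_noon = "\u0645\u064e\u0646"
    # The fatha stays attached to Meem: Noon, Meem, fatha.
    print([hex(ord(c)) for c in reverse_text(meem_fatha_noon)])
```

Python strings are sequences of code points, so the "multiple code units"
concern in the quoted text (surrogate pairs in UTF-16) does not come up here.
Only the combining-mark grouping needs explicit handling.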

later
-- 
-------------------------------------------------------
| Mohammed Elzubeir    | Visit us at:                 |
|                      |  http://www.arabeyes.org/    |
| Arabeyes Project     | Homepage:                    |
| Unix the 'right' way |  http://fakkir.net/~elzubeir/|
-------------------------------------------------------
---
Was I helpful? Let others know:
http://svcs.affero.net/rm.php?r=elzubeir