Re: Fribidi and joining
- To: developer at arabeyes dot org
- Subject: Re: Fribidi and joining
- From: Mohammed Elzubeir <elzubeir at arabeyes dot org>
- Date: Thu, 22 Aug 2002 18:56:46 -0500
- User-agent: Mutt/1.3.28i
On Thu, Aug 22, 2002 at 10:32:10PM +0430, Roozbeh Pournader wrote:
> > (how common are those cases btw ?).
> These cases are not that common, but the point is that you need to clarify
> them to get somewhere. We can't afford the price of implementing something
> that's brain-damaged, and find that it's bad after we implemented it. We
> also can't afford saving any Arabic file in GNOME's gedit and opening it
> with KDE's kate (or yudit, or vim, or MS Notepad) to find something else
> displayed. We need a very clear specification in Unicode.
This is already the case (non-uniform displayed text).
> > It's been noted that it might look as though fribidi will have to do some
> > initial Bidi'ing, then shape then complete the Bidi.
> Well, that's what the bidi algorithm already says (although ambiguously).
> Our point is that it should be a little more complex. The Right Way should
> be something like doing some shaping, then doing some bidi, then finalizing
> shaping, then breaking the lines of a paragraph, and finally doing some more
> bidi. But even that has some details that we don't know what to do with.
> > What are those scenarios and could we see an example or two of 'em ?
> Well, a short introduction is: <LRO, Meem, Noon, PDF>. If you do shaping
> first, and then bidi, it becomes (visually):
> Initial-Meem, Final-Noon
> but vice-versa, it becomes:
> Final-Meem, Initial-Noon
> A basic question will be: which should be done first? An easy answer
> (based on the above result) is: second of course! But let's get to another
> example: <Heh, ZWJ>, which is a clean way to get an initial Heh (which you
> need frequently). If you do bidi first, instead of initial Heh, you will
> have a Final Heh!
> Put these in a can with line breaking (which needs to be aware of the width
> of the letters to find the appropriate line breaks), and you'll be in
> complete confusion.
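To make the ordering problem above concrete, here is a rough Python sketch of the <LRO, Meem, Noon, PDF> case. It is a toy model, not fribidi or any real shaper: letters are plain names instead of code points, bidi controls are assumed to be transparent to joining, and every function name is made up for illustration. The bidi-first result for <Heh, ZWJ> depends on where the bidi step ends up placing the ZWJ, which this sketch does not try to model, so only the shape-first result is shown for that case.

```python
# Toy model: letters are plain names, not real code points, and only the
# simple cases from the example above are handled.
DUAL_JOINING = {"Meem", "Noon", "Heh"}
JOIN_CAUSING = DUAL_JOINING | {"ZWJ"}
BIDI_CONTROLS = {"LRO", "RLO", "PDF"}


def form(joins_prev, joins_next):
    """Name the contextual form from the two joining flags."""
    if joins_prev and joins_next:
        return "Medial"
    if joins_next:
        return "Initial"
    if joins_prev:
        return "Final"
    return "Isolated"


def shape_logical(text):
    """Shape in logical order; bidi controls are skipped (assumed transparent)."""
    items = [t for t in text if t not in BIDI_CONTROLS]
    out = []
    for i, name in enumerate(items):
        if name in DUAL_JOINING:
            prev = i > 0 and items[i - 1] in JOIN_CAUSING
            nxt = i + 1 < len(items) and items[i + 1] in JOIN_CAUSING
            out.append(form(prev, nxt) + "-" + name)
    return out


def shape_visual(visual):
    """Shape a left-to-right display list, assuming Arabic reads right to
    left on screen: a letter's predecessor is the glyph on its right."""
    out = []
    for i, name in enumerate(visual):
        if name in DUAL_JOINING:
            prev = i + 1 < len(visual) and visual[i + 1] in JOIN_CAUSING
            nxt = i > 0 and visual[i - 1] in JOIN_CAUSING
            out.append(form(prev, nxt) + "-" + name)
    return out


def visual_order(items, text):
    """Display order, left to right: kept under LRO, reversed otherwise."""
    ltr = bool(text) and text[0] == "LRO"
    return list(items) if ltr else list(reversed(items))


def shape_then_bidi(text):
    return visual_order(shape_logical(text), text)


def bidi_then_shape(text):
    letters = [t for t in text if t not in BIDI_CONTROLS]
    return shape_visual(visual_order(letters, text))


example = ["LRO", "Meem", "Noon", "PDF"]
print(shape_then_bidi(example))        # ['Initial-Meem', 'Final-Noon']
print(bidi_then_shape(example))        # ['Final-Meem', 'Initial-Noon']
print(shape_logical(["Heh", "ZWJ"]))   # ['Initial-Heh']
```

The two orderings disagree on the same input, which is the whole complaint: without a rule in the spec, each implementation is free to pick either one.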
Okay, so why aren't we simply ignoring control characters, bidi formatting
codes, etc. when doing the shaping (post-bidi)?
> > and/or IBM's ICU ?
> That I have not tested yet. But almost all its authors are Unicode
> insiders, and since they did not have a clue about the case, it also just
> does some random thing.
Looking through the ICU pages I can see they have _something_ about shaping,
but it's not clear to me whether that is tied to ubidi at all.
> > + While working on a solution (don't know how much of an overhaul will
> > be required of the spec), can reversibility be sneaked in ?
> No. Reversibility was just a nightmare of Gaspar Sinai's when he found out
> that bidirectional scripts are really hard to do. You know what
> reversibility means? It means that for each Arabic visual string there
> should only be one Arabic logical string. That means none of LRM, RLM,
> LRO, RLO, PDF should be allowed (which are really needed, believe me). It
> may also mean that bidi should be simplified considerably.
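A small Python sketch of why the override codes rule out reversibility. It uses Latin letters and a deliberately tiny reordering function (only un-nested RLO...PDF runs are handled), so it is an illustration of the argument, not an implementation of the bidi algorithm. The function name `to_visual` is invented for this example.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def to_visual(logical):
    """Toy reordering: reverse each RLO...PDF run, copy everything else."""
    out = []
    run = None  # characters collected inside an open override, if any
    for ch in logical:
        if ch == RLO:
            run = []
        elif ch == PDF:
            if run is not None:
                out.extend(reversed(run))
                run = None
        elif run is not None:
            run.append(ch)
        else:
            out.append(ch)
    if run is not None:  # override left open at the end of the text
        out.extend(reversed(run))
    return "".join(out)


plain = "abc"
overridden = RLO + "cba" + PDF

# Two different logical strings, one visual string.
print(plain != overridden)                         # True
print(to_visual(plain) == to_visual(overridden))   # True
```

Since two distinct logical strings give the same display, no function can map the display back to a unique logical string unless the overrides are forbidden.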
The ICU library seems to have a reverse RTL function:
U_CAPI int32_t U_EXPORT2 ubidi_writeReverse ( const UChar * src,
int32_t srcLength,
UChar * dest,
int32_t destSize,
uint16_t options,
UErrorCode * pErrorCode )
"This function preserves the integrity of characters with multiple code units
and (optionally) modifier letters. Characters can be replaced by mirror-image
characters in the destination buffer. Note that "real" mirroring has to be done
in a rendering engine by glyph selection and that for many "mirrored"
characters there are no Unicode characters as mirror-image equivalents. There
are also options to insert or remove BiDi control characters."
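To show what "preserves the integrity of characters" with modifier letters could mean in practice, here is a short Python sketch that reverses a string while keeping combining marks attached to their base letter. This is not ICU code and does not call ICU; it only imitates the behaviour the quoted description mentions, using the standard `unicodedata` module. The function name is hypothetical, and marks whose combining class is 0 are not handled.

```python
import unicodedata


def reverse_keep_combining(text):
    """Reverse text cluster by cluster, where a cluster is a base character
    followed by any characters with a non-zero combining class."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch  # attach the mark to the preceding base
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


# Beh + Fatha + Teh: the Fatha must stay on the Beh after reversal.
word = "\u0628\u064e\u062a"
print(reverse_keep_combining(word) == "\u062a\u0628\u064e")  # True
print(word[::-1] == "\u062a\u064e\u0628")                    # True (naive reversal moves the Fatha onto the Teh)
```

Note that this is plain string reversal, which is a different thing from the reversibility discussed above: it says nothing about recovering a unique logical string from a visual one.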
| Mohammed Elzubeir | Visit us at: |
| | http://www.arabeyes.org/ |
| Arabeyes Project | Homepage: |
| Unix the 'right' way | http://fakkir.net/~elzubeir/|