
Re: Unicode and Bidi vs Shaping



On Fri, 7 Mar 2003, Mohammed Elzubeir wrote:

> 1. When you say: [...]

We're not saying that, Mark Davis is ;)

> Codes
> ----------------
> LRM/RLM
> LRO JEEM AIN PDF
> RLO LAM MEEM PDF
>
> The code LRM (Left-Right Mark, which is a weak zero-width character) simply 
> says this text is left-to-right.

...or right-to-left for the RLM case (after the slash)...

> So after the mark we have characters which 
> would normally act as those marks (but in the case of Arabic letters they 
> would act as RLM codes). Then you have LRO (explicit Left-to-right override)
> forcing JEEM and AIN to be treated as left-to-right. Again you have LRO
> but for LAM and MEEM.. these are embedded in the LRM?

The LRM or RLM sets the paragraph direction to left-to-right or
right-to-left respectively, since the overrides can't do that. This makes
sure we have the example in both paragraph directions, so that possible
differences show up.

In other words, no, those are not embedded in LRM/RLM. LRM and RLM don't
have any embedding behavior. They act like invisible letters, and are
there just to set the direction of the paragraph.
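
As a rough illustration of the point above, here is a minimal Python
sketch of a "first strong character" check, which is a simplified reading
of how the bidi algorithm picks the paragraph direction. The function name
paragraph_direction and the variable body are made up for this example,
and the sketch ignores everything else the algorithm does. It only shows
that the overrides are not strong characters, while LRM and RLM are.

```python
import unicodedata

LRM, RLM = "\u200e", "\u200f"                 # left-to-right / right-to-left marks
LRO, RLO, PDF = "\u202d", "\u202e", "\u202c"  # overrides and pop directional format
JEEM, AIN, LAM, MEEM = "\u062c", "\u0639", "\u0644", "\u0645"


def paragraph_direction(text):
    """Return 'ltr' or 'rtl' based on the first strong character.

    Hypothetical helper: a simplified first-strong heuristic, not a full
    implementation of the bidi algorithm.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return "ltr"  # assumed default when no strong character is present


# The sequence from the example, without the leading mark.
body = LRO + JEEM + AIN + PDF + RLO + LAM + MEEM + PDF

print(paragraph_direction(LRM + body))
print(paragraph_direction(RLM + body))
print(paragraph_direction(body))
```

Under this heuristic the leading mark decides the direction. Without it,
the first strong character is JEEM (an Arabic letter), because LRO itself
is skipped, so the paragraph would come out right-to-left in both cases.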

> 2. I have tried to view the examples under Mozilla and under IE (both look
>    very different). I am referring to the first table's last three rows.

That first table is there precisely to let you see the difference between
your browsers, and to see what they already do in this case.

> 3. I don't know if it's just me, but to my eyes, it looks as if example B is
>    the most acceptable (and "less weird" than C).

But Behdad and I are saying that all of them are problematic in one way or
another. You only have a different opinion from some others about "which
is less bad"! Your opinion may also differ from Mark Davis's because you
are trying to see that weird thing as a single word, which it is not.

> 4. Perhaps #3 is because the given example is not a real word (at least not
>    an Arabic word I know) and so I found it difficult to relate to the
>    to a hypothetical one. This would emphasise the situation, wouldn't
>    you agree?

It can't be a real word, because nobody would put LRO and RLO in the middle
of a word. It's just a weird example to raise the question of what a
rendering engine should do if it sees that character sequence. The
situation rarely happens; the most frequent case is when text in a visual
Arabic/Hebrew codepage is converted to Unicode without applying a
reverse-bidi algorithm (which may lose some of the semantics of the
original), and ZWJ and ZWNJ then get mixed in.
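
To make the conversion case a bit more concrete, here is a toy Python
sketch contrasting the two ways of bringing a visual-order line into
Unicode. Both function names are invented for this example, the
"reverse-bidi" shown only handles a line that is entirely right-to-left,
and a real converter would have to deal with numbers, Latin runs and
punctuation as well.

```python
LRO, PDF = "\u202d", "\u202c"  # left-to-right override, pop directional format


def wrap_visual_line(visual_line):
    """Keep the visual order as stored and force it with LRO ... PDF.

    Hypothetical helper: this is the "no reverse-bidi" conversion.
    """
    return LRO + visual_line + PDF


def naive_reverse_bidi(visual_line):
    """Toy reverse-bidi: assumes the whole line is right-to-left text."""
    return visual_line[::-1]


# Hebrew "shalom" in logical order: SHIN, LAMED, VAV, FINAL MEM.
logical = "\u05e9\u05dc\u05d5\u05dd"
# The same word as a visual codepage would store it, left to right.
visual = logical[::-1]

overridden = wrap_visual_line(visual)
recovered = naive_reverse_bidi(visual)
```

The overridden form displays the same glyph order, but the letters sit in
the file in reversed order, which is where searching, sorting and joining
behavior start to go wrong.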

Anyway, which way is to be followed may already have been decided by the
Unicode Technical Committee. Today was the last day of their meeting.

roozbeh