Re: Unicode and Bidi vs Shaping



On Fri, 7 Mar 2003, Mohammed Elzubeir wrote:

> [...] I take this reference to mechanical formatting to refer to
> converting legacy text to Unicode compliant ones.

No, wrong assumption. It's talking about text that is generated
automatically, not text converted from something else. For example,
consider a program that wants to display something in Arabic, so it
automatically puts RLE and PDF around that text to make sure it is
displayed in proper Arabic order, with a right-to-left base direction.
That Arabic text may contain some English phrase that is itself inside
an LRE..PDF pair. Now consider that this whole thing gets cut and pasted
somewhere into a bug report system in Arabic, with the embeddings
automatically copied and an LRE..PDF pair put around the whole thing to
make sure it is displayed the same....

You can see that almost all of these pairs are automatically generated to
some degree or other, and that they nest one level deeper each time the
text is reused. This is what that autogeneration means.
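
To make the nesting concrete, here is a minimal Python sketch of that
scenario. The helper names (embed_rtl, embed_ltr, max_depth) and the
sample strings are made up for illustration; only the three code points
(LRE U+202A, RLE U+202B, PDF U+202C) come from Unicode.

```python
# Explicit directional formatting characters defined by Unicode.
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def embed_rtl(text):
    """Wrap text in an RLE..PDF pair (hypothetical helper)."""
    return RLE + text + PDF


def embed_ltr(text):
    """Wrap text in an LRE..PDF pair (hypothetical helper)."""
    return LRE + text + PDF


def max_depth(text):
    """Return the deepest nesting of embedding pairs found in text."""
    depth = deepest = 0
    for ch in text:
        if ch in (LRE, RLE):
            depth += 1
            deepest = max(deepest, depth)
        elif ch == PDF and depth > 0:
            depth -= 1
    return deepest


# An English phrase inside an Arabic message, pasted into a bug report.
phrase = embed_ltr("FriBidi")              # generated by the first program
message = embed_rtl("برنامج " + phrase)    # wrapped again for Arabic display
report = embed_ltr(message)                # wrapped once more by the bug tracker
print(max_depth(report))
```

Nobody typed any of those pairs by hand, yet the phrase already sits
three embeddings deep.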

> Have you run any tests to prove this?

Those are Unicode's claims. We have run FriBidi on many randomly
generated and specifically designed test cases to make sure it supports
exactly those 61 levels, exactly as specified in the Bidi algorithm, no
more, no less.
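
As a rough illustration of what "exactly 61 levels" means, here is a
small Python sketch of the explicit-level rules as I read them: RLE
moves to the least greater odd level, LRE to the least greater even
level, a code that would go past 61 is ignored, and PDF undoes its
matching code. This is not FriBidi code; the names (MAX_LEVEL,
next_level, trace_levels) are invented for the example, and matching
ignored codes with a None marker is a simplification of my own.

```python
MAX_LEVEL = 61  # highest explicit embedding level discussed in this thread


def next_level(current, code):
    """Least greater odd level for RLE, least greater even level for LRE."""
    if code == "RLE":
        return current + 1 if current % 2 == 0 else current + 2
    if code == "LRE":
        return current + 2 if current % 2 == 0 else current + 1
    raise ValueError("unknown embedding code: %r" % (code,))


def trace_levels(codes, base_level=0):
    """Return the embedding level in effect after each code in codes.

    codes is a sequence of the strings "RLE", "LRE" and "PDF".
    """
    level = base_level
    stack = []   # saved levels; None marks a code that was ignored
    trace = []
    for code in codes:
        if code == "PDF":
            if stack:
                saved = stack.pop()
                if saved is not None:   # matching code was valid: restore
                    level = saved
        else:
            candidate = next_level(level, code)
            if candidate <= MAX_LEVEL:
                stack.append(level)
                level = candidate
            else:
                stack.append(None)      # too deep: level stays unchanged
        trace.append(level)
    return trace


# Alternating RLE/LRE climbs one level per code until it hits the limit.
trace = trace_levels(["RLE", "LRE"] * 40)
print(max(trace))
```

A test of the kind described above would feed sequences like this one
to the implementation and check that the level never goes above 61 and
also never stops short of it.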

> Tried this on a corpus of text?

No. Those 61 levels rarely occur in corpora. See above.

> If I understand your paper correctly, you are saying that you should
> apply the bidi algorithm, short of removing the explicit embedding and
> overriding codes, shape, then continue. 

No, not simply continue, but do Arabic joining/shaping, and then continue 
with the rest of the Bidi algorithm.

> Please do keep us posted on what the UTC comes up with.

Sure. But that may take some time, as UTC members tend to stay quiet for
a little while after a meeting to rest (they are tired from the
pre-meeting document rush, the meeting itself, and the travel, which may
be across the whole US, or even to Canada, or this time even to India for
one member).

Whatever they come up with is what we are going to implement in FriBidi
as well.

roozbeh

PS: what's happening with Arabeyes becoming a member of Unicode? I prefer
not to act as a liaison much longer. It takes time, and in the meanwhile
it ruins the possibility of educating an Arab Unicode guru (which we
are missing).