As promised, attached is our proposal to the Unicode Technical Committee on the interaction between Bidi and shaping. We sent it last night for consideration at the UTC meeting that starts on March 4. Also attached is a discussion paper from Mark Davis, the president of the Unicode Consortium and the author of the Unicode Bidirectional Algorithm. Please send any comments to me, Behdad, and/or this mailing list. Also, please don't circulate the documents outside the Arabeyes mailing lists.

roozbeh
Attachment:
bidi5.pdf
Description: Adobe PDF document
L2/03-064
Unicode BIDI Issue #5
2003-01-22, MED

The key issue for the Unicode BIDI committee before the next UTC meeting is to discuss and come to consensus on item #5: whether (logically) shaping gets applied before or after BIDI directional reordering. In most cases, this doesn't matter, but it can affect the result. The following describes the possible differences in appearance, and outlines options for the committee to decide among.

We will first set up a simple test case. Suppose that we have the following string of Arabic characters in memory, as characters 1, 2, 3, and 4.
We will override the first two characters to be LTR. So that we can show both paragraph directions, the next two will be embedded, but with the normal RTL direction. One can use embedding codes to get this effect in plain text, or markup in HTML. This is reproduced below, although the effect in the last three rows will depend on the browser's BIDI support of these characters and/or HTML styles.
The resulting display order will be one of the following, depending on the paragraph direction.
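The reordering step itself can be sketched in a few lines. This is a minimal illustration of the reversal rule (L2) of the Unicode Bidirectional Algorithm, not a full implementation. The function name `reorder_visual` is hypothetical, and the embedding levels in the usage lines are an assumption about how an LTR override followed by an RTL embedding would resolve in plain text (override at level 2 in either paragraph, embedding at level 1 in an LTR paragraph and level 3 in an RTL paragraph). They are not taken from the test case figures.

```python
def reorder_visual(chars, levels):
    """Return chars in left-to-right display order, given resolved levels.

    Sketch of UBA rule L2 only: from the highest level down to the lowest
    odd level, reverse every contiguous run at that level or higher.
    """
    order = list(chars)
    lv = list(levels)
    odd_levels = [l for l in lv if l % 2 == 1]
    if not odd_levels:
        return order  # purely left-to-right text needs no reversal
    for level in range(max(lv), min(odd_levels) - 1, -1):
        i = 0
        while i < len(order):
            if lv[i] >= level:
                j = i
                while j < len(order) and lv[j] >= level:
                    j += 1
                # Reverse the run, keeping each level attached to its character.
                order[i:j] = order[i:j][::-1]
                lv[i:j] = lv[i:j][::-1]
                i = j
            else:
                i += 1
    return order


# Assumed levels for an LTR paragraph: override at 2, embedding at 1.
print(reorder_visual([1, 2, 3, 4], [2, 2, 1, 1]))
# Assumed levels for an RTL paragraph: override at 2, embedding at 3.
print(reorder_visual([1, 2, 3, 4], [2, 2, 3, 3]))
```

Under these assumed levels the sketch prints `[4, 3, 1, 2]` and `[1, 2, 4, 3]`. HTML markup or a different use of embedding codes may resolve to other levels and therefore other orders.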
There are a number of possible shaping results, depending on what happens within runs and what happens across runs. The four most likely candidates are:
A. If we shape, then apply BIDI, we get the following visual result:
B. If we shape simply according to the resulting display order (after BIDI), we get the following:
C. If we shape simply according to the resulting display order (after BIDI), but don't shape across direction-run boundaries, we get the following:
D. If we simply don't shape characters with overridden direction, we get the following:
I think the argument for (A) is that in practice it will be quite unusual to override the direction of Arabic letters, and it may not matter that the forms look odd. And (A) may be simpler to implement, since line breaks can be decided before applying the BIDI algorithm. For (B) or (C), one could argue that the end result is less weird, and that in practice the BIDI algorithm must be applied anyway to the entire paragraph; so at that point one knows what the ordering is anyway. (C) may be simpler to implement, since one never needs to look outside of directional boundaries for shaping. (D) probably is no simpler to implement, since you still have to determine the runs before you decide whether or not to shape.
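The four candidates A to D can be compared with a toy model. The sketch below is only an illustration under loud assumptions: every letter is treated as dual-joining, a letter's form depends only on its position in a joining sequence, and "shaping by display order" is taken to mean reading the display right to left. The names `join_forms` and `shape`, and the sample display order, are hypothetical and are not part of any proposal text.

```python
def join_forms(sequence):
    """Assign toy joining forms to letters given in joining order
    (the visually rightmost letter of a connected group comes first)."""
    last = len(sequence) - 1
    forms = {}
    for k, ch in enumerate(sequence):
        if last == 0:
            forms[ch] = "isolated"
        elif k == 0:
            forms[ch] = "initial"
        elif k == last:
            forms[ch] = "final"
        else:
            forms[ch] = "medial"
    return forms


def shape(option, logical, display, runs, overridden):
    """Return {letter: form} for one of the candidate approaches A to D."""
    if option == "A":  # shape in memory order, before reordering
        return join_forms(logical)
    visual_rtl = display[::-1]  # display order read from right to left
    if option == "B":  # shape by display order, across run boundaries
        return join_forms(visual_rtl)
    if option == "C":  # shape by display order, each run on its own
        forms = {}
        for run in runs:
            forms.update(join_forms([c for c in visual_rtl if c in run]))
        return forms
    if option == "D":  # overridden letters are left unshaped (isolated)
        forms = {c: "isolated" for c in overridden}
        for run in runs:
            rest = [c for c in logical if c in run and c not in overridden]
            forms.update(join_forms(rest))
        return forms
    raise ValueError(f"unknown option: {option}")


logical = [1, 2, 3, 4]
display = [1, 2, 4, 3]  # assumed left-to-right display order
runs = [{1, 2}, {3, 4}]
for option in "ABCD":
    print(option, shape(option, logical, display, runs, overridden={1, 2}))
```

In this model (A) gives letter 1 an initial form and letter 4 a final form regardless of where they end up on screen, while (B) and (C) choose forms from the visual neighbors, which is the difference the options above turn on.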
We could also, of course, have an approach Z:
Z. The results of shaping directionally-overridden characters are undefined, and could be any of the above.

The BIDI committee should discuss the ramifications of these approaches, hopefully developing a consensus before the next UTC meeting.