
Re: mined/mlterm display problems



Hello Nadim,

> enter U+0645 (MEEM) then U+064E (FATHA) - the cursor now should stand
> right after the U+0645 (the U+064E is a "composing" character and gets
> folded into whatever proceeded it), now enter U+0646 (NOON) -- see the
> extra space ?
No, this looks perfect here, no extra space. It's on SuSE Linux 8.1
aمَن

----------------------

The LAM/ALEF problem you pointed out is not related to 
combining/combined characters, which mined handles well.
It is caused by mlterm applying LIGATURE substitution.
I'm not sure whether a terminal should do that at all, 
but maybe it is so essential to Arabic typesetting 
that it's useful this way.

> enter U+0644 (LAM) followed by U+0627 (ALEF) followed by U+0631 (REH) -
The display problem already occurs after entering the ALEF; 
the display is different after save and reload but it's wrong then, too 
(note that you can move the cursor beyond the end-of-line mark.)
This is a big problem in general, as an application cannot know 
whether the terminal has this feature. The application will need an 
additional explicit parameter telling it whether to assume this behaviour.
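
To illustrate what such a parameter could look like, here is a minimal
sketch in C. It is not the code from mined; the names columns_for_char
and term_joins_lam_alef are made up for this example, and it only covers
the four LAM + ALEF pairs mentioned in this thread.

```c
#include <stdbool.h>

/* Code points named in this thread. */
#define LAM                    0x0644UL
#define ALEF_WITH_MADDA_ABOVE  0x0622UL
#define ALEF_WITH_HAMZA_ABOVE  0x0623UL
#define ALEF_WITH_HAMZA_BELOW  0x0625UL
#define ALEF                   0x0627UL

/* True if prev + cur is one of the four LAM + ALEF pairs. */
static bool is_lam_alef_pair(unsigned long prev, unsigned long cur)
{
    if (prev != LAM)
        return false;
    return cur == ALEF_WITH_MADDA_ABOVE || cur == ALEF_WITH_HAMZA_ABOVE ||
           cur == ALEF_WITH_HAMZA_BELOW || cur == ALEF;
}

/*
 * Number of columns the cursor is assumed to advance when cur is
 * written right after prev. term_joins_lam_alef is the hypothetical
 * explicit parameter: the user sets it when the terminal is known to
 * apply LAM/ALEF ligature substitution. Combining characters are
 * deliberately not handled here.
 */
int columns_for_char(unsigned long prev, unsigned long cur,
                     bool term_joins_lam_alef)
{
    if (term_joins_lam_alef && is_lam_alef_pair(prev, cur))
        return 0;   /* ALEF is folded into the cell of the LAM */
    return 1;       /* normal cursor increment */
}
```

With the flag off, the same pair counts as two columns, which is what a
terminal without ligature substitution would need.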

> Its the same flavor problem as was noted.  Yes, mlterm does the visual
> combining, what mined needs to do is to track the previous character
> as well as current character and not account for an extra cursor/column
> increment iff certain combinations are present.  In pseudo code,

>   if ( /* have combined -
>           check various combining flavors,
>            U+0644 + U+0622
>            U+0644 + U+0623
>            U+0644 + U+0625
>            U+0644 + U+0627
>         */
>       ((prev_char == LAM) && (current_char == ALEF)) ||
>       ...
>        /* have composing -
>           check various composing flavors (all tanween),
>           U+064B -> U+0655
>         */
>       (current_char == FATHA) ||
>       ...
>      )
>   {
> 	/* skip cursor/column increment */
>   }
>   else
>   {
>         /* do normal cursor increment */
>   }
I have two remarks on this:
1. Apparently, the first part deals with ligatures. Only 4 actual 
   combinations are listed, and I could verify with a test file 
   (generated from Unicode data) that exactly these display wrong with 
   mined on mlterm.
   But the Unicode data list many further ARABIC LIGATURE 
   characters between U+FBEA and U+FDFB; the font that my mlterm 
   installation uses just has no glyphs for them. If the font is extended, 
   would mlterm apply ligature substitution to these as well? Or is this 
   planned for the future? This should be clarified in order to ensure 
   reliable behaviour.
2. The second part is about combining characters. As noted above, 
   I see no problem with mined. But there is a bug in mlterm 
   (on SuSE Linux 8.1): no combining characters are displayed at 
   all; only the base characters are shown.
   My test file xar contains the characters:
	abcييdefييghi
	abcي̊ي̀defيۖيۗghi
	(abc U+064A U+030A U+064A U+0300 def U+064A U+06D6 U+064A U+06D7 ghi,
	the first line without the accents)
   Both lines look identical when I type "cat xar" on mlterm.
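
In case you want to reproduce the second line of xar byte for byte, the
following C sketch builds it from the code points listed above. The
function names utf8_encode and build_accented_line are only
illustrative, and the encoder assumes code points below U+10000, which
is enough for this test.

```c
#include <stddef.h>

/* Encode one code point below U+10000 as UTF-8.
   Returns the number of bytes written to out (1 to 3). */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Write the accented test line plus a newline into buf, which must
   hold at least 32 bytes. Returns the number of bytes written. */
size_t build_accented_line(unsigned char *buf)
{
    static const unsigned long cps[] = {
        'a', 'b', 'c',
        0x064A, 0x030A,   /* YEH + combining ring above */
        0x064A, 0x0300,   /* YEH + combining grave accent */
        'd', 'e', 'f',
        0x064A, 0x06D6,   /* YEH + Arabic small high mark */
        0x064A, 0x06D7,   /* YEH + Arabic small high mark */
        'g', 'h', 'i', '\n'
    };
    size_t n = 0;
    size_t i;

    for (i = 0; i < sizeof cps / sizeof cps[0]; i++)
        n += utf8_encode(cps[i], buf + n);
    return n;
}
```

Writing the returned bytes to a file with fwrite and then running
"cat" on it should show whether the terminal draws the accents.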

I have made a first quick and dirty patch for the LAM/ALEF ligatures.
I've uploaded it to http://towo.net/mined/mined-2000.6.tar.gz in case 
you'd like to try it (it's not yet linked from the download page).

Kind regards,
Thomas