
Re: mined/mlterm ligature joining

--- Thomas Wolff <mined towo net> wrote:
> Hello Nadim,
> I am planning to release the next version of mined soon and I'd 
> like to assure that the ligature problems will be solved then.

Sorry for the delay - been tangled up in things.

> I'd appreciate your comments on these two issues we had. The 
> second is a general question about terminal handling of Arabic 
> characters, not specific to mined.
> 1.
> You wrote:
> > enter U+0645 (MEEM) then U+064E (FATHA) - the cursor now should stand
> > right after the U+0645 (the U+064E is a "composing" character and gets
> > folded into whatever preceded it), now enter U+0646 (NOON) -- see the
> > extra space ?
> I wrote:
> > No, this looks perfect here, no extra space. It's on SuSE Linux 8.1
> Can you confirm that this text is handled well now?

I still see it on Solaris.  Could someone else try this and report
back with findings?  Download mined-2000.6 from,


compile, and start it with ``mined -UpoX arabic_utf8_file''.  You should
now see Arabic displayed without a problem.  Next, switch the keyboard
mapping (Alt-K, note the capital K) and select "Arabic" with the arrow
keys.  Now enter Arabic text and try out combining characters: is there
an extra space?

FWIW: mined-2000.6 behaves better.  The extra space (shown as a
paragraph symbol) is still there behind (to the left of)
joining/combining characters (LAM+ALEF), but it's gone once one
presses the RETURN key.  If you simply move the cursor (up/down,
left/right) the space stays.  I wish I had the time to dive
into the code, but I'm swamped.  It looks like you recalculate the
line upon a carriage return instead of after the character that
follows a combining character.  Thomas, I can send you a screenshot
if you like.  NOTE again: this only happens when you enter characters;
if you open a stored file, everything is displayed properly.
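
To make the expected behaviour concrete, here is a minimal Python
sketch of the column arithmetic, not mined's actual code.  The
function name display_columns is made up for illustration, and it
assumes every non-combining character takes exactly one column (it
ignores wide characters and ligature joining).

```python
import unicodedata

def display_columns(text):
    """Count terminal columns, giving combining marks zero width.

    Assumption: every other character is one column wide.
    """
    columns = 0
    for ch in text:
        # Mn = nonspacing mark, Me = enclosing mark; neither advances the cursor.
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue
        columns += 1
    return columns

MEEM, FATHA, NOON = "\u0645", "\u064E", "\u0646"

# Column count after each keystroke of the MEEM, FATHA, NOON test.
typed = ""
for key in (MEEM, FATHA, NOON):
    typed += key
    print(unicodedata.name(key), "->", display_columns(typed))
```

The FATHA should leave the count at 1 and the NOON should bring it to
2; an editor that redraws with 3 columns here would show the extra
space described above.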

> 2.
> mlterm applies automatic ligature joining to the LAM/ALEF combinations 
> you mentioned. These are only 4 actual letter pairs, in isolated and 
> final form each, so resulting 8 character combinations that need 
> special handling.
> I could confirm with a test file that I generated from Unicode data 
> that exactly these 8 are joined into the ligature form by mlterm.

Yeah.  BTW: this joining is so essential to Arabic that a terminal
emulator (like mlterm) must do it.  I think you were wondering
about that previously.
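
The 8 combinations can also be recovered from the Unicode data that
ships with Python, which is a quick way to rebuild such a test list.
This is only a sketch: the helper name lam_alef_ligatures is
hypothetical, and it relies on the compatibility decompositions
recorded for the Presentation Forms-B block (U+FE70 - U+FEFF).

```python
import unicodedata

LAM = 0x0644

def lam_alef_ligatures():
    """Return (ligature, form tag, alef) triples from Presentation Forms-B."""
    found = []
    for cp in range(0xFE70, 0xFF00):
        # Decomposition looks like "<isolated> 0644 0627"; it is empty
        # for unassigned code points.
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 3 and int(parts[1], 16) == LAM:
            found.append((cp, parts[0], int(parts[2], 16)))
    return found

for cp, form, alef in lam_alef_ligatures():
    print("U+%04X %-10s LAM + U+%04X" % (cp, form, alef))
```

Each line printed is one ligature code point together with its form
tag and the ALEF variant that follows the LAM.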

> There are, however, a number of other Unicode characters called 
> ARABIC LIGATURE, listed with according base character pairs (or more 
> than 2 base characters in some cases). The ligature glyphs are not 
> contained in the font I have installed, so I'm not sure if mlterm 
> would join them if the glyph was present.
> Please tell me about the supposed behaviour. Are all these to be 
> automatically joined if a terminal supports ligature joining?
> I append an excerpt from Unicode data about the characters in 
> question just to make sure it is clear what I am speaking about.

I think you're looking at Unicode's Arabic Presentation Forms-A, and
you shouldn't (www.unicode.org/charts).  Although it's labelled
"Arabic Presentation Forms-A", that code chart is NOT needed.

The only code charts of concern with regard to Arabic are,

 1. Arabic			-- U+0600 - U+06FF
 2. Arabic Presentation Forms-B	-- U+FE70 - U+FEFF

Hope that helps.

 - Nadim
