Re: less & composing characters - fix
- To: Mark Nudelman <markn at greenwoodsoftware dot com>
- Subject: Re: less & composing characters - fix
- From: Nadim Shaikli <shaikli at yahoo dot com>
- Date: Tue, 24 Feb 2004 16:55:25 -0800 (PST)
- Cc: developer at arabeyes dot org
--- Mark Nudelman <markn at greenwoodsoftware dot com> wrote:
> > One minor annoyance. When I search for LAA via '/'
> > (LAA is a combined character made up of LAM + ALEF), the
> > LAM disappears on me (as it should), but instead of seeing
> > the LAA glyph, I see the ALEF.
>
> Hm, this isn't quite what I would have expected. Less actually has a
> rather strange way of outputting characters in the command line. When
> you type a character X, what actually gets output is "ESC[K X \b X",
> where "ESC[K" is the line-clear escape sequence for your terminal. So
> in your LAM+ALEF case, what's getting output is
> ESC[K 0xD9 0x84 \b 0xD9 0x84
> ESC[K 0xD8 0xA7 \b 0xD8 0xA7
> (D9,84 is of course the UTF-8 encoding of 0x644 and D8,A7 is the
> encoding of 0x627.) I don't know if it's reasonable to expect the
> terminal to do the right thing (i.e. replace the LAM with LAA) given
> that sequence. But you say it does work for composing characters,
> which have the same kind of sequence, so I'm not sure what's going on.
Yup, it's very reasonable to expect the terminal to do the right thing.
This might be something more related to mlterm and how it handles
combining (vs. composing) and escape sequences. I'll check with
mlterm on this. Mind you, the search works great; it just looks
oddish when you enter the characters.
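For anyone wanting to reproduce the byte sequence Mark describes, here is a
minimal Python sketch. It is not less's actual code. It assumes the line-clear
sequence is the ANSI "ESC[K" and that the spaces in "ESC[K X \b X" are only
separators in the description, not bytes that get sent. The helper name
echo_bytes is made up for illustration.

```python
# Sketch of the command-line echo sequence described above.
# Assumption: ESC[K is the terminal's line-clear sequence (terminal dependent).
ESC_CLEAR = b"\x1b[K"

LAM = "\u0644"   # ARABIC LETTER LAM
ALEF = "\u0627"  # ARABIC LETTER ALEF


def echo_bytes(ch: str) -> bytes:
    """Return line-clear, the character, a backspace, then the character again."""
    encoded = ch.encode("utf-8")
    return ESC_CLEAR + encoded + b"\b" + encoded


if __name__ == "__main__":
    for name, ch in (("LAM", LAM), ("ALEF", ALEF)):
        # Print each byte in hex so it can be compared with the trace above.
        print(name, " ".join(f"0x{b:02X}" for b in echo_bytes(ch)))
```

Piping such bytes to a terminal by hand (outside of less) is one way to check
whether the terminal or less is responsible for the display oddity.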
> > So I'm guessing it is an
> > encoding issue or similar. In terms of encodings,
> >
> > ALEF - 0x0627 -> 0xfe8d or 0xfe8e
> > LAM - 0x0644 -> 0xfedd or 0xfede or 0xfedf or 0xfee0
> > LAA - 0xfefb or 0xfefc
>
> I don't quite understand this -- what are the hex values to the
> right of the arrow?
Well, those are Arabic Presentation Forms-B code points vs. the base
Unicode ones (laid out after ISO-8859-6) before the arrow (or ignore
the arrow altogether; the values after the arrow are the shaped
character encodings).
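The relationship between the shaped (Forms-B) values and the base letters can
be checked with Python's standard unicodedata module. This is only a sketch to
illustrate the table above; the names SHAPED and base_code_points are invented
here, and compatibility decomposition (NFKD) is used as a convenient way to map
a presentation form back to its base letters, not as a claim about what mlterm
or less does internally.

```python
import unicodedata

# Shaped (Arabic Presentation Forms-B) code points from the table above.
SHAPED = {
    "ALEF": ["\ufe8d", "\ufe8e"],
    "LAM": ["\ufedd", "\ufede", "\ufedf", "\ufee0"],
    "LAA": ["\ufefb", "\ufefc"],
}


def base_code_points(ch: str) -> list[str]:
    """Map a presentation form to its base letters via NFKD decomposition."""
    return [f"0x{ord(c):04x}" for c in unicodedata.normalize("NFKD", ch)]


if __name__ == "__main__":
    for name, forms in SHAPED.items():
        for form in forms:
            print(name, f"0x{ord(form):04x}", "->", base_code_points(form))
```

The LAA ligature forms decompose into the two base letters LAM and ALEF, which
is why typing LAM followed by ALEF is expected to display as a single glyph.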
> > BTW: I did see instances where mlterm would not show 2-3
> > lines in a file esp. when doing a search and the only
> > way to illuminate the highlighted text is to scroll-up
> > say 10-12 lines and CTRL-L (redraw).
>
> Could you give me an example of this?
Sure, I'll send it to you in private so as not to spam people with
silly test cases. Anyone looking to repeat this can just keep searching
for various words till something strange shows up ;-) but you'll have
to run it on mlterm to see what I mean.
Regards,
- Nadim