
Re: less & composing characters - fix

--- Mark Nudelman <markn at greenwoodsoftware dot com> wrote:
> I'm sure you've forgotten about this by now, but I've just been catching
> up on some old email about less, and I've been looking at your patch.

Nope, haven't forgotten just yet :-)

> I have a couple of questions:
> 1. My main question is, what would you suggest is the easiest way to
> test this?  I have an old Linux 7 system -- I don't know if it's easy to
> set up a UTF-8 environment on it.  Do you have any suggestions?

You don't really need a UTF-8 environment per se.  To make things really
simple though, you should download and install a couple of applications.

 + Fribidi (fribidi.sf.net)
 + mlterm  (mlterm.sf.net)
 + A complete Arabic font,
 + Read the part that notes mlterm,

You can also test with this sample file,


Simply 'cat' an Arabic UTF-8 file in mlterm and everything should display
correctly.  You could also use 'xterm -u', but you won't see any shaping
or bidirectionality (bidi).
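If no Arabic sample file is at hand, you can generate one. The following is a minimal sketch for illustration only and is not part of the patch. It encodes Unicode code points as UTF-8. The sample word is a hypothetical choice: KAF, TEH and BEH, each followed by U+064E ARABIC FATHA, which is a combining mark. That gives 6 code points but only 3 base letters, so a pager that treats the marks as taking no room should give the word 3 columns rather than 6. The names utf8_encode and make_sample are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out (needs room for 4 bytes).
   Returns the number of bytes written.  This sketch assumes cp is a
   valid scalar value (<= 0x10FFFF, not a surrogate) and does not check. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical test word: KAF, TEH, BEH, each followed by the combining
   mark U+064E ARABIC FATHA, then a newline. */
static const uint32_t sample_word[] = {
    0x0643, 0x064E, 0x062A, 0x064E, 0x0628, 0x064E, 0x000A
};

/* Fill out (at least 28 bytes) with the UTF-8 sample; returns its length. */
size_t make_sample(unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof sample_word / sizeof sample_word[0]; i++)
        n += utf8_encode(sample_word[i], out + n);
    return n;
}
```

Write the buffer to a file with fwrite(). You can then 'cat' the file in mlterm or open it in less to compare how the two count the word's width.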

> 2. I received a table of combining characters from someone else, which is
> slightly different than the table in your patch.  Comparing the tables
> against the Unicode database, it seems the other table does match the
> database better than yours.  The other table is at the end of this
> message.  Is there any reason I shouldn't use it?

I didn't check the tables, but if you have more confidence in the one you
are holding, by all means use it; we can correct any problems later if
need be.

> 3. I've gotten some complaints lately that multibyte characters entered
> in a search string don't display correctly, because of the way the
> cmdbuf code in less works.  Have you looked at this area at all?

Nope, I didn't.

Please do post to the 'developer' list (CC'ed above), as all the
developers hang out there (if you are not subscribed to the list, the
moderator will approve your post within a day or so).

BTW: the code I mailed you is in Arabeyes' CVS (just in case),


Let us know if you need testers and/or more info.  I, for one, am glad
to see this work finally taking place.

 - Nadim

> Thanks,
> --Mark
> combiningStruct combineTable[] = {
> {0x300,0x357}, {0x35d,0x36f}, {0x483,0x486}, {0x488,0x489},
> {0x591,0x5a1}, {0x5a3,0x5b9}, {0x5bb,0x5bd}, {0x5bf,0x5bf},
> {0x5c1,0x5c2}, {0x5c4,0x5c4}, {0x610,0x615}, {0x64b,0x658},
> {0x670,0x670}, {0x6d6,0x6dc}, {0x6de,0x6e4}, {0x6e7,0x6e8},
> {0x6ea,0x6ed}, {0x711,0x711}, {0x730,0x74a}, {0x7a6,0x7b0},
> {0x901,0x902}, {0x93c,0x93c}, {0x941,0x948}, {0x94d,0x94d},
> {0x951,0x954}, {0x962,0x963}, {0x981,0x981}, {0x9bc,0x9bc},
> {0x9c1,0x9c4}, {0x9cd,0x9cd}, {0x9e2,0x9e3}, {0xa01,0xa02},
> {0xa3c,0xa3c}, {0xa41,0xa42}, {0xa47,0xa48}, {0xa4b,0xa4d},
> {0xa70,0xa71}, {0xa81,0xa82}, {0xabc,0xabc}, {0xac1,0xac5},
> {0xac7,0xac8}, {0xacd,0xacd}, {0xae2,0xae3}, {0xb01,0xb01},
> {0xb3c,0xb3c}, {0xb3f,0xb3f}, {0xb41,0xb43}, {0xb4d,0xb4d},
> {0xb56,0xb56}, {0xb82,0xb82}, {0xbc0,0xbc0}, {0xbcd,0xbcd},
> {0xc3e,0xc40}, {0xc46,0xc48}, {0xc4a,0xc4d}, {0xc55,0xc56},
> {0xcbc,0xcbc}, {0xcbf,0xcbf}, {0xcc6,0xcc6}, {0xccc,0xccd},
> {0xd41,0xd43}, {0xd4d,0xd4d}, {0xdca,0xdca}, {0xdd2,0xdd4},
> {0xdd6,0xdd6}, {0xe31,0xe31}, {0xe34,0xe3a}, {0xe47,0xe4e},
> {0xeb1,0xeb1}, {0xeb4,0xeb9}, {0xebb,0xebc}, {0xec8,0xecd},
> {0xf18,0xf19}, {0xf35,0xf35}, {0xf37,0xf37}, {0xf39,0xf39},
> {0xf71,0xf7e}, {0xf80,0xf84}, {0xf86,0xf87}, {0xf90,0xf97},
> {0xf99,0xfbc}, {0xfc6,0xfc6}, {0x102d,0x1030}, {0x1032,0x1032},
> {0x1036,0x1037}, {0x1039,0x1039}, {0x1058,0x1059},
> {0x1712,0x1714}, {0x1732,0x1734}, {0x1752,0x1753},
> {0x1772,0x1773}, {0x17b7,0x17bd}, {0x17c6,0x17c6},
> {0x17c9,0x17d3}, {0x17dd,0x17dd}, {0x180b,0x180d},
> {0x18a9,0x18a9}, {0x1920,0x1922}, {0x1927,0x1928},
> {0x1932,0x1932}, {0x1939,0x193b}, {0x20d0,0x20ea},
> {0x302a,0x302f}, {0x3099,0x309a}, {0xfb1e,0xfb1e},
> {0xfe00,0xfe0f}, {0xfe20,0xfe23}, {0x1d167,0x1d169},
> {0x1d17b,0x1d182}, {0x1d185,0x1d18b}, {0x1d1aa,0x1d1ad},
> {0xe0100,0xe01ef},
> };
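The quoted table is data only. It does not show how combiningStruct is defined or how the table is searched. The following is a minimal sketch of one way to consult it. It assumes that combiningStruct holds an inclusive first/last code point pair, and that the ranges stay sorted and non-overlapping, as they appear to be. Under those assumptions a binary search decides in O(log n) whether a code point is combining. The function name is_combining is hypothetical, and only an excerpt of the table is reproduced to keep the sketch short.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: the quoted message names this type but doesn't define it. */
typedef struct {
    uint32_t first; /* first code point in the range (inclusive) */
    uint32_t last;  /* last code point in the range (inclusive) */
} combiningStruct;

/* Excerpt of the table above; paste in the full list for real use.
   Ranges must be sorted ascending and must not overlap. */
static const combiningStruct combineTable[] = {
    {0x300,0x357}, {0x35d,0x36f}, {0x483,0x486}, {0x488,0x489},
    {0x591,0x5a1}, {0x5a3,0x5b9}, {0x5bb,0x5bd}, {0x5bf,0x5bf},
    {0x5c1,0x5c2}, {0x5c4,0x5c4}, {0x610,0x615}, {0x64b,0x658},
    {0x670,0x670},
};

/* Returns 1 if cp falls inside any range of combineTable, else 0. */
int is_combining(uint32_t cp)
{
    size_t lo = 0;
    size_t hi = sizeof combineTable / sizeof combineTable[0];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < combineTable[mid].first)
            hi = mid;          /* cp lies in an earlier range, if any */
        else if (cp > combineTable[mid].last)
            lo = mid + 1;      /* cp lies in a later range, if any */
        else
            return 1;          /* first <= cp <= last */
    }
    return 0;
}
```

With this excerpt, is_combining(0x064E) (ARABIC FATHA) returns 1 and is_combining(0x0643) (ARABIC LETTER KAF) returns 0.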
> ----- Original Message ----- 
> From: "Nadim Shaikli" <shaikli at yahoo dot com>
> To: "Mark Nudelman" <markn at greenwoodsoftware dot com>
> Cc: <bug-less at gnu dot org>; <developer at arabeyes dot org>
> Sent: Tuesday, December 17, 2002 4:07 PM
> Subject: less & composing characters - fix
> > Hi Mark, I'm attaching a patch to 'less-378' that corrects a
> > problem with regard to composing as well as combining characters.
> > Composing characters are, in essence, piggy-backed onto the
> > preceding character and are displayed or superimposed on it.
> > Combining characters are similar, except that they functionally change
> > the look-n-feel of previous characters and only come into operation if
> > certain combinations of letters are present.  In other words,
> > composing/combining characters get folded into the characters that
> > preceded them, and as such 'less' should not count them as
> > taking up any room on the line/column.
> >
> > Here's an example (so that I don't sound like a lunatic :-).  Assume
> > characters X, Y, Z are composing characters (there are many of them
> > according to the Unicode spec) and I have the following sample of
> > text.
> >
> >
> > this is a tesXt, a veXYry simpZle test.
> >
> >  (currently displayed as 39 characters - WRONG)
> >
> >
> > The X, Y, Z characters will be displayed/superimposed onto the
> > characters that precede them, so you will end up with something
> > like this,
> >
> >
> > this is a teSt, a vEry simPle test.
> >
> >  (with patch displayed as 35 characters - CORRECT)
> >
> >
> > Sorry, I didn't know how else to convey the idea.  So why should
> > 'less' care?  Well, 'less' should not count those characters as
> > taking up any room on the line (they should be completely ignored,
> > if you will).  Otherwise, if I have a lot of them in a line, I get
> > line-wrapping even though there is plenty of room.
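Here is a toy sketch of that counting rule. It mirrors the example above, with 'X', 'Y' and 'Z' as stand-ins for composing characters. Real text would need UTF-8 decoding and a table lookup instead. The function names are hypothetical. naive_columns charges every character one column, which matches the 39-column result described above. display_columns gives composing characters zero columns, which matches the 35-column result.

```c
#include <stddef.h>

/* Toy model of the example: 'X', 'Y' and 'Z' stand in for composing
   characters.  Real code would decode UTF-8 and consult a table of
   combining ranges instead. */
static int is_toy_composing(char c)
{
    return c == 'X' || c == 'Y' || c == 'Z';
}

/* Every character takes one column (the 39-column count in the example). */
size_t naive_columns(const char *s)
{
    size_t cols = 0;
    for (; *s != '\0'; s++)
        cols++;
    return cols;
}

/* Composing characters fold into the preceding character: zero columns. */
size_t display_columns(const char *s)
{
    size_t cols = 0;
    for (; *s != '\0'; s++)
        if (!is_toy_composing(*s))
            cols++;
    return cols;
}
```

A line-wrapping decision based on display_columns would only break a line once the visible characters, not the raw character count, exceed the screen width.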
> >
> > I can certainly send you screenshots and actual examples if need be,
> > so don't hesitate to ask.
> >
> > BTW: I highly suggest you treat these multi-byte (mbyte) characters as
> > single entities (i.e. combine each of their multi-byte sequences into
> > a single encoded array entry) to simplify their treatment (I make that
> > suggestion in the comments in my code).
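Here is a minimal sketch of that suggestion. It is not the code from the patch. It decodes the UTF-8 bytes into an array of code points, one entry per character, so later width and wrapping logic never has to look at individual bytes. The function name utf8_to_codepoints is hypothetical. The sketch rejects invalid lead bytes, bad continuation bytes and truncated sequences. It does not check for overlong forms or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode len bytes of UTF-8 from s into out (capacity cap), one code
   point per array entry.  Returns the number of code points written,
   or (size_t)-1 on malformed input or if out is too small. */
size_t utf8_to_codepoints(const unsigned char *s, size_t len,
                          uint32_t *out, size_t cap)
{
    size_t i = 0, n = 0;

    while (i < len) {
        unsigned char b = s[i];
        uint32_t cp;
        size_t extra; /* number of continuation bytes that follow */

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else
            return (size_t)-1; /* stray continuation or invalid lead byte */

        if (i + extra >= len)
            return (size_t)-1; /* sequence truncated at end of input */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80)
                return (size_t)-1; /* expected a 10xxxxxx byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        if (n == cap)
            return (size_t)-1;
        out[n++] = cp;
        i += extra + 1;
    }
    return n;
}
```

Take the 12 UTF-8 bytes of the Arabic word "kataba": KAF, TEH and BEH, each followed by FATHA. The decoder yields six code points. Skipping the ones that fall in a combining-character table leaves three columns.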
> >
> > Please do inspect the patch (it contains a 'hack' to fit the way the
> > code is currently written) and please consider including it (or
> > something similar) at your earliest convenience.  The patch as it
> > stands works for me without issue.
> >
> > Regards,
> >
> >  - Nadim
> >
