
Re: Regular Expressions

--- Abdalla Alothman <abdalla at pheye dot net> wrote:
> On Sunday 10 July 2005 16:21, Gregg Reynolds wrote:
> > Standard regexes use the metacharacter "." to mean "match any
> > single character".  So a search pattern like "k.b" will match
> > ktb, krb, etc. but also kab, kub, k<sukuun>b, etc.
> >
> > Which is fine; but in Arabic we may want to ignore "stackers" (fatha, 
> > shadda, etc.).  So we need another metacharacter that means "match any 
> > non-stacking character".  Suppose we use ":" with this meaning.  Then 
> > the search pattern "k:b" will match ktb, krb, etc., but *not* kab, kub, 
> > k<sukuun>b, etc.
> Alternatively, you can put all those marks in a C++ string and call
> the find_first_of() member to filter out those characters.  With C++
> there are many other ways to do it using the STL algorithms
> (transform, replace, etc.; you can add predicates that fit your needs).

A couple of comments in passing:

 a. Perl has pretty good Unicode support; how does it handle all of this?
    It should be simple to test, and I'll try to do so when I get a
    chance, just to know the status of things now.  My guess is that
    composers are simply treated like normal characters, so a '.'
    search would match harakat as well.

 b. This kind of relates to 'diff'ing files as well.  What I'm getting at
    is that there will be instances when harakat (or composing characters
    in general) ought to be ignored and simply passed over as though
    they don't exist at all.  I would think this is best handled via
    an environment variable (export IGNORE_COMPOSERS=1 or similar), so
    that when the variable is set, composers are ignored in all apps
    where searching/regex is used.  This seems like a feasible idea
    that should be _very_ simple to implement; the only issue is
    how to make it a standard that other applications know about
    and follow.
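
To make both points a bit more concrete, here is a rough sketch in Python
(not Perl, so it says nothing about Perl's actual behaviour).  It checks
that '.' matches a fatha like any other character, and then shows one way
an application might honour IGNORE_COMPOSERS.  The helper names
strip_composers() and search() are made up for illustration, and treating
"composer" as "any character in a Unicode Mark category" is my assumption;
a real convention would need to pin that down.

```python
import os
import re
import unicodedata

def strip_composers(text):
    """Drop combining marks (Unicode categories Mn, Mc, Me), e.g. harakat."""
    return "".join(ch for ch in text
                   if not unicodedata.category(ch).startswith("M"))

def search(pattern, text, env=os.environ):
    """Regex search that skips combining marks when IGNORE_COMPOSERS=1.

    Hypothetical convention: only the searched text is filtered here,
    so the pattern itself should be written without marks.
    """
    if env.get("IGNORE_COMPOSERS") == "1":
        text = strip_composers(text)
    return re.search(pattern, text)

KAF, TA, BA = "\u0643", "\u062A", "\u0628"   # k, t, b
FATHA = "\u064E"                             # a combining mark (category Mn)

pattern = KAF + "." + BA                     # the "k.b" pattern from the thread
voweled = KAF + FATHA + BA                   # k + fatha + b
kataba = KAF + FATHA + TA + FATHA + BA + FATHA

# Point (a): '.' treats the fatha as an ordinary character.
dot_matches_fatha = re.fullmatch(pattern, voweled) is not None

# Point (b): the fully voweled word only matches once marks are ignored.
match_default = search(pattern, kataba, env={})
match_ignoring = search(pattern, kataba, env={"IGNORE_COMPOSERS": "1"})
```

Here dot_matches_fatha comes out true, match_default is None, and
match_ignoring is a match on the bare k-t-b skeleton.  Note that this
strips every combining mark, which is probably too blunt for scripts
where the marks carry more weight, so take it as a starting point only.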


 - Nadim
