To: General Arabization Discussion <general at arabeyes dot org>
Subject: Re: Arabization, techniques and problems
From: Gregg Reynolds <gar at arabink dot com>
Date: Sun, 10 Jul 2005 08:21:50 -0500
maysara a wrote:
> Salam all,
> I'm trying to initiate an open source community at my
> university, and I need to introduce students to open
> source and get them contributing to existing software,
> documentation, and so on.
Here's another project you might find interesting. Enhance regular
expression syntax to support Arabic-specific search, and then implement
your ideas in GNU "grep".
Here's a brief example of what I mean. Standard regexes use the
metacharacter "." to mean "match any single character". So a search
pattern like "k.b" will match ktb, krb, etc., but also kab, kub,
k<sukuun>b, etc., where the middle character is a vowel mark rather
than a letter.
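To see the problem concretely, here is a small check using Python's standard `re` module (chosen only for illustration; this is not grep's matcher). It relies on the fact that in Unicode text each Arabic vowel mark is stored as its own combining code point after the letter it belongs to.

```python
import re

# Letters and marks from the transliterated examples (k, t, b, fatha, sukun).
KAF, TA, BA = "\u0643", "\u062A", "\u0628"
FATHA, SUKUN = "\u064E", "\u0652"

pattern = KAF + "." + BA  # the standard pattern "k.b"
words = [KAF + TA + BA, KAF + FATHA + BA, KAF + SUKUN + BA]  # ktb, kab, k<sukuun>b

# "." matches any single code point, and Arabic vowel marks are separate
# (combining) code points, so all three words match.
results = [re.search(pattern, w) is not None for w in words]
print(results)
```

This prints `[True, True, True]`: the plain "." cannot tell a letter from a stacker.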
That is fine, but in Arabic we may want to ignore "stackers" (fatha,
shadda, and the other marks written above or below a letter). So we
need another metacharacter that means "match any single non-stacking
character". Suppose we use ":" with this meaning. Then the search
pattern "k:b" will match ktb, krb, etc., but *not* kab, kub,
k<sukuun>b, etc.
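As a rough illustration of the ":" idea, here is a minimal Python sketch (not GNU grep code) that preprocesses an extended pattern into an ordinary `re` pattern. The names `translate` and `search`, and the exact set of code points treated as stackers (U+064B to U+065F plus U+0670), are assumptions made for the sketch. A literal colon is written "\:", and the ":" inside Python group syntax such as "(?:" is left alone.

```python
import re

# Assumed set of Arabic "stackers": tanween, fatha, damma, kasra, shadda,
# sukun and related marks (U+064B..U+065F) plus superscript alef (U+0670).
STACKERS = "\u064B-\u065F\u0670"

# Hypothetical ":" metacharacter: one character that is not a stacker
# and, like ".", not a newline.
NON_STACKER = "[^" + STACKERS + "\\n]"


def translate(pattern: str) -> str:
    """Rewrite an extended pattern into a standard Python regex (sketch)."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            # Escape sequence, including "\:" for a literal colon: copy as-is.
            out.append(pattern[i:i + 2])
            i += 2
        elif ch == "[":
            # Bracket expression: copy through its closing "]" unchanged.
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1  # a leading "]" is a literal member of the set
            while j < n and pattern[j] != "]":
                j += 2 if pattern[j] == "\\" else 1
            out.append(pattern[i:j + 1])
            i = j + 1
        elif pattern.startswith("(?", i):
            # Group syntax such as "(?:" or "(?i:": this ":" is not ours.
            j = i + 2
            while j < n and (pattern[j].isalpha() or pattern[j] == "-"):
                j += 1
            if j < n and pattern[j] == ":":
                j += 1
            out.append(pattern[i:j])
            i = j
        elif ch == ":":
            out.append(NON_STACKER)  # the new metacharacter
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def search(pattern: str, text: str) -> bool:
    """True if the extended pattern matches anywhere in text."""
    return re.search(translate(pattern), text) is not None


KAF, TA, RA, BA = "\u0643", "\u062A", "\u0631", "\u0628"
FATHA, SUKUN = "\u064E", "\u0652"

# ktb, krb, kab, k<sukuun>b
for word in (KAF + TA + BA, KAF + RA + BA, KAF + FATHA + BA, KAF + SUKUN + BA):
    print(search(KAF + ":" + BA, word))
```

Run as is, the loop prints True, True, False, False: "k:b" matches ktb and krb but rejects kab and k<sukuun>b. Rewriting the pattern before handing it to an unchanged engine is a cheap way to experiment; a real patch to grep would more likely teach its own regex parser the new metacharacter.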
If you start by asking "what kinds of searches might an Arabic speaker
want to do" and then think about how regexes could make such searches
natural and easy, you can come up with a lot of ideas.