
Re: Arabization, techniques and problems



maysara a wrote:
Salam all,

I'm trying to start an open source community at my
university, and I need to introduce students to open
source and to contributing to existing software,
documentation, and other projects.


Here's another project you might find interesting. Enhance regular expression syntax to support Arabic-specific search, and then implement your ideas in GNU "grep".


Here's a brief example of what I mean. Standard regexes use the metacharacter "." to mean "match any single character". So a search pattern like "k.b" will match ktb, krb, etc. but also kab, kub, k<sukuun>b, etc.

That is fine, but in Arabic we may want to ignore "stackers", the marks written above or below a letter (fatha, shadda, etc.). So we need another metacharacter that means "match any non-stacking character". Suppose we use ":" with this meaning. Then the search pattern "k:b" will match ktb, krb, etc., but *not* kab, kub, k<sukuun>b, etc.
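
To make the idea concrete, here is a rough sketch in Python rather than in grep itself. It rewrites the hypothetical ":" metacharacter into an ordinary negated character class before handing the pattern to the standard "re" module. The names translate and NON_STACKER are made up for this illustration, and the set of "stackers" is an assumption: the Unicode marks U+064B through U+0652 (fathatan to sukun) plus U+0670 (superscript alef). A real design would have to decide which marks belong in that set, and would need a proper pattern parser instead of this simple scan.

```python
import re

# Assumption: "stackers" are U+064B..U+0652 (fathatan through sukun)
# plus U+0670 (superscript alef). Newline is excluded too, mirroring ".".
NON_STACKER = "[^\u064B-\u0652\u0670\n]"

def translate(pattern):
    """Rewrite the hypothetical ':' metacharacter into a standard class.

    Escaped characters (such as a backslash followed by ':') and anything
    inside a [...] bracket expression are copied through unchanged.
    """
    out = []
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Copy the backslash and the escaped character as a pair.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        if ch == ":" and not in_class:
            out.append(NON_STACKER)
        else:
            out.append(ch)
        i += 1
    return "".join(out)

KAF, TA, BA, FATHA = "\u0643", "\u062A", "\u0628", "\u064E"
ktb = KAF + TA + BA        # k, t, b
kab = KAF + FATHA + BA     # k, fatha, b

dot_pattern = re.compile(KAF + "." + BA)
colon_pattern = re.compile(translate(KAF + ":" + BA))

print(bool(dot_pattern.fullmatch(ktb)), bool(dot_pattern.fullmatch(kab)))
print(bool(colon_pattern.fullmatch(ktb)), bool(colon_pattern.fullmatch(kab)))
```

With "." both words match, because the fatha counts as a character like any other; with ":" only the fully consonantal ktb matches.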

If you start by asking "what kinds of searches might an Arabic speaker want to do" and then think about how regexes could make such searches natural and easy, you can come up with a lot of ideas.

-gregg