
[developers] on searching arabic text



salams,
hope all are well.
was just wondering if someone can share some experiences about
searching in arabic... previously, i used to search user input
(sanitized, but without other modifications) against a tashkeel-free
version of arabic text stored in mysql.

i was recently playing around with lucene and sphinx to improve
non-arabic search and saw huge improvements.  i understand that both
also support arabic (though i am not sure whether that means anything
more than supporting utf-8).  so i was wondering the following:

1.  does anyone have experience using lucene or sphinx for arabic, and
can you tell me how well they work (especially relative to simply
querying a mysql database)?

2.  can someone tell me what to look out for when it comes to arabic
search?  i usually don't search in arabic, so i don't know - but for
example, i am guessing that i need to make sure that a search for a
word without tashkeel matches the version of the word with tashkeel,
and that a search for a word with tashkeel still matches the word
(even if the tashkeel in the query doesn't agree with the text?).  i
am also guessing most people will search without tashkeel, but am not
sure.  in general, what kinds of things should i expect?
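
to make the tashkeel part of question 2 concrete, here is a rough
python sketch of the behaviour i am guessing at: strip tashkeel from
both the query and the stored text before comparing, so the three
cases above all match.  the names strip_tashkeel and matches are made
up for illustration, and the set of code points treated as tashkeel
(U+064B to U+0652, plus superscript alef U+0670 and tatweel U+0640) is
my assumption, not something lucene or sphinx defines.

```python
import re

# assumption: these code points are what gets ignored when matching.
# U+064B..U+0652 are the common harakat (fathatan .. sukun),
# U+0670 is superscript alef, U+0640 is tatweel (kashida).
TASHKEEL = re.compile("[\u064B-\u0652\u0670\u0640]")


def strip_tashkeel(text):
    """Return text with the marks listed above removed."""
    return TASHKEEL.sub("", text)


def matches(query, text):
    """Substring match that ignores tashkeel on both sides.

    Because both sides are stripped, a query with no tashkeel, with
    correct tashkeel, or with different tashkeel all match the same
    stored word.
    """
    return strip_tashkeel(query) in strip_tashkeel(text)
```

this is only plain substring matching; it says nothing about stemming
or other kinds of normalization a real search engine might apply.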

wsalams,
-ahmed
-- 
http://whatstheplot.com/blog