[developers] on searching arabic text
- To: Development Discussions <developer at arabeyes dot org>
- Subject: [developers] on searching arabic text
- From: ahmedre <ahmed at piousity dot net>
- Date: Wed, 28 Jan 2009 14:17:39 -0800
salams,
hope all are well.
was just wondering if someone can share some experiences about
searching in arabic... previously, i used to search user input
(sanitized, but without other modifications) against a tashkeel-free
version of arabic text stored in mysql.
i was recently playing around with lucene and sphinx for improving
non-arabic search and saw huge improvements. i understand that both
also support arabic (though i'm not sure whether that means anything
more than supporting utf8). so i was wondering the following:
1. does anyone have any experience using lucene or sphinx for arabic,
and can you tell me how well they work in practice (esp relative to
simply querying a mysql database)?
2. can someone tell me what to look out for when it comes to arabic
search? i usually don't search in arabic, so i don't know - but for
example, i am guessing that i need to make sure that a search for a
word without tashkeel matches the tashkeel version of the word, and
that a search for a word with tashkeel properly matches the word (even
if the tashkeel is off?). i am also guessing most people will search
without tashkeel, but am not sure. in general, what kinds of things
should i expect?
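
to make the guess in (2) concrete, here is a rough python sketch of
what i mean by matching regardless of tashkeel: strip the diacritics
from both the query and the stored text, then compare. the function
names (strip_tashkeel, matches) are made up for illustration, and
treating every unicode "Mn" (nonspacing mark) character plus tatweel
as something to drop is an assumption on my part, not how lucene or
sphinx actually do it.

```python
import unicodedata

TATWEEL = "\u0640"  # kashida, used only to stretch words visually


def strip_tashkeel(text: str) -> str:
    """Drop nonspacing marks (fatha, kasra, shadda, ...) and tatweel.

    Assumption: anything in Unicode category "Mn" is safe to remove
    for search purposes. Precomposed letters such as alef-with-hamza
    are left untouched because the text is not decomposed first.
    """
    return "".join(
        ch
        for ch in text
        if unicodedata.category(ch) != "Mn" and ch != TATWEEL
    )


def matches(query: str, text: str) -> bool:
    """Naive substring match that ignores tashkeel on both sides."""
    return strip_tashkeel(query) in strip_tashkeel(text)


# "kitab" written with full tashkeel, without any, and with wrong tashkeel
with_tashkeel = "\u0643\u0650\u062A\u064E\u0627\u0628\u064C"
without_tashkeel = "\u0643\u062A\u0627\u0628"
wrong_tashkeel = "\u0643\u064F\u062A\u0627\u0628"

print(strip_tashkeel(with_tashkeel) == without_tashkeel)
print(matches(without_tashkeel, with_tashkeel))
print(matches(wrong_tashkeel, with_tashkeel))
```

since both sides are stripped, a query with wrong tashkeel matches
just like one with none; whether that is what arabic speakers
actually expect is part of what i am asking.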
wsalams,
-ahmed
--
http://whatstheplot.com/blog