Re: A (too huge) Arabic word-list (with prefixes) for spell-checkers
- To: Mohammed Sameer <msameer at foolab dot org>
- Subject: Re: A (too huge) Arabic word-list (with prefixes) for spell-checkers
- From: Dan Kenigsberg <danken at cs dot technion dot ac dot il>
- Date: Tue, 16 May 2006 13:18:15 +0300
- Cc: Ivrix Discussions <ivrix-discuss at ivrix dot org dot il>, Development Discussions <developer at arabeyes dot org>
- Hebrew-date: 18 Iyyar 5766
- User-agent: Mutt/1.4.1i
Mohammed,
Thank you for your criticism :-)
> Which leads to 2 points:
> 1) Those words are not correct
> 2) The data files contain a small set of incorrect words "Maybe this is a
> problem with my implementation of the Buckwalter algorithm".
> 3) The affix data is huge and IMHO not easy to modify/extend which means
> that it'll be hard to strip those words.
>
> This is why I decided to ignore the Buckwalter data and work on a new data
> set.
Are you sure the best option was to ignore that data? How many incorrect word
spellings are there? Would you please give me an example of an incorrect
spelling of such a word, and the correct one?
I know that the affix data is huge, but please explain what has to be done. Do
you mean that for some words the prefix+stem+suffix combination is wrong even
though the stem is correct?
> I know about a google project to create a dictionary from the Buckwalter data
> which makes me wonder, Why don't you cooperate with them ?
I wouldn't mind. Maybe one of them will approach me now.
> PS. Why "DICT ar EG ar" only in dictionary.lst ? ;-)
I was just trying to be minimalistic here, not to offend non-Egyptians...
--
Dan Kenigsberg http://www.cs.technion.ac.il/~danken ICQ 162180901