On Tue, May 16, 2006 at 11:28:18AM +0300, Dan Kenigsberg wrote:
> Hello,
>
> I have converted Tim Buckwalter's database of Arabic - including all suffixes
> and prefixes - to the Aspell format. This makes it possible to spell-check
> Arabic in Aspell, Mozilla Thunderbird, and OpenOffice - if you can spare some
> 200Mb of RAM.
>
> It is freely available under the GPL, on
> http://ivrix.org.il/projects/arabic .
>

Hi Dan,

Note: I'm not criticizing, I'm just stating facts and trying to be
constructive.

Originally, I worked on a fork of Duali, the Arabic spell checker by
M. Elzubeir. I then realized that aspell can do the job and that we do not
need a separate Arabic spell checker.

I had a look at the Buckwalter data, since it was the data set for both spell
checkers. I found something we hadn't noticed earlier, because we didn't have
a working spell checker implementation: the data set contains words from the
Holy Quran. Words in the Holy Quran are sometimes spelled differently because
of the script used to write the Quran, and those spellings are incorrect
outside the Quran context.

Which leads to 3 points:
1) Those words are not correct in ordinary text.
2) The data files also contain a small set of incorrect words (maybe this is
   a problem with my implementation of the Buckwalter algorithm).
3) The affix data is huge and IMHO not easy to modify/extend, which means
   that it'll be hard to strip those words.

This is why I decided to ignore the Buckwalter data and work on a new data
set.

I know about a Google project to create a dictionary from the Buckwalter
data, which makes me wonder: why don't you cooperate with them?

PS. Why only "DICT ar EG ar" in dictionary.lst? ;-)

Best wishes,

-- 
GNU/Linux registered user #224950
Proud Egyptian GNU/Linux User Group <www.eglug.org> Member.
Life powered by Debian, Homepage: www.foolab.org
--
Don't send me any attachment in Micro$oft (.DOC, .PPT) format please
Read http://www.gnu.org/philosophy/no-word-attachments.html
Preferable attachments: .PDF, .HTML, .TXT
Thanx for adding this text to Your signature