Duali (spell-checker) - Help Needed


As many of you know, I have been working on Duali.

However, I am having several problems, all related to Arabic grammar
rules (altasreef, to be specific).

I have posted on this subject on the 'developer' list; I think it would
be best to read those posts to know what is needed:


I have uploaded a small script that generates a dictionary from
a word-list (the same one that was uploaded to CVS some time ago).
Originally the code removed prefix and suffix letters to get at the root,
but I took those parts out as they completely messed things up. So,
right now it just puts the words it cannot reduce to a root in a list
of non-roots (for later).

Arabeyes CVS: projects/duali/src/tools

There you will find 'gendic.py', which is the script to run. The output of

$ gendic -f dict_wordlist -r

will look something like


Where l is the length of the original word (before it was reduced
to its root), and d is the derivative type. You can look in the
aralex.py code to see the regular expressions (m1, m2, etc.) behind
these types. They simply indicate which rule was used to derive the root.
If you pass -n instead of -r, you will get the words that could not
be fit into a root. Those are generally words that have longer prefix
and suffix letters. Simply ignore them for now.

Please look at how the output comes out, and tell me what you see as wrong.
I know there are lots of problems, but I would like to have them listed
so I can deal with them, and certainly my eyes alone are not doing the job ;)

Thank you.
