Salam,

This is kind of a status report on Duali. The amount of time dedicated to Duali has been anything but consistent, so I will try to sum up the difficulties I have been facing in the hope that someone may be able to enlighten me ;)

I. Verifying assumptions

My goal is to have Duali make a number of assumptions about the Arabic language, but I cannot verify whether those assumptions are accurate. I simply don't have the Arabic grammar background (nor do I remember much of what I learned). For example, I make the assumption that any 3-letter word has 14 derivatives. That holds only if stripping the prefix and suffix from the word didn't destroy the word itself. These are some of the problems, and I am finding more as I refine the dictionary. So in general, I could really use the help of someone proficient in Arabic grammar, so that we can work on laying out certain rules.

II. The dictionary

I was able to produce a dictionary of 3-letter root verb derivatives, but noticed that a couple of terms shouldn't have been there. I used the word list recently uploaded to CVS to generate the dictionary, which produced 2217 words. The format I want to follow for the dictionary would be something along these lines:

    TERM:LENGTH:DERIVATIVES

where LENGTH would be 4 bits and the other 12 bits store the possible derivative templates the term could fit. Each term length has a different set of possible derivatives. Of course, you can have words that do not have derivatives at all; for those, the derivative bits will always be set to 0's.

III. The process

Duali parses the text and grabs each word. It analyzes whether the word has any prefix or suffix that can be stripped. Then, depending on the word's length, it sees if the word fits any of the derivative templates. If it does, the root of that derivative is looked up in the dictionary. If it's found, the word is correct. If not, it compares the stripped word to the dictionary of words with no derivatives.
If that fails as well, it compares again, this time using the full word (pre-analysis). If that too fails, the word is flagged as misspelled.

So, if a word is tagged incorrect and you know it is correct, what do you do? We can add it to the dictionary, but the problem is, how accurate will we be? Do we always assume that user-entered words have 0 derivatives?

In general, I found that without a proper dictionary in hand at the very least, a lot of the simple, should-be-quick lookups are not possible.

Feedback is highly welcome ;) If you are an Arabic expert, raise your voice! I could really use some pointers.

later

--
-------------------------------------------------------
| Mohammed Elzubeir     | Visit us at:                |
|                       | http://www.arabeyes.org/    |
| Arabeyes Project      | Homepage:                   |
| Unix the 'right' way  | http://fakkir.net/~elzubeir/|
-------------------------------------------------------

---
Was I helpful? Let others know:
http://svcs.affero.net/rm.php?r=elzubeir
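
P.S. To make sections II and III a bit more concrete, here is a rough sketch in Python of how an entry could be packed into 16 bits and how the lookup cascade could run. This is not Duali's actual code. The function names, the text encoding (decimal LENGTH, DERIVATIVES written as a string of 0's and 1's), the one-bit-per-template numbering, and the strip_affixes / candidate_roots callbacks are all assumptions made for illustration, and the transliterated words are stand-ins rather than real dictionary data.

```python
LENGTH_BITS = 4      # from the format: 4 bits for the term length
TEMPLATE_BITS = 12   # from the format: 12 bits of derivative-template flags
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1


def pack(length, templates):
    """Pack a length and a template bitmask into one 16-bit integer."""
    if not 0 <= length < (1 << LENGTH_BITS):
        raise ValueError(f"length {length} does not fit in {LENGTH_BITS} bits")
    if not 0 <= templates <= TEMPLATE_MASK:
        raise ValueError(f"template flags do not fit in {TEMPLATE_BITS} bits")
    return (length << TEMPLATE_BITS) | templates


def unpack(value):
    """Split a packed 16-bit value back into (length, templates)."""
    return value >> TEMPLATE_BITS, value & TEMPLATE_MASK


def parse_entry(line):
    """Parse one 'TERM:LENGTH:DERIVATIVES' line (assumed text encoding)."""
    term, length, flags = line.strip().rsplit(":", 2)
    return term, pack(int(length), int(flags, 2))


def is_correct(word, entries, strip_affixes, candidate_roots):
    """Run the lookup cascade from section III.

    entries maps term -> packed value. strip_affixes(word) returns the word
    without prefix/suffix. candidate_roots(stripped) yields (root, template)
    pairs for every derivative template the stripped word fits. Both
    callbacks are hypothetical placeholders for the real analysis.
    """
    stripped = strip_affixes(word)

    # 1. The stripped word fits a template: look up the root. Checking the
    #    root's template flag as well is an assumption of this sketch.
    for root, template in candidate_roots(stripped):
        if root in entries:
            _, flags = unpack(entries[root])
            if flags & (1 << template):
                return True

    # 2. Otherwise try the stripped word among the words with no derivatives.
    if stripped in entries and unpack(entries[stripped])[1] == 0:
        return True

    # 3. Last resort: the full, unanalyzed word among the same words.
    return word in entries and unpack(entries[word])[1] == 0
```

With toy callbacks that strip a leading "al" and treat any 4-letter word whose second letter is "a" as template 0 of a 3-letter root, "alkatb" would be accepted through step 1 if "ktb" is in the dictionary with bit 0 set. A word added by a user would be stored with all 12 flags at 0, so it could only ever match in steps 2 and 3, which is exactly the accuracy question raised above.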