Salam,

So I have finally gotten my hands on a proper Arabic dictionary, issued by none other than ALECSO (of the Arab League), 1989. Once I started browsing through its contents, I realized that my proposal for a dictionary generator is inherently flawed. I am rather disappointed that no one on any of the lists ever bothered to look in a dictionary and point out how a regular book-type dictionary is organized. I have repeatedly said that I lack the resources to make some of the simpler, seemingly obvious distinctions between what is workable and what is impossible.

For example, my proposal did not take into consideration the distinction between verbs and nouns. It is possible to narrow down the possibilities if the word is supplied with adequate 'harakat'. Without them, it is simply impossible for a program to guess whether a sequence of characters makes up a noun or a verb.

The dictionary I have in hand right now groups words under their roots. Words that are direct derivatives of a root are listed under it, but some that are not directly derived (yet share the first 3-5 letters) are also listed there. This gives a nice little insight into how the spell-checker was intended to load the dictionary content into memory, but it is not how I intended to create that content. It is much harder to reverse the process.

The dictionary (book) has around 25,000 entries, that is, roots. I have come to the conclusion that Duali is simply not possible without an extensive data entry process. The following tasks are open for any takers:

+ Five data entry volunteers -- to enter a complete list of 25,000 roots, plus a numeric representation of their possible derivatives

+ One Arabic linguist, or even a hobbyist -- to provide feedback and listings of every possible derivative of any given word. That includes not only the common ones but the rare ones as well. Most of those possibilities are already listed, but the rarer ones are not.

So, as you can tell, this is not a small project.
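To make the noun/verb problem concrete, here is a minimal Python sketch. It is only an illustration, not Duali code: the names `strip_harakat`, `ENTRIES_BY_ROOT` and `readings` are made up for this example, and the two sample words are hand-picked. Once the harakat are stripped, the verb "kataba" (he wrote) and the noun "kutub" (books) collapse into the same three letters, which also happen to be their root.

```python
# Harakat (short-vowel marks and related diacritics) occupy U+064B..U+0652.
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_harakat(word: str) -> str:
    """Return the word with all harakat removed."""
    return "".join(ch for ch in word if ch not in HARAKAT)


KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba, "he wrote" (verb)
KUTUB = "\u0643\u064F\u062A\u064F\u0628"         # kutub, "books" (noun)
ROOT_KTB = "\u0643\u062A\u0628"                  # the bare root k-t-b

# Hypothetical layout: vocalized entries grouped under their root,
# the way the printed dictionary groups them.
ENTRIES_BY_ROOT = {
    ROOT_KTB: [(KATABA, "verb"), (KUTUB, "noun")],
}


def readings(bare_word: str) -> list[str]:
    """List every part of speech whose unvocalized form matches bare_word."""
    found = set()
    for entries in ENTRIES_BY_ROOT.values():
        for form, part_of_speech in entries:
            if strip_harakat(form) == bare_word:
                found.add(part_of_speech)
    return sorted(found)


if __name__ == "__main__":
    # Both parts of speech match, so the bare word alone cannot decide.
    print(readings(ROOT_KTB))
```

With harakat in the input, a direct comparison against the vocalized forms would pick out exactly one entry; without them, both readings remain possible.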

Letting a script create a dictionary is simply not possible unless you want a toy spell checker. That is not something I had in mind ;)

If anyone is interested, please do not be shy! Reply immediately.

Thank you.

--
-------------------------------------------------------
| Mohammed Elzubeir    | Visit us at:                 |
|                      | http://www.arabeyes.org/     |
| Arabeyes Project     | Homepage:                    |
| Unix the 'right' way | http://fakkir.net/~elzubeir/ |
-------------------------------------------------------
---
Was I helpful? Let others know: http://svcs.affero.net/rm.php?r=elzubeir