
Duali - On Dictionaries



Salam,

So I have finally gotten my hands on a proper Arabic dictionary,
published by none other than ALECSO (of the Arab League), 1989. Once
I started browsing through its contents,
I realized that my proposal for a dictionary generator is inherently
flawed.

I am rather disappointed that no one on any of the lists has
ever bothered to look in a dictionary and point out how a
regular book-type dictionary is organized. I have repeatedly
said that I do not have the resources to make some of the simpler,
seemingly obvious distinctions between what is workable and what is
impossible.

For example, my proposal did not take into consideration the distinction
between verbs and nouns. It is possible to narrow down the possibilities
if the word is supplied with adequate 'harakat'. Without them, it is
simply impossible to have a program guess if a sequence of characters
makes up a noun or a verb.
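
To make the ambiguity concrete, here is a minimal sketch (not part of
Duali; the function name strip_harakat is an assumption made for this
example). It takes the verb kataba ("he wrote") and the noun kutub
("books"), removes the harakat, and shows that both collapse to the
same three letters:

```python
# Arabic harakat (tanween, fatha, damma, kasra, shadda, sukun)
# occupy the Unicode range U+064B..U+0652.
HARAKAT = {chr(code) for code in range(0x064B, 0x0653)}


def strip_harakat(word: str) -> str:
    """Return the word with all harakat removed (hypothetical helper)."""
    return "".join(ch for ch in word if ch not in HARAKAT)


# kaf + fatha, ta + fatha, ba + fatha: kataba, "he wrote" (verb)
verb = "\u0643\u064E\u062A\u064E\u0628\u064E"
# kaf + damma, ta + damma, ba: kutub, "books" (noun)
noun = "\u0643\u064F\u062A\u064F\u0628"

# With harakat the two words differ; without them they are identical.
print(verb != noun)
print(strip_harakat(verb) == strip_harakat(noun))
```

Once the marks are gone, nothing in the character sequence itself says
whether it is the verb or the noun.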

The dictionary I have in hand right now groups words under their roots.
Words that are derived directly from a root are listed under it, but so
are some words that are not directly derived from it (yet share its
first 3-5 letters).
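
As a rough illustration of that layout, the sketch below keys a small
table by root and then builds the reverse mapping from word to root.
Everything in it is an assumption made for the example: the
transliterated sample entries, the names ROOTS, build_reverse_index and
roots_of, and the choice of a plain in-memory table. It is not the
ALECSO data or Duali's actual format.

```python
# Hypothetical sample data: each root maps to the words listed under it.
# Transliteration is used here only to keep the example readable.
ROOTS = {
    "ktb": ["kataba", "kitab", "katib", "maktab"],
    "drs": ["darasa", "dars", "madrasa"],
}


def build_reverse_index(roots):
    """Map every listed word back to the root(s) it appears under."""
    index = {}
    for root, words in roots.items():
        for word in words:
            index.setdefault(word, set()).add(root)
    return index


INDEX = build_reverse_index(ROOTS)


def roots_of(word):
    """Return the roots a word is listed under, or an empty list."""
    return sorted(INDEX.get(word, ()))


print(roots_of("maktab"))
print(roots_of("madrasa"))
```

The reverse index only knows the words that were entered by hand, which
is why a lookup like this depends on the data entry work rather than on
a script guessing roots from surface forms.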

This gives a nice little insight as to how the spell-checker was
intended to load the dictionary content into memory, but it was not
how I intended to create it. It is much harder to reverse the process.

The dictionary (book) has around 25,000 entries, that is, roots. I have
come to the conclusion that Duali is simply not possible without an
extensive data entry process.

The following tasks are open for any takers:

+ Five data entry volunteers -- to enter a complete list of 25,000
  roots, plus a number representation of their possible derivatives

+ One Arabic linguist (or even a hobbyist) -- to provide feedback and
  listings of every possible derivative from any given word. That
  includes not only the common ones but the rare ones as well. Most of
  the possibilities are already listed, but the rarer ones are not.
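
One way the "number representation" of a root's derivatives could look
is a bit mask, with one bit per derivation pattern. This is only a
guess at a workable encoding, not a decided format: the four patterns
chosen, their names, and the functions encode and decode are all
assumptions made for the example.

```python
from enum import IntFlag


class Pattern(IntFlag):
    """Hypothetical subset of derivation patterns, one bit each."""
    FAALA = 1    # basic past-tense verb, e.g. kataba
    FAAIL = 2    # active participle, e.g. katib
    MAFUUL = 4   # passive participle, e.g. maktub
    MAFAL = 8    # noun of place, e.g. maktab


def encode(pattern_names):
    """Combine pattern names into the single number stored per root."""
    mask = 0
    for name in pattern_names:
        mask |= Pattern[name]
    return int(mask)


def decode(mask):
    """List the pattern names whose bits are set in a stored number."""
    return [p.name for p in Pattern if p & mask]


# A data entry volunteer would record one such number next to each root.
stored = encode(["FAALA", "MAFAL"])
print(stored)
print(decode(stored))
```

A root whose dictionary entry shows only some of the patterns simply
gets a number with fewer bits set, so the rarer derivatives can be
added later without changing the format.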

So, as you can tell, this is not a small project. Letting a script
create a dictionary is simply not possible unless you want a toy
spell checker. That is not something I had in mind ;)

If anyone is interested, please do not be shy! Reply immediately.

Thank you.
-- 
-------------------------------------------------------
| Mohammed Elzubeir    | Visit us at:                 |
|                      |  http://www.arabeyes.org/    |
| Arabeyes Project     | Homepage:                    |
| Unix the 'right' way |  http://fakkir.net/~elzubeir/|
-------------------------------------------------------