Re: more on Duali
- To: developer at arabeyes dot org
- Subject: Re: more on Duali
- From: Nadim Shaikli <shaikli at yahoo dot com>
- Date: Tue, 13 Aug 2002 12:53:20 -0700 (PDT)
- Cc: Kareem Darwish <kareem at glue dot umd dot edu>
On Tue, 13 Aug 2002 08:42:17 -0500,
"Mohammed Elzubeir" <elzubeir arabeyes org> wrote:
>
> I want to remove the prefix from a word. FEH is a possible prefix..
> for instance 'ftl3b' (fatal3ab) [FEH, TEH, LAM, AIN, BEH]. But so is
> the TEH, so I can go with FEH,TEH as the prefix. That's all good and
> fine, as I can verify that after removing the prefix the rest of the
> word's length must be at least 3.
>
> what about words like 'faqd' (faqid) [FEH, ALEF, QAF, DAL]. It's over 3
> letters in length, and the FEH satisfies the requirement to be a prefix..
> we remove that and we end up with a non-word.
>
> I've been working on this for so long now that I can't think straight
> anymore. What would your suggestions be? Should I rely on a set of root verbs
> that serve as a basis (as in, compare to the list before anything else)? Part
> of the idea for Duali is that you should be able to generate a dictionary very
> easily and quickly. For example, one should be able to run the dictionary
> generator over trusted pieces of text in a given field (e.g. medical) --
> and you would end up with a 'medical-friendly' dictionary, etc.
I'm certainly no expert in this field (it's been way TOO long since I've
looked at/learned all these interesting rules), but I would tend to think
that generating a list of root verbs (be they 3 characters or more)
and defining a set of rules for prefixes and suffixes ought to work. One thing
to note - if memory serves, there are lots of exceptions to all the rules
in Arabic grammar, so I would not be surprised if the prefix/suffix
rules you come up with do not hold true for all the "root" verbs; meaning
there will likely be 4-5 prefix/suffix groupings, and so it then becomes an
exercise in data collection (i.e. how to generate this root verb list and
its groupings). Once the basics are in place, I'd suggest grabbing a number
of Arabic websites' data and running Duali on it in order to generate a
wider data set.
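
To make the root-list-plus-rules idea concrete, here is a rough sketch in
Python. Everything in it is an assumption for illustration only: the
transliterated stems, the prefix/suffix lists, and the name find_stem are
made up and are not taken from Duali. It checks the whole word against the
list first, then tries each prefix/suffix split and keeps the first one
whose remainder is a known stem of at least 3 letters.

```python
# Hypothetical data, transliterated as in this thread; not real Duali tables.
KNOWN_STEMS = {"l3b", "faqd", "ktb"}
PREFIXES = ["ft", "f", "t"]   # longer prefixes listed before shorter ones
SUFFIXES = ["wn", "t"]
MIN_STEM_LEN = 3


def find_stem(word):
    """Return (prefix, stem, suffix) for the first split whose stem is
    in KNOWN_STEMS, or None if no split yields a known stem."""
    # The empty prefix/suffix come first, so the unmodified word is
    # compared against the list before any affix is removed.
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + SUFFIXES:
            if not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            if len(stem) >= MIN_STEM_LEN and stem in KNOWN_STEMS:
                return (prefix, stem, suffix)
    return None


if __name__ == "__main__":
    for w in ("ftl3b", "faqd", "ktbt", "xyz"):
        print(w, find_stem(w))
```

With this toy data, 'ftl3b' splits into the prefix 'ft' plus the stem 'l3b',
while 'faqd' is accepted whole because stripping the FEH leaves something
that is not in the list. The exceptions mentioned above would show up as
stems that only accept some of the prefix/suffix groupings, which this
sketch does not model.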
My $0.02's worth.
- Nadim