Re: more on Duali
- To: Mohammed Elzubeir <elzubeir at arabeyes dot org>
- Subject: Re: more on Duali
- From: Kareem M Darwish <kareem at Glue dot umd dot edu>
- Date: Thu, 15 Aug 2002 07:24:54 -0400 (EDT)
- Cc: developer at arabeyes dot org
AA,
I am currently out of town, so I can't help you right away. I
have a stemmer (written in Perl) that removes common prefixes and suffixes
without any linguistic knowledge, and I have a full morphological analyzer
that attempts to produce the exact stem by removing prefixes and
suffixes. Which one is better depends on the application. For information
retrieval (search engine) purposes, the stemmer that uses NO linguistic
knowledge is MUCH better.
You can find both at:
www.glue.umd.edu/~kareem/research
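To illustrate what stripping affixes "without any linguistic knowledge" can look like, here is a minimal light-stemming sketch: it removes at most one prefix and one suffix by plain string matching. The affix lists, the two-letter minimum and the name `light_stem` are assumptions chosen for illustration; they are not the lists or code of the Perl stemmer mentioned above.

```python
# Minimal light-stemming sketch: strip at most one prefix and one suffix
# by plain string matching, with no morphological analysis.
# ASSUMPTION: these affix lists are illustrative only; they are not the
# lists used by the Perl stemmer described in this message.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]

# ASSUMPTION: never cut a word down to fewer than 2 letters.
MIN_STEM = 2


def light_stem(word: str) -> str:
    """Remove the longest matching prefix, then the longest matching suffix."""
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            word = word[:-len(s)]
            break
    return word


if __name__ == "__main__":
    print(light_stem("والكتاب"))   # كتاب
    print(light_stem("المكتبات"))  # مكتب
```

The trade-off is that a stemmer like this will sometimes cut a letter that really belongs to the stem; for retrieval, conflating related surface forms consistently usually matters more than linguistic exactness.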
If you have more questions just e-mail me.
Kareem
On Tue, 13 Aug 2002, Mohammed Elzubeir wrote:
>
> Salam,
>
> Looks like most of my posts re: duali fall into the trap of silence ;)
>
> Okay, here is the biggest problem I have right now. Everything else seems
> trivial compared to this.
>
> I want to remove the prefix from a word. FEH is a possible prefix,
> for instance in 'ftl3b' (fatal3ab) [FEH, TEH, LAM, AIN, BEH]. But so is
> TEH, so I can take FEH,TEH as the prefix. That's all good and fine, since
> I can verify that, after removing the prefix, the rest of the word is still
> at least 3 letters long.
>
> What about words like 'faqd' (faqid) [FEH, ALEF, QAF, DAL]? It's over 3
> letters long, and the FEH satisfies the requirement to be a prefix, so we
> remove it and end up with a non-word.
>
> I've been working on this for so long that I can't think straight
> anymore. What would you suggest? Should I rely on a set of root verbs that
> serve as a basis (i.e., compare against that list before anything else)? Part of
> the idea for Duali is that you should be able to generate a dictionary very
> easily and quickly. For example, one should be able to run the dictionary
> generator over trusted pieces of text in a given field (e.g. medical) and
> end up with a 'medical-friendly' dictionary, etc.
>
> I am officially stuck at this point. It seemed easy during the design, but
> now that I've seen the output I have to rethink the whole thing.
> Please don't hesitate to give your input. I've been talking to myself
> on this subject since we started.
>
> later
> --
> -------------------------------------------------------
> | Mohammed Elzubeir | Visit us at: |
> | | http://www.arabeyes.org/ |
> | Arabeyes Project | Homepage: |
> | Unix the 'right' way | http://fakkir.net/~elzubeir/|
> -------------------------------------------------------
>
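The two cases in the quoted message ('ftl3b' and 'faqd') can be made concrete with a minimal sketch. It assumes a hypothetical set of single-letter prefixes (FEH, WAW, TEH, YEH) and peels them off only while at least 3 letters remain, which reproduces the problem: the length rule alone happily strips FEH from 'faqd'. As one possible answer to the "compare to a list first" question, it then keeps a split only if the remainder appears in a lexicon counted from trusted text. This is a swapped-in dictionary-lookup heuristic, not Duali's design; `candidate_splits`, `build_lexicon`, `strip_prefix` and the `min_count` threshold are illustrative names.

```python
from collections import Counter

# ASSUMPTION: hypothetical single-letter prefixes (FEH, WAW, TEH, YEH).
PREFIX_LETTERS = {"ف", "و", "ت", "ي"}
MIN_REMAINDER = 3  # the "at least 3 letters left" rule from the message


def candidate_splits(word):
    """Yield (prefix, rest) pairs: first the unstripped word, then one more
    prefix letter peeled off at a time, while the rest keeps >= 3 letters."""
    yield "", word
    for i in range(1, len(word) - MIN_REMAINDER + 1):
        if word[i - 1] not in PREFIX_LETTERS:
            break
        yield word[:i], word[i:]


def build_lexicon(trusted_text, min_count=2):
    """Hypothetical dictionary generator: keep whitespace-separated tokens
    that occur at least min_count times in trusted text."""
    counts = Counter(trusted_text.split())
    return {w for w, c in counts.items() if c >= min_count}


def strip_prefix(word, lexicon):
    """Return the split with the longest prefix whose remainder is attested
    in the lexicon; fall back to leaving the word unstripped."""
    best = ("", word)
    for prefix, rest in candidate_splits(word):
        if rest in lexicon:
            best = (prefix, rest)
    return best


if __name__ == "__main__":
    # Toy "trusted text": 'لعب' three times, 'فاقد' twice, 'تلعب' once.
    trusted = " ".join(["لعب"] * 3 + ["فاقد"] * 2 + ["تلعب"])
    lex = build_lexicon(trusted)  # keeps 'لعب' and 'فاقد'; 'تلعب' is too rare

    # Length rule alone: the deepest split strips FEH from faqd.
    print(list(candidate_splits("فاقد"))[-1])  # ('ف', 'اقد')

    # With the lexicon check:
    print(strip_prefix("فتلعب", lex))  # ('فت', 'لعب')
    print(strip_prefix("فاقد", lex))   # ('', 'فاقد')
```

Preferring the longest attested split means 'ftl3b' loses both FEH and TEH, while 'faqd' is left alone because its remainder never shows up in the trusted text. The obvious weakness is coverage: a stem that is rare or absent in the training text will never be stripped.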