Re: Arabic spellchecker
- To: Development Discussions <developer at arabeyes dot org>
- Subject: Re: Arabic spellchecker
- From: arn at scs-net dot org
- Date: Thu, 17 Nov 2005 15:52:44 +0200
- User-agent: Internet Messaging Program (IMP) 3.2.1
A question from a beginner in Arabic: I feel that the idea of an AFFIX or
suffix is more suited to Latin languages. Would it not be possible to work
with the forms that are (as I understand it) the basis of Arabic? I remember
trying to learn the first form, the second form, and patterns such as faaEl,
faEl, and so on. But of course that may only seem easier when you're a
beginner.
Quoting Ahmad Khalifa <ahmad at khalifa dot ws>:
> Abdalla Alothman wrote:
> > Asalamu alaikum.
> > I did something exactly the same way because it was feasible. ;)
> > I agree the approach is far from being organized.
> You mean you already have such a wordlist ? I would be interested
> in taking a look at it, if you don't mind. I would like to see how it
> performs in OO.o.
> >>This is where its difficulty lies. Defining the AFFIX rules and
> >>writing a *flagged* wordlist.
> > This is a real problem.
> > If:
> > رءى
> > is the root for:
> > أريناك
> > finding a pragmatic way, or a decent pattern, could be difficult. Not
> > to mention that the AFFIX rules would be useless, in my humble opinion
> > (don't let me put you down).
> But consider AFFIX rules augmented with INFIX ?! :)
> Not just PREfix, and SUFfix, but also INfix, which is insertion in the
> middle by means of index. Of course the INFIX approach would be costly to
> adapt, as we'd have to submit patches to Aspell/Myspell and have INFIX
> widely accepted.
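
As a rough illustration of "insertion in the middle by means of index", here is a minimal Python sketch. The function name `apply_infix` and the (index, text) rule shape are hypothetical; nothing like this exists in Aspell or Myspell. In Arabic script the kitaab example is a single insertion, since the short vowel is not written.

```python
def apply_infix(root: str, index: int, infix: str) -> str:
    """Insert `infix` into `root` before position `index` (hypothetical rule)."""
    if not 0 <= index <= len(root):
        raise ValueError("index falls outside the root")
    return root[:index] + infix + root[index:]


# Root k-t-b with an alif inserted after the second letter gives kitaab.
KITAAB = apply_infix("كتب", 2, "ا")

# The same rule on a Latin transliteration of the consonants.
KTAB = apply_infix("KTB", 2, "A")
```

A real INFIX rule would presumably also need conditions on which roots it applies to, which this sketch leaves out.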
> > For fun, consider modern Arabic terms -- one that I can't forget was
> > (automating). The root is MKN (e.g., wallatheena inn makkannaahum fil
> > Problem is that the yaa comes exactly in the middle of the root. Same goes
> > kitaab, the alif comes in the middle of the root. If you could solve such
> > I would be very much interested to see your work.
> The way I see it, we have two options.
> 1- Add INFIX to the AFFIX rules. That way you can describe KETAB by
> flagging the root KTB
> 2- Add KETAB as an entry of its own beside KTB. That way you can combine
> KETAB easily with the 'AL' prefix rule, PLUS you still get only one
> entry for the 15 entries of KTB.
> I am in favour of the second approach. It's faster to adapt, does not
> cost much, and would make it easier to define rules for NOUNS.
> Its only downside is that for most root verbs that can be derived to
> nouns, you get 2 or 3 entries. 1 for the verb and its derivatives, 1 for
> the noun KETAB, and one for the MAKTAB noun.
> I think 3 entries per root beats 17 entries, no ?
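
The second approach can be sketched in Python as a flagged wordlist plus one prefix rule. This is only a toy model: the flag name "A", the dictionary layout, and the `expand` helper are assumptions made for illustration and are not real Aspell/Myspell syntax. The verb rules for KTB are omitted.

```python
# Hypothetical flagged wordlist: stem -> set of flags.
WORDLIST = {
    "KTB": set(),      # verb root; its own verb rules are omitted here
    "KETAB": {"A"},    # noun stored as an entry of its own
    "MAKTAB": {"A"},   # second noun derived from the same root
}

# Hypothetical prefix rules: flag -> prefix to prepend.
PREFIX_RULES = {"A": "AL"}


def expand(wordlist, prefix_rules):
    """Return every surface form accepted by the flagged wordlist."""
    forms = set()
    for stem, flags in wordlist.items():
        forms.add(stem)
        for flag in flags:
            forms.add(prefix_rules[flag] + stem)
    return forms


FORMS = expand(WORDLIST, PREFIX_RULES)
```

With three entries for the root, the 'AL' rule attaches only to the two noun entries, so ALKETAB and ALMAKTAB are accepted while ALKTB is not.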
> Right now, ammar is working on elzubeir's "Arabic Grammer Rules"
> I think it's the key to developing all the AFFIX rules, as we need to
> formally categorize ALL the Arabic language words to be able to write
> the AFFIX rules.
> When the document is finished, we can better estimate the need for INFIX.
> Please let me know what you think of the two approaches above.
> > I wish you good luck insha-Allah.
> Thank you.
> Ahmad Khalifa