
Re: Arabic spellchecker



Assalamu alaikum,
I don't know why we are discussing the fine points of the Arabic language in English.

----- Original Message ----- From: "Ahmad Khalifa" <ahmad at khalifa dot ws>
To: <abdalla at pheye dot net>; "Development Discussions" <developer at arabeyes dot org>
Sent: Tuesday, November 15, 2005 9:20 PM
Subject: Re: Arabic spellchecker



This is where its difficulty lies. Defining the AFFIX rules and
writing a *flagged* wordlist.
This is a real problem.
If:
رءى
is the root for:
أريناك
then finding a pragmatic approach, or a decent pattern, could be difficult. Not
to mention that the AFFIX rules would be useless, in my humble opinion (don't let me
put you down).

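The root is رءي; it is both hamzated (mahmuz) and defective (naqis). I agree with Mr. Mohammed Samir that the diacritics (harakat), or more precisely the true morphological pattern (wazn), cannot be ignored.
أريناك = أرى + نا + ك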
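
The decomposition above can be sketched in a few lines of Python. This is only an illustration: the suffix list and the rule that restores a final ى are assumptions chosen to fit this one example, not a general treatment of Arabic morphology, and the function name `segment` is hypothetical.

```python
# Hypothetical suffix list, tried outermost first. Only covers this example.
SUFFIXES = ["ك", "نا"]


def segment(word):
    """Split a word into [stem, suffix, suffix, ...] by stripping known suffixes."""
    parts = []
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            parts.insert(0, suffix)
            word = word[: -len(suffix)]
    # Assumption: a stem-final ي before a suffix is written ى when the stem stands alone.
    if parts and word.endswith("ي"):
        word = word[:-1] + "ى"
    return [word] + parts


print(" + ".join(segment("أريناك")))
```

Note that the sketch works on the unvocalized spelling only, so it says nothing about the diacritics or the pattern (wazn) discussed above.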

But consider AFFIX rules augmented with INFIX ?! :)
Not just PREfix and SUFfix, but also INfix: insertion in the middle of the word
at a given index. Of course, the INFIX approach would be costly to adopt, as
we'd have to submit patches to Aspell/Myspell and get INFIX widely accepted.
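
To make the INFIX idea concrete, here is a minimal Python sketch of a rule that combines a prefix, a suffix, and insertions at given indexes of the root. The `Infix` class, the `apply_rule` function, and the Latin transliterations are all hypothetical; nothing like this exists in Aspell or Myspell today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Infix:
    index: int  # position in the ORIGINAL root before which the text is inserted
    text: str


def apply_rule(root, prefix="", infixes=(), suffix=""):
    """Build a surface form from a root using a prefix, indexed infixes, and a suffix."""
    by_index = {}
    for infix in infixes:
        by_index.setdefault(infix.index, []).append(infix.text)

    out = []
    for i, letter in enumerate(root):
        out.extend(by_index.get(i, []))  # insertions that go before this root letter
        out.append(letter)
    out.extend(by_index.get(len(root), []))  # insertions after the last root letter
    return prefix + "".join(out) + suffix


ketab = apply_rule("KTB", infixes=(Infix(1, "E"), Infix(2, "A")))
maktab = apply_rule("KTB", prefix="MA", infixes=(Infix(2, "A"),))
print(ketab, maktab)
```

Indexes refer to the original root, so several insertions can be listed without recomputing positions. The sketch only inserts letters; it has no way to replace or reorder root letters.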


I do not think the idea of inserting in the middle of the word will work,
because there is also substitution (ibdal), letter conversion (qalb) ...
For fun, consider modern Arabic terms -- one that I can't forget was "maykanat"
(automating). The root is MKN (e.g., wallatheena inn makkannaahum fil ardh...).
Problem is that the yaa comes exactly in the middle of the root. Same goes for
kitaab, the alif comes in the middle of the root. If you could solve such cases,
I would be very much interested to see your work.

I do not think the language academies approve of this word, even though the Egyptian academy included it in al-Mu'jam al-Wasit alongside the equivalent Arabic word
تأليل
And even if we accept the word, it is not derived from the Arabic root م ك ن.
It is merely an Arabization, derived from the word مكنة (whose acceptance is itself disputed),
from "machine".

The way I see it, we have two options.
1- Add INFIX to the AFFIX rules. That way you can describe KETAB by
   flagging the root KTB
2- Add KETAB as an entry of its own beside KTB. That way you can combine
   KETAB easily with the 'AL' prefix rule, PLUS you still get only one
   entry for the 15 entries of KTB.
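
As a rough illustration of option 2, the Python sketch below expands a small flagged wordlist with a single 'AL' prefix rule. The `word/FLAGS` notation, the flag letter `A`, and the names `AFFIX_RULES`, `WORDLIST`, and `expand` are assumptions made for this example; they are not actual Aspell/Myspell affix-file syntax.

```python
# Hypothetical affix table: flag -> (kind, affix).
AFFIX_RULES = {
    "A": ("prefix", "AL"),  # the 'AL' prefix rule
}

# Option 2: derived nouns get their own flagged entries next to the root.
WORDLIST = ["KTB", "KETAB/A", "MAKTAB/A"]


def expand(entry):
    """Return the set of surface forms that one flagged entry stands for."""
    word, _, flags = entry.partition("/")
    forms = {word}
    for flag in flags:
        kind, affix = AFFIX_RULES[flag]
        forms.add(affix + word if kind == "prefix" else word + affix)
    return forms


accepted = set()
for entry in WORDLIST:
    accepted |= expand(entry)

print(sorted(accepted))
```

Because KTB itself carries no noun flag here, a form such as ALKTB is not generated, while KETAB and MAKTAB each pick up the 'AL' prefix from a single rule.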

This branching will lead to conflicts as soon as you try to study context and meaning,
which is unavoidable for grammar checking. Besides, this effort will clash with the semantic analysis
that should eventually be built into any text editor.

I am in favour of the second approach. It's faster to adopt, does not
cost much, and would make it easier to define rules for NOUNS.
Its only downside is that for most root verbs that can be derived into
nouns, you get 2 or 3 entries: one for the verb and its derivatives, one for
the noun KETAB, and one for the noun MAKTAB.
I think 3 entries per root beats 17 entries, no?

Right now, ammar is working on elzubeir's "Arabic Grammar Rules"
document:
http://cvs.arabeyes.org/viewcvs/projects/duali/doc/arabic-grammar

?????
Arabic morphology (sarf):
http://www.angelfire.com/tx4/lisan/khamash.htm

I think it's the key to developing all the AFFIX rules, as we need to
formally categorize ALL the words of the Arabic language to be able to write
the AFFIX rules.

When the document is finished, we can better estimate the need for INFIX.

Please let me know what you think of the two approaches above.

I wish you good luck, insha-Allah.

Thank you.

--
Salam,
Ahmad Khalifa

Wassalam, Hamed al-Suhli (حامد السحلي) www.tarmeez.org