Re: Arabic spellchecker
- To: "Development Discussions" <developer at arabeyes dot org>
- Subject: Re: Arabic spellchecker
- From: "hamed suhli" <hamedsuhli at gmail dot com>
- Date: Wed, 16 Nov 2005 11:10:36 +0200
- Organization: Tarmeez.org
alsalam aleikom
I don't know why we are discussing deep Arabic language issues in English.
----- Original Message -----
From: "Ahmad Khalifa" <ahmad at khalifa dot ws>
To: <abdalla at pheye dot net>; "Development Discussions" <developer at arabeyes dot org>
Sent: Tuesday, November 15, 2005 9:20 PM
Subject: Re: Arabic spellchecker
This is where its difficulty lies. Defining the AFFIX rules and
writing a *flagged* wordlist.
This is a real problem.
If:
رءى
is the root for:
أريناك
then finding a pragmatic approach, or a decent pattern, could be difficult. Not
to mention that the AFFIX rules would be useless, in my humble opinion (don't let me
put you down).
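The root is رءي, which is both hamzated (mahmuz) and defective (naqis), and I agree with Mr. Mohammed Samir that the diacritics (harakat),
or more precisely the true morphological pattern (wazn), cannot be ignored.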
أريناك=أرى+نا+ك
But consider AFFIX rules augmented with INFIX?! :)
Not just PREfix and SUFfix, but also INfix, which is insertion in the middle of the word at a given index. Of course the INFIX approach would
be costly to adopt, as we'd have to submit patches to Aspell/Myspell and have INFIX
widely accepted.
I do not think the idea of insertion in the middle of the word will succeed,
because there is also letter substitution (ibdal) and letter conversion (qalb) ...
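To make the index-based INFIX idea quoted above concrete, here is a minimal Python sketch. It is only an illustration: the `AffixRule` class, the `IFX` kind and `apply_rule` are hypothetical names, not part of Aspell or Myspell, and the sketch works on undiacritized text, so it does not address the objection about ibdal and qalb.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffixRule:
    """Hypothetical rule record; not the Aspell/Myspell affix file format."""
    kind: str       # "PFX", "SFX", or the proposed (hypothetical) "IFX"
    text: str       # letters to add
    index: int = 0  # insertion position, used only by "IFX"


def apply_rule(word: str, rule: AffixRule) -> str:
    """Return the word with the rule's letters added at the right place."""
    if rule.kind == "PFX":
        return rule.text + word
    if rule.kind == "SFX":
        return word + rule.text
    if rule.kind == "IFX":
        # An infix must land strictly inside the word, never at either end.
        if not 0 < rule.index < len(word):
            raise ValueError("infix index must fall strictly inside the word")
        return word[:rule.index] + rule.text + word[rule.index:]
    raise ValueError(f"unknown rule kind: {rule.kind}")


root = "كتب"  # KTB
# Insert alif after the second root letter to get KETAB.
kitab = apply_rule(root, AffixRule("IFX", "ا", index=2))
# An ordinary prefix rule then adds the definite article 'AL'.
al_kitab = apply_rule(kitab, AffixRule("PFX", "ال"))
```

Pure insertion like this covers kitaab, but a root letter that changes shape (as in رءى and أريناك) would need a replacement rule rather than an insertion, which is the difficulty raised in the reply above.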
For fun, consider modern Arabic terms -- one that I can't forget was "maykanat"
(automating). The root is MKN (e.g., wallatheena inn makkannaahum fil ardh...).
Problem is that the yaa comes exactly in the middle of the root. Same goes for
kitaab, the alif comes in the middle of the root. If you could solve such cases,
I would be very much interested to see your work.
I do not think the language academies approve of this word, even though the Egyptian academy included it in al-Mu'jam al-Wasit alongside the equivalent Arabic word
تأليل
And even if we accept the word, it is not derived from the Arabic root م ك ن
but is merely an Arabization derived from the word مكنة, whose acceptance is also disputed,
which comes from "machine".
The way I see it, we have two options.
1- Add INFIX to the AFFIX rules. That way you can describe KETAB by
flagging the root KTB
2- Add KETAB as an entry of its own beside KTB. That way you can combine
KETAB easily with the 'AL' prefix rule, PLUS you still get only one
entry for the 15 entries of KTB.
This branching will lead to conflicts as soon as you try to analyze context and meaning,
which is unavoidable for grammar checking. Besides, this effort will clash with the semantic studies
that should eventually be built into any text editor.
I am in favour of the second approach. It's faster to adopt, does not
cost much, and would make it easier to define rules for NOUNS.
Its only downside is that for most root verbs that can be derived to
nouns, you get 2 or 3 entries: 1 for the verb and its derivatives, 1 for
the noun KETAB, and 1 for the noun MAKTAB.
I think 3 entries per root beats 17 entries, no?
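As a rough illustration of the second approach, the sketch below keeps one flagged entry per stem and expands it through prefix/suffix rules. Everything here is an assumption made for the example: the flag letters `N` and `V`, the tiny rule table, and the `expand` and `is_known` helpers are hypothetical, and the data is held in memory rather than in the Aspell/Myspell file formats.

```python
# Hypothetical flag table: each flag maps to a list of (prefix, suffix) pairs.
RULES = {
    "N": [("", ""), ("ال", "")],            # noun: bare form, and with 'AL'
    "V": [("", ""), ("ي", ""), ("", "ت")],  # verb: a tiny, incomplete sample
}

# Flagged wordlist in the style of option 2: one entry per stem.
WORDLIST = {
    "كتب": "V",   # the verb KTB and its derivatives
    "كتاب": "N",  # the noun KETAB
    "مكتب": "N",  # the noun MAKTAB
}


def expand(wordlist, rules):
    """Generate every surface form licensed by the flagged entries."""
    forms = set()
    for stem, flags in wordlist.items():
        for flag in flags:
            for prefix, suffix in rules[flag]:
                forms.add(prefix + stem + suffix)
    return forms


ACCEPTED = expand(WORDLIST, RULES)


def is_known(word):
    """Spell-check lookup: accept a word only if some entry generates it."""
    return word in ACCEPTED
```

With these sample rules, three entries cover seven surface forms, and the 'AL' rule applies to KETAB and MAKTAB without any INFIX support in the spellchecker.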
Right now, ammar is working on elzubeir's "Arabic Grammar Rules"
document,
http://cvs.arabeyes.org/viewcvs/projects/duali/doc/arabic-grammar
?????
Arabic morphology (sarf):
http://www.angelfire.com/tx4/lisan/khamash.htm
I think it's the key to developing all the AFFIX rules, as we need to
formally categorize ALL the words of the Arabic language to be able to write
the AFFIX rules.
When the document is finished, we can better estimate the need for INFIX.
Please let me know what you think of the two approaches above.
I wish you good luck insha-Allah.
Thank you.
--
Salam,
Ahmad Khalifa
Wassalam,
Hamed Suhli
www.tarmeez.org