[doc] [Fwd: [avrst] Crescent Quran Corpus]
- To: Documentation and Translation <doc at arabeyes dot org>
- Subject: [doc] [Fwd: [avrst] Crescent Quran Corpus]
- From: Hamed Al-Suhli <hamed at e3rab dot com>
- Date: Mon, 19 Oct 2009 01:33:31 +0300
-------- Original Message --------
A new version of the *Crescent Quran Corpus* is now freely available online
at http://quran.uk.net. The corpus contains both morphological and syntactic
annotation of the Quran in Arabic. Previous releases of the corpus focused
on the morphology of Classical Arabic, but this new release now includes an
in-progress syntactic treebank of the Quran. Some new features of this
release of the corpus include:
(1) *Natural Language Generation* (NLG) has been applied to provide
summaries in English of the morphology of each Arabic word of the Quran. For
example:
*The fourth word of verse (21:70) is divided into 4 morphological segments.
A conjunction, verb, subject pronoun and object pronoun. The prefixed
conjunction fa is usually translated as "then" or "so". The perfect verb
(fi3il mad) is first person masculine plural. The verb's root is jim 3ayn
lam (j 3 l). The attached object pronoun is third person masculine plural.*
See http://quran.uk.net/TokenDetail.aspx?location=(21:70:4)
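The location reference and summary above can be illustrated with a rough sketch. It assumes only the "(chapter:verse:word)" notation and the TokenDetail URL pattern shown in this message; the names Location, parse_location, token_url and summarize are hypothetical, and the template below is a simple stand-in, not the method the website itself uses to generate its summaries.

```python
import re
from dataclasses import dataclass

# URL pattern copied from the announcement; everything else here is illustrative.
BASE_URL = "http://quran.uk.net/TokenDetail.aspx"

# Small lookup for ordinal words; larger positions fall back to "number N".
ORDINALS = {1: "first", 2: "second", 3: "third", 4: "fourth", 5: "fifth"}


@dataclass(frozen=True)
class Location:
    """A word position written as (chapter:verse:word)."""
    chapter: int
    verse: int
    word: int

    def __str__(self) -> str:
        return f"({self.chapter}:{self.verse}:{self.word})"


def parse_location(text: str) -> Location:
    """Parse a reference such as "(21:70:4)" into a Location."""
    match = re.fullmatch(r"\((\d+):(\d+):(\d+)\)", text.strip())
    if match is None:
        raise ValueError(f"not a (chapter:verse:word) reference: {text!r}")
    return Location(*(int(part) for part in match.groups()))


def token_url(loc: Location) -> str:
    """Build the token detail URL for a word location."""
    return f"{BASE_URL}?location={loc}"


def summarize(loc: Location, segments: list[str]) -> str:
    """Fill a fixed English template from a list of segment names."""
    if not segments:
        raise ValueError("at least one segment is required")
    ordinal = ORDINALS.get(loc.word, f"number {loc.word}")
    noun = "segment" if len(segments) == 1 else "segments"
    if len(segments) == 1:
        listing = segments[0]
    else:
        listing = ", ".join(segments[:-1]) + " and " + segments[-1]
    return (
        f"The {ordinal} word of verse ({loc.chapter}:{loc.verse}) is divided "
        f"into {len(segments)} morphological {noun}: {listing}."
    )


if __name__ == "__main__":
    loc = parse_location("(21:70:4)")
    print(token_url(loc))
    print(summarize(loc, ["conjunction", "verb", "subject pronoun", "object pronoun"]))
```

The segment names passed to summarize are the ones listed in the example for (21:70:4); a real generator would derive them from the corpus annotation rather than from a hand-written list.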
(2) *Syntactic Treebank*. Syntactic annotation of the Quran has been
expanded, using a hybrid dependency / constituency framework, following
traditional Arabic grammar (i'3raab). Syntactic annotation is now available
for chapters 67 to 114. See http://quran.uk.net/Treebank.aspx. Morphological
annotation for all of the Quran with part-of-speech tagging has been
reviewed and improved.
(3) *Quran Java API*. A Quran Java API for the text of the corpus has been
integrated into the website, and is freely available for download.
(4) *Grammar Documentation and Annotation Guidelines*. The website now
includes a comprehensive set of documentation on Arabic dependency grammar
which also serves as a set of guidelines for corpus annotators.
(5) *Audio Improvements*. The website now offers a selection of 10 audio
choices, including an audio English translation of the text for each verse
in the corpus.
(6) *Arabic/English Lexicon of the Quran*. Now includes root counts for each
lexicon entry.
(7) *Improved Visualization*. The website provides improved visualization
for 700 dependency graphs, with better website layout and navigation.
----------------------------------------------------------------------
*Interested in becoming a volunteer annotator?*
We are currently looking for native Arabic speakers to assist in corpus
annotation, and in particular syntactic annotation. The Crescent corpus is
an open source community project with the aim of producing accurate
multi-level annotation of the Quran in classical Arabic, including
morphological and syntactic annotation. The framework adopted for syntactic
annotation is that of traditional Arabic dependency grammar (i'3raab).
For more information on the corpus please contact the main project
researcher.
Kais Dukes,
School of Computing
University of Leeds
United Kingdom
Soraya Zaidi,
(soraya dot zaidi at univ-annaba dot org)
URL: http://sites.google.com/site/sorayazaidi
GRIA(Groupe de Recherche en Intelligence Artificielle)
LRI(Laboratoire de recherche en Informatique)
(http://www.lri-annaba.net/)
Université Badji Mokhtar Annaba
(http://www.univ-annaba.org/)
Algerie.