Re: Volunteers for verifying the quran data
- To: General Arabization Discussion <general at arabeyes dot org>
- Subject: Re: Volunteers for verifying the quran data
- From: Gregg Reynolds <gar at arabink dot com>
- Date: Wed, 29 Jun 2005 10:05:08 -0500
- Cc: "Bernard S. Greenberg at Basis" <bsg2004 at basistech dot com>, Tom Patterson <pattersont at summa dot com>, Zina Saadi <ZinaS at basistech dot com>
- User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)
Thomas Milo wrote:
I agree with Mete. This concept of encoding root morphemes separately from
other Arabic letters, if ported to Indo-European languages (much more
I'm a bit busy at the moment so I can't respond in detail. For the
moment all I ask is that you keep an open mind. When I recommend
ignoring Unicode, I mean ignore it *first*, while you are designing an
encoding to meet the needs of a linguistic community; *then* think about
how your encoding can be accommodated by Unicode.
Regarding morphemic encoding: I assume you're talking about the notion
of radical/non-radical pairs as codepoints. In my view, these need not
be considered morphemes. (Turning it around, one could argue that all
Arabic consonants are morphemic.) E.g., assume CAPS are radical
characters and small letters are non-radicals. Then the K, T, and B in
"maKTaB" are not morphemes; they're just letters with radical semantics.
Not so different from encoding both uppercase and lowercase forms for
Latin-based scripts. One could spell the same word "maktab"; both would
look the same after rendering, but the former allows us to use ordinary
software to do interesting things (e.g. find all words derived from
KTB). No need for morphological analysis software.
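To make the "ordinary software" point concrete, here is a minimal sketch in Python. It assumes the toy convention used above (CAPS are radical characters, small letters are non-radicals) in a Latin transliteration; the function names and the word list are made up for illustration and are not part of any real encoding or library.

```python
def radicals(word):
    """Return the radical letters of a word, in order.

    Assumption: radicals are written in CAPS, non-radicals in lowercase.
    """
    return "".join(ch for ch in word if ch.isupper())


def words_with_root(words, root):
    """Return the words whose radical letters spell exactly `root`."""
    return [w for w in words if radicals(w) == root]


def render(word):
    """Simulate rendering: radical and non-radical forms look the same."""
    return word.lower()


# Hypothetical sample vocabulary, spelled with radical/non-radical letters.
words = ["maKTaB", "KaTaBa", "KiTaaB", "maDRaSa", "yaKTuBu", "DaRaSa"]

ktb_words = words_with_root(words, "KTB")
print(ktb_words)         # ['maKTaB', 'KaTaBa', 'KiTaaB', 'yaKTuBu']
print(render("maKTaB"))  # maktab
```

The search is plain string filtering, with no morphological analysis involved. A real encoding would of course use distinct codepoints rather than letter case, and would need to decide how to handle cases such as doubled radicals.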
Now, IMO a difficult design question is whether some true morphemes
should in fact be encoded. Obvious examples: definite article, other
particles like laa, sawfa, sa-, direct object suffixes -hu, -ha, etc.
Unicode will never countenance something like that, but that doesn't
mean we shouldn't. Such design decisions should be made strictly on a
costs/benefits basis, IMO.
Even then, Arabic script does not fully cover the Arabic language from a
linguistic perspective. A (or maybe /the/) striking example is the inserted
vowel between the /n/ of tanween and any initial cluster of consonants,
e.g., /muHammadu-ni r-rasuulu/: it has no orthographic expression (I found
it described as kasra, bound to a small nuun in an Ottoman handbook, but I
never attested it in a manuscript).
(I think you mean /muHammadu-nu r-rasuulu/ ;)
I don't understand your argument here. The "helper vowel" can be
inscribed using one of the ordinary vowel marks. (I'm pretty sure the
grammarians address this explicitly.) Scribes may choose not to do
this, but they can if they want to. This occurs in many cases, e.g.
after the question particle hal.