
Re: Volunteers for verifying the quran data



Thomas Milo wrote:
I agree with Mete. This concept of encoding root morphemes separately from
other Arabic letters, if ported to Indo-European languages (much more
...etc...

Hi,

I'm a bit busy at the moment so I can't respond in detail. For now, all I ask is that you keep an open mind. When I recommend ignoring Unicode, I mean ignore it *first*, while you are designing an encoding to meet the needs of a linguistic community; *then* think about how your encoding can be accommodated by Unicode.

Regarding morphemic encoding: I assume you're talking about the notion of radical/non-radical pairs as codepoints. In my view, these need not be considered morphemes. (Turning it around, one could argue that all Arabic consonants are morphemic.) E.g., assume CAPS are radical characters and small letters are non-radicals. Then the K, T, and B in "maKTaB" are not morphemes; they're just letters with radical semantics. Not so different from encoding both uppercase and lowercase forms for Latin-based scripts. One could spell the same word "maktab"; both would look the same after rendering, but the former allows us to use ordinary software to do interesting things (e.g. find all words derived from KTB). No need for morphological analysis software.
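
To make the "ordinary software" point concrete, here is a minimal sketch in Python. It assumes the toy convention from the example above (CAPS = radical, small letters = non-radical) as a stand-in for real radical/non-radical codepoints; the function names and the word list are hypothetical, made up for illustration only.

```python
def radicals(word: str) -> str:
    """Return the radical letters of a word, in order.

    Assumption: uppercase letters mark radicals, as in the toy
    transliteration above; everything else is non-radical.
    """
    return "".join(ch for ch in word if ch.isupper())


def render(word: str) -> str:
    """Collapse the radical/non-radical distinction, as a renderer would."""
    return word.lower()


def find_derived(words: list[str], root: str) -> list[str]:
    """Return the words whose radical letters spell exactly `root`."""
    return [w for w in words if radicals(w) == root]


# Hypothetical sample text, encoded with radical semantics.
words = ["maKTaB", "KaTaBa", "KiTaaB", "maDRaSa", "DaRaSa", "al-maKTaBa"]

ktb_words = find_derived(words, "KTB")
# ktb_words == ["maKTaB", "KaTaBa", "KiTaaB", "al-maKTaBa"]

# Both spellings look the same once rendered.
same_glyphs = render("maKTaB") == render("maktab")  # True
```

The search is plain string filtering; nothing here knows anything about Arabic morphology, which is the point.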

Now, IMO a difficult design question is whether some true morphemes should in fact be encoded. Obvious examples: definite article, other particles like laa, sawfa, sa-, direct object suffixes -hu, -ha, etc. Unicode will never countenance something like that, but that doesn't mean we shouldn't. Such design decisions should be made strictly on a costs/benefits basis, IMO.


Even then, Arabic script does not fully cover the Arabic language from a linguistic perspective. A (or maybe /the/) striking example is the inserted vowel between the /n/ of tanween and any initial cluster of consonants, e.g., /muHammadu-ni r-rasuulu/: it has no orthographic expression (I found it described as kasra, bound to a small nuun in an Ottoman handbook, but I never attested it in a manuscript).

(I think you mean /muHammadu-nu r-rasuulu/ ;)

I don't understand your argument here. The "helper vowel" can be inscribed using one of the ordinary vowel marks. (I'm pretty sure the grammarians address this explicitly.) Scribes may choose not to do this, but they can if they want to. This occurs in many cases, e.g. after the question particle hal.

-g