Re: Proposal for the Basis of a Codepoint Extension to Unicode for the Encoding of the Quranic Manuscripts
- To: "General Arabization Discussion" <general at arabeyes dot org>
- Subject: Re: Proposal for the Basis of a Codepoint Extension to Unicode for the Encoding of the Quranic Manuscripts
- From: "Thomas Milo" <t dot milo at chello dot nl>
- Date: Wed, 22 Jun 2005 10:08:00 +0200
Gregg Reynolds wrote:
>>> following formula, that I hope this community will endorse:
>>>
>>> tanween = <vowel> <vowel> + [optional] <modifier>
>>>
>>> <vowel>= fatha / dhamma / kasra
>>> <modifier>= tamweem / sequentializer
>>>
>>>
>>> For backward compatibility,
>>>
>>> <vowel> <vowel> = fathatan / dhammatan / kasratan
>>>
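A minimal Python sketch of the backward-compatibility rule quoted above, assuming the standard Unicode Arabic marks (U+064E FATHA, U+064F DAMMA, U+0650 KASRA) and the legacy tanween codepoints U+064B..U+064D. The tamweem and sequentializer modifiers have no codepoints, so TAMWEEM and SEQUENTIALIZER below are hypothetical Private Use Area placeholders. A trailing modifier just passes through unchanged.

```python
# Sketch of the proposed rule: <vowel><vowel> = fathatan / dhammatan / kasratan.
# Assumption: tamweem and sequentializer have no real codepoints, so
# Private Use Area placeholders stand in for them here.

FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"
FATHATAN, DAMMATAN, KASRATAN = "\u064B", "\u064C", "\u064D"

TAMWEEM = "\uE000"         # hypothetical placeholder <modifier>
SEQUENTIALIZER = "\uE001"  # hypothetical placeholder <modifier>

# Doubled vowel mark -> legacy tanween codepoint, and the reverse.
PAIR_TO_LEGACY = {
    FATHA + FATHA: FATHATAN,
    DAMMA + DAMMA: DAMMATAN,
    KASRA + KASRA: KASRATAN,
}
LEGACY_TO_PAIR = {legacy: pair for pair, legacy in PAIR_TO_LEGACY.items()}


def to_pairs(text: str) -> str:
    """Expand each legacy tanween codepoint into a doubled vowel mark."""
    return "".join(LEGACY_TO_PAIR.get(ch, ch) for ch in text)


def to_legacy(text: str) -> str:
    """Collapse each doubled vowel mark into legacy tanween.

    Any modifier after the pair is copied through unchanged, since the
    legacy encoding has no way to express it.
    """
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in PAIR_TO_LEGACY:
            out.append(PAIR_TO_LEGACY[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Usage: kitaabun (kaf kasra ta fatha alif ba damma damma)
kitaabun = "\u0643\u0650\u062A\u064E\u0627\u0628\u064F\u064F"
legacy = to_legacy(kitaabun)         # ends in U+064C DAMMATAN
assert to_pairs(legacy) == kitaabun  # round trip
```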
>> Hmm. In my opinion, it would be both more useful and more accurate
>> historically to simply have a couple of TANWEEN codepoints. If I'm
>> not mistaken, tanween was originally marked using a small nuun and
>> later evolved into the doubled vowel mark.
Historically speaking, I do not agree. I have never seen a trace of a small
nuun. The earliest markers were horizontally repeated coloured vowel dots
(see Yasin Dutton).
Linguistically speaking, I agree that the basic i`raab can be followed by one
of three modulations, as you indicate below.
>> For example, using latin-1:
>>
>> TANWEEN = ñ
>> TANWEEN IDGHAM = Ñ
>> TAMWEEM = %
>>
>> Examples (x = kha, ç = sheen, ² = shadda):
>>
>> kitaabuñ
>> xuçubuÑ m²usan²ada#uÑ
>> min% ba@d
BTW, I designed a computer-aided, reversible transcription system (with
fall-back transliteration), which you can download for evaluation from Basis
Technology (http://www.basistech.com/arabic-editor/).
In that transcription your first sample reads as follows:
kitaabu-n (DMG: kitābu-n)
The Qur'anic assimilation in your second example is not yet supported, but it
will read like this:
khushubu- m:usannadätu-n (DMG: ḫušubu- m:usannadätu-n)
As you can see, initial compensatory shaddä is treated differently from
morphological shaddä.
>> Now search and sort works much better, and the rendering isn't all
>> that hairy. Edit logic should also be simpler.
>>
>> I wouldn't advise equating pairs of vowel marks with tanween marks at
>> the level of encoding design.
What's the objection? It would be just as transparent as your solution.
Anyway, I like your approach. If it is to find any acceptance, there needs
to be canonical equivalence with the legacy encoding according to this formula:
TANWEEN = <vowel><small noon>
= conventional tanween
TAMWEEM = <vowel><small meem>
IDGHAM = <vowel><idgham code>
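One way to read this equivalence is as a custom decomposition, sketched below in Python. It is not Unicode NFC/NFD. Using U+06E8 ARABIC SMALL HIGH NOON for <small noon> and U+06E2 ARABIC SMALL HIGH MEEM ISOLATED FORM for <small meem> is an assumption, and IDGHAM_MARK is a hypothetical placeholder for the still-unassigned idgham code.

```python
# Sketch of a custom (non-Unicode) equivalence:
#   TANWEEN = <vowel><small noon> = conventional tanween
#   TAMWEEM = <vowel><small meem>
#   IDGHAM  = <vowel><idgham code>
# Assumptions: U+06E8 stands in for <small noon>, U+06E2 for <small meem>,
# and IDGHAM_MARK is a hypothetical Private Use Area placeholder.

VOWELS = {"fatha": "\u064E", "damma": "\u064F", "kasra": "\u0650"}
LEGACY_TANWEEN = {"fatha": "\u064B", "damma": "\u064C", "kasra": "\u064D"}

SMALL_NOON = "\u06E8"   # ARABIC SMALL HIGH NOON
SMALL_MEEM = "\u06E2"   # ARABIC SMALL HIGH MEEM ISOLATED FORM
IDGHAM_MARK = "\uE002"  # hypothetical placeholder

# Only conventional tanween has a legacy single-codepoint form.
DECOMPOSE = {LEGACY_TANWEEN[v]: VOWELS[v] + SMALL_NOON for v in VOWELS}
COMPOSE = {seq: legacy for legacy, seq in DECOMPOSE.items()}


def decompose(text: str) -> str:
    """Rewrite legacy tanween as <vowel><small noon>."""
    return "".join(DECOMPOSE.get(ch, ch) for ch in text)


def compose(text: str) -> str:
    """Rewrite <vowel><small noon> as legacy tanween; leave other marks alone."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in COMPOSE:
            out.append(COMPOSE[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def equivalent(a: str, b: str) -> bool:
    """True if two strings match once legacy tanween is decomposed."""
    return decompose(a) == decompose(b)
```

Only conventional tanween has a legacy form, so in this sketch compose() leaves tamweem and idgham sequences as <vowel><mark>, and equivalent() treats them as distinct from plain tanween.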
Note that this is different from, and better than, Meor's and my earlier
suggestion to retain the full tanween followed by a modulation mark.
t