
Re: Which type of mushaf ins Unicode encoding?



Thomas Milo wrote:

What do you think of my example of the Pakistani tanween with small meem, indicating tanween + iqlaab, which from the grapheme point of view is in addition to and offset from the tanween?


(http://kprayertime.sourceforge.net/calligraphy/tanween-dammataan-iqlaab.png)

Doesn't this indicate that iqlaab should be encoded as such, and not
incorporated into the tanween?


Well, in my view this is an example of how not to identify graphemes. The
Egyptian and Saudi editions express iqlaab with a ligature of vowel and
small meem, your example shows a tanween ligature with small meem, but the
underlying grapheme is identical: tanween+iqlaab.

I would strongly urge you not to construe these as "ligatures". "Ligature" is a term of art in modern computational typography. I don't believe a calligrapher writing a Quran would say a vowel followed by a small meem is a single unit, let alone a ligature. In fact, the language itself indicates this: the operation of iqlaab has nothing to do with the vowel; ditto for the operation of tanween and ikhfaa.



The first thing to agree on is to encode iqlaab as a separate grapheme. What remains then is how to encode tanween. Unicode adopted the tanween ligatures as separate codes. My opinion is that the ligatures fathatan, dhammatan and kasratan are not graphemes, but ligatures consisting of exactly what their Arabic names indicate: two fathas, two dhammas and two kasras.

My understanding is that Unicode does not construe the -atan codepoints as ligatures but as single things. They were adopted because that's the way all the legacy encodings did things.
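
For what it's worth, this is easy to check against the Unicode character database. The following is only a minimal sketch, assuming a Python 3 interpreter with the standard unicodedata module; the names describe, tanween_info, simple_info and is_equivalent are made up for illustration. It lists the three tanween codepoints next to the three plain vowel marks and shows that the database records no decomposition of a tanween into two vowels.

```python
import unicodedata

# The three tanween marks and the three plain short-vowel marks.
TANWEEN = ["\u064B", "\u064C", "\u064D"]  # fathatan, dammatan, kasratan
SIMPLE = ["\u064E", "\u064F", "\u0650"]   # fatha, damma, kasra


def describe(ch):
    """Return (codepoint, character name, decomposition mapping) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))


tanween_info = [describe(c) for c in TANWEEN]
simple_info = [describe(c) for c in SIMPLE]

for row in tanween_info + simple_info:
    print(row)

# Two fathas in a row are not normalized into fathatan, and fathatan is not
# decomposed into two fathas: as far as the database is concerned they are
# unrelated character sequences.
doubled_fatha = "\u064E\u064E"
is_equivalent = unicodedata.normalize("NFC", doubled_fatha) == "\u064B"
print(is_equivalent)
```

The decomposition field comes back empty for all six characters, which fits the reading that the -atan codepoints are treated as single things rather than as ligatures of two vowels.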

Also, I'd be careful about using "grapheme"; it may be the best and most accurate terminology, but that doesn't mean the Unicode crowd accepts it. In fact, I predict that if you say "Unicode encodes graphemes" on the Unicode list you'll get a lot of howling. "Abstract character" is the Unicode way. My own preference is "semantic unit" or the like. Don't look for a lot of logical precision, consistency, or simplicity in the language of Unicode. :(

-g