
Re: Which type of mushaf ins Unicode encoding?



Abdulhaq Lynch wrote:
>> On Saturday 25 June 2005 12:42, Thomas Milo wrote:
>>> Unicode wants to encode writing systems, not conventions within a
>>> writing system nor graphic variations for the same abstract units
>>> of writing that deal with a particular document.
>>>
>>> In the case of Mushafs, this means that if the same orthographic
>>> unit (grapheme) varies in form between Mushafs, but not in
>>> function (e.g. various instances of regional tanween forms that all
>>> boil down to the exact same thing), propose to encode the
>>> abstraction; do not bother them with calligraphic/typographic
>>> idiosyncrasies. By the same token, do not encode ras khaa, when it
>>> is a sukun (this one slipped through the net because nobody knew why
>>> it was there). As a first step in digitization we should reduce all
>>> the units of script to their abstract essence and define their
>>> various appearances as regional variations/traditions that can be
>>> dealt with by font technology and text mark-up.
>>>
>>
>> Makes sense.
>>
>> What do you think of my example of the pakistani tanween with small
>> meem, indicating tanween + iqlaab, which from the grapheme point of
>> view is in addition to and offset from the tanween?
>>
>> (http://kprayertime.sourceforge.net/calligraphy/tanween-dammataan-iqlaab.png)
>>
>> Doesn't this indicate that iqlaab should be encoded as such, and not
>> incorporated into the tanween?

Well, in my view this is an example of how not to identify graphemes. The
Egyptian and Saudi editions express iqlaab with a ligature of vowel and
small meem, your example shows a tanween ligature with small meem, but the
underlying grapheme is identical: tanween+iqlaab.

The first thing to agree on is to encode iqlaab as a separate grapheme. What
remains then is how to encode tanween. Unicode adopted the tanween ligatures
as separate codes. My opinion is that the ligatures fathatan, dhammatan and
kasratan are not graphemes, but ligatures consisting of exactly what their
Arabic names indicate: two fathas, two dhammas and two kasras.

Now, there was the authoritative source claiming that there was originally a
single vowel followed by a small or big noon. I consulted another
authoritative source, Dr Gerd-Rüdiger Puin, researcher into the history of
Qur'anic orthography, and he confirmed my observation that the oldest
manuscripts express tanween with two, horizontally aligned vowel signs. This
is also how Yasin Dutton describes them - no trace of a small noon, let
alone a big one. Yet, as a logical device, I like the elegance of the
formula:

[vowel <a/u/i>]+[any tanween <regular/iqlaab/idgaam>]
(as many as you can identify; these are the three expressed in the Saudi
orthography, AFAIK)
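
To make the formula concrete, here is a minimal Python sketch. The vowel code
points are the existing Unicode fatha, damma and kasra. The three tanween
markers are NOT real Unicode assignments: they are placeholders taken from the
Private Use Area purely for illustration, and the names encode_tanween and
decode_tanween are likewise hypothetical.

```python
from enum import Enum


class Vowel(Enum):
    A = "\u064E"  # ARABIC FATHA
    U = "\u064F"  # ARABIC DAMMA
    I = "\u0650"  # ARABIC KASRA


class Tanween(Enum):
    # Assumption: Private Use Area placeholders, not encoded characters.
    REGULAR = "\uE000"
    IQLAAB = "\uE001"
    IDGAAM = "\uE002"


def encode_tanween(vowel: Vowel, kind: Tanween) -> str:
    """Build the two-unit sequence [vowel]+[tanween kind]."""
    return vowel.value + kind.value


def decode_tanween(sequence: str) -> tuple[Vowel, Tanween]:
    """Split a two-unit sequence back into its vowel and tanween kind."""
    if len(sequence) != 2:
        raise ValueError("expected exactly one vowel and one tanween marker")
    return Vowel(sequence[0]), Tanween(sequence[1])
```

Three vowels times three tanween kinds give nine sequences from six units, and
a further tanween kind would add one unit rather than three ligatures.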

This structure guarantees searching in existing Unicode-enabled
environments. It also guarantees that modern font technology can take care
of the shapes, whether Pakistani, Egyptian, Saudi or North African. This
approach would mean that on the level of plain text code, Qur'ans remain
identical when they do not conceptually differ and it would make research
into real differences much more efficient.
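
As a rough illustration of the searching claim, the Python sketch below shows
that a plain search for the vowel also hits the vowel inside a tanween
sequence, while a simple pattern isolates the tanween sequences themselves.
The tanween markers are again hypothetical Private Use Area placeholders, and
find_tanween is an illustrative name, not an existing API.

```python
import re

VOWELS = "\u064E\u064F\u0650"         # fatha, damma, kasra (existing Unicode)
TANWEEN_MARKS = "\uE000\uE001\uE002"  # assumption: placeholder markers

# One vowel immediately followed by one tanween marker.
TANWEEN_SEQUENCE = re.compile(f"[{VOWELS}][{TANWEEN_MARKS}]")


def find_tanween(text: str) -> list[tuple[int, str]]:
    """Return (offset, sequence) for every vowel+tanween pair in text."""
    return [(m.start(), m.group()) for m in TANWEEN_SEQUENCE.finditer(text)]


# beh+damma, space, beh+damma+iqlaab marker
sample = "\u0628\u064F \u0628\u064F\uE001"
plain_damma_hits = sample.count("\u064F")
tanween_hits = find_tanween(sample)
```

Because the shape is left to the font, the same sample string would serve a
Pakistani, Egyptian, Saudi or North African rendering without change.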

A simple canonical equivalence ensures legacy compatibility with existing
fathatan, dhammatan and kasratan.
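
The kind of equivalence meant here can be imitated at application level. The
Python sketch below maps the existing fathatan, dhammatan and kasratan code
points to a vowel plus a "regular tanween" marker. That marker is a
hypothetical Private Use Area placeholder, and the mapping is an assumption
for illustration only: it is not something unicodedata.normalize performs.

```python
# Existing precomposed tanween -> vowel + hypothetical regular-tanween marker.
LEGACY_TO_SEQUENCE = {
    "\u064B": "\u064E\uE000",  # ARABIC FATHATAN -> fatha + marker
    "\u064C": "\u064F\uE000",  # ARABIC DAMMATAN -> damma + marker
    "\u064D": "\u0650\uE000",  # ARABIC KASRATAN -> kasra + marker
}
_TABLE = str.maketrans(LEGACY_TO_SEQUENCE)


def decompose_legacy(text: str) -> str:
    """Rewrite legacy tanween code points as vowel+marker sequences."""
    return text.translate(_TABLE)


def equivalent(a: str, b: str) -> bool:
    """Compare two strings after decomposing legacy tanween."""
    return decompose_legacy(a) == decompose_legacy(b)
```

With this in place, text typed with the legacy codes and text typed with the
decomposed sequences compare as equal in searches.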

t