
Re: Quran data and issues in encoding the Quran in unicode



Dear Meor,

I have looked at your summary of Contemporary Qur'an Orthography (CQO)
bottlenecks. Thank you for this comprehensive overview. Here are my
comments:

Sequential tanween
As I mentioned in our private discussion, I decided to encode sequential
tanween as what they are: a sequence of fatha-fatha, dhamma-dhamma, or
kasra-kasra. This has the advantage that it bypasses Unicode red tape. If
someone has a better idea, I would welcome it.
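
To make the sequences concrete, here is a minimal Python sketch. It assumes
the ordinary vowel marks U+064E, U+064F and U+0650 are the ones doubled; the
dictionary and function names are made up for illustration. It also checks
that Unicode normalization leaves the doubled marks untouched, so the
sequence should survive a round trip through normalizing software.

```python
import unicodedata

# Ordinary short-vowel marks, typed twice in a row for sequential tanween.
FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"

# Hypothetical labels for illustration; not part of any standard.
SEQUENTIAL_TANWEEN = {
    "fatha-fatha": FATHA + FATHA,
    "dhamma-dhamma": DAMMA + DAMMA,
    "kasra-kasra": KASRA + KASRA,
}

# The single tanween characters Unicode already encodes, for comparison.
STANDARD_TANWEEN = {
    "fathatan": "\u064B",
    "dammatan": "\u064C",
    "kasratan": "\u064D",
}


def survives_normalization(text):
    """Return True if no Unicode normalization form alters the text."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(form, text) == text for form in forms)


BEH = "\u0628"  # arbitrary base letter to carry the marks
for label, marks in SEQUENTIAL_TANWEEN.items():
    names = [unicodedata.name(ch) for ch in marks]
    print(label, names, survives_normalization(BEH + marks))
```

Whether a given font stacks or staggers the two marks correctly is a separate
rendering question that this sketch does not address.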

Small letters
Small Alif between base letters
This is a pure rendering issue: a left-offset position of U+0670
superscript alef that is conditioned by a preceding fatha. The extra
spacing or use of a supporting keshide is a matter of taste, font design and
technology. It is certainly not part of any tradition: there is no trace of
it in old handwritten mushafs.
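
Since the offset depends only on context, the places where a font would have
to apply it can be found mechanically. A rough Python sketch, assuming the
encoded text stores the fatha immediately before U+0670 (the function name
and the sample string are illustrative only):

```python
import re

FATHA = "\u064E"
SUPERSCRIPT_ALEF = "\u0670"

# Match a superscript alef only when a fatha comes directly before it.
OFFSET_ALEF = re.compile("(?<=" + FATHA + ")" + SUPERSCRIPT_ALEF)


def offset_alef_positions(text):
    """Return the indices of superscript alefs that follow a fatha."""
    return [match.start() for match in OFFSET_ALEF.finditer(text)]


# Illustrative sample: meem, fatha, superscript alef, lam, kasra, kaf.
sample = "\u0645\u064E\u0670\u0644\u0650\u0643"
print(offset_alef_positions(sample))
```

The encoding stays the same in every case; only the glyph position changes,
which is why this belongs in the font and not in Unicode.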

Small waw
Your example /li yasuu'uw/ shows a non-spacing superscript waw. Like U+06E8
superscript noon it occurs only once, and by analogy it deserves its own
Unicode code point.
The example of small waw in /innahuu/ is linguistically totally regular and
predictable and has normal Unicode support.

Small seen
Obviously the corrective small seen in Q2:245 /yabsuTu/ is a totally
different grapheme (unit of script) from the superscript cantillation mark.
The positioning as such can be dealt with by font technology. On the other
hand, cantillation marks could be considered supralinear annotation to be
positioned on a secondary baseline. In that case they need to be encoded as
such, which has the added value that the corrective small seen and the seen
of saktah are recognized as different graphemes.

hamza and lam-alif
The whole problem boils down to only one thing: in addition to hamza
supported by various chairs, there is hamza, plain and simple. This
U+0621 hamza is misrepresented by the Unicode shaping algorithm as
non-connecting. In fact, it is not at all non-connecting; it is transparent
to its surrounding letters. All your examples illustrate this, with or
without extra lengths of connecting keshide. BTW, since the original Uthmani
version was bound by the rules of Arabic calligraphy, it is very likely that
wherever a chairless hamza is supported by a typographic keshide (i.e., one
that breaks the rules of classical calligraphy), this hamza was added later.
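
The difference between the two views can be shown with a toy joining model in
Python. This is only a sketch, not the real shaping algorithm: it knows three
letters, uses the joining types D (dual-joining), R (right-joining), U
(non-joining) and T (transparent), and the table and function names are
invented for the illustration. The first table follows the Unicode
classification of U+0621 as non-joining; the second reclassifies it as
transparent, as argued above.

```python
BEH, ALEF, HAMZA = "\u0628", "\u0627", "\u0621"

# Joining types as assigned by Unicode for these three letters.
UNICODE_TYPES = {BEH: "D", ALEF: "R", HAMZA: "U"}

# Hypothetical alternative: chairless hamza treated as transparent.
TRANSPARENT_HAMZA_TYPES = {BEH: "D", ALEF: "R", HAMZA: "T"}


def connections(text, types):
    """For each adjacent pair of non-transparent letters, say if they join.

    Simplified rule: a letter joins to the one after it only if it is
    dual-joining and the next letter is dual- or right-joining.
    Transparent letters are skipped when pairing neighbours.
    """
    letters = [ch for ch in text if types[ch] != "T"]
    return [
        types[first] == "D" and types[second] in ("D", "R")
        for first, second in zip(letters, letters[1:])
    ]


word = BEH + HAMZA + BEH  # artificial test string, not a real word
print(connections(word, UNICODE_TYPES))            # hamza breaks the join
print(connections(word, TRANSPARENT_HAMZA_TYPES))  # beh joins beh across it
```

Under the first table the stroke is broken on both sides of the hamza; under
the second the two surrounding letters connect and the hamza simply floats
over the connection.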

The ligature shaping issues with lam alif fall in the domain of font
technology.

Regards,

t




Meor Ridzuan Meor Yahaya wrote:
> First of all, I would like to announce that I've created a new Quran
> Unicode data file, complete with diacritic marks according to the Madinah
> Mushaf. The file has not been verified yet, so volunteers are
> welcome.
>
> Second, I remember that some time ago there was an initiative to submit a
> proposal to Unicode. What has happened to that initiative? I've compiled
> my own issues regarding encoding and displaying the Quran in Unicode.
> I would appreciate comments and feedback from fellow Arabeyes members on
> the documents, especially from those who are experts in Quranic Rasm.
>
> The above-mentioned documents can be downloaded from
> http://www.pakistanopensource.org/projects/quran/ .
>
> Regards.