
Re: Questions about yeh, hamzah on yeh, alef maksura and dotless ba



Meor Ridzuan Meor Yahaya wrote:
>> On a side note, Unicode also has Farsi yeh. At first, I thought it was
>> strictly for the Persian language. But their documentation does mention
>> Arabic, the language. The characteristic of Farsi yeh is that in
>> initial/medial forms it appears with dots, otherwise without dots. More
>> like what appears in the Madinah Mushaf. However, I think it should be
>> kept for the Persian language only.

Meor,

Farsi yeh is another flaw in the Unicode naming system. "Farsi yeh" is
simply traditional Arabic yaa' as used in all mushafs, whether to denote
ii, y or ae. On final forms dots were ornamental, used
indiscriminately in any of these instances. In the case of retroflex yaa'
(registered in Unicode as U+06D2 Yeh Barree, a grapheme for Urdu) dots are
always used in Arabic. As a result, calii and calae both have dots.

In my recently rewritten Arabic tutorial for Unicode Conferences, I gave an
overview of this problem category:
www.decotype.com/publications/unicode-tutorial.pdf (page 7)

Generally speaking, the Unicode standard provides sound graphemic code
points for office Arabic. However, for Classical and Qur'anic Arabic it only
provides visual patches (such as superscript hamza). Where there are multiple
solutions (e.g., yeh-hamza as a contextual allograph of a single code point
or built from distinct code points), a protocol is lacking.
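
To make the "multiple solutions" point concrete, here is a minimal Python
sketch using only the standard unicodedata module. It compares three ways of
spelling yeh-hamza; the constant and function names are mine, chosen for
illustration, and are not part of any standard or of the tutorial cited above.

```python
import unicodedata

# Three candidate encodings of "yeh with hamza" (names are illustrative).
YEH_HAMZA = "\u0626"               # ARABIC LETTER YEH WITH HAMZA ABOVE
YEH_PLUS_HAMZA = "\u064A\u0654"    # YEH + combining HAMZA ABOVE
NODOT_PLUS_HAMZA = "\u0649\u0654"  # ALEF MAKSURA (dotless yeh) + combining HAMZA ABOVE


def code_points(text: str) -> str:
    """Return the code points of text in U+XXXX notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def same_after_nfc(a: str, b: str) -> bool:
    """True if both strings are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    for label, text in [
        ("precomposed", YEH_HAMZA),
        ("yeh + hamza above", YEH_PLUS_HAMZA),
        ("dotless yeh + hamza above", NODOT_PLUS_HAMZA),
    ]:
        nfc = unicodedata.normalize("NFC", text)
        print(f"{label}: {code_points(text)} -> NFC {code_points(nfc)}")
```

Normalization unifies the first two spellings, but the dotless-yeh spelling
stays distinct, so a plain string comparison or search treats it as a
different word even though it may look the same on the page.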

As for solutions, the only robust way out of the YEH-conundrum is the
encoding of separate dots - throughout. All other solutions will remain what
they are: a confused mess. Such a radical approach would also solve the
YEH-HAMZA ambivalence: U+0649 NODOT-YEH with DOTS BELOW or HAMZA
(ABOVE or BELOW according to context).

A less ambitious, and possibly more practical, solution would be to use
regular U+064A YEH throughout, side-by-side with U+0626 YEH-HAMZA. The former
should drop its dots in final position according to a QUR'ANIC LOCALE; the
latter should then intelligently shift the hamza below according to the
same locale. By the same token, U+0621 hamza should be treated as a
transparent grapheme when used in a Qur'anic or Classical Arabic context.

Font technology will have to deal with the resulting positioning issues:
those of loose hamza between letters are exactly identical to those of
SUPERSCRIPT ALEF when preceded by FATHA. Without such font solutions,
Qur'anic encoding will again remain what it is today: a confused - and
incompatible - mess.

This approach, a combination of locales and font technology, will result in
clean, interchangeable and universally searchable code for Qur'anic Arabic -
with the drawback that its quality will depend on the available local
resources. But that is a general limitation of today's rendering technology
for which solutions are under way.
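
As an aside on searchability: as long as texts encode yeh in different ways,
a search tool has to fold the variants itself. The following is only a rough
Python sketch of such a fold, not the locale mechanism proposed above. The
names YEH_FOLD and search_key are hypothetical, and the choice of which code
points to merge is my assumption.

```python
import unicodedata

# Assumption: for matching purposes, treat dotless yeh (U+0649) and
# Farsi yeh (U+06CC) as plain yeh (U+064A).
YEH_FOLD = {0x0649: 0x064A, 0x06CC: 0x064A}


def search_key(text: str) -> str:
    """Return a folded key for comparing Arabic strings.

    NFD first splits U+0626 into U+064A + U+0654, then the yeh
    variants are mapped onto U+064A.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed.translate(YEH_FOLD)


if __name__ == "__main__":
    # ain + lam + final yeh, spelled with three different yeh code points
    variants = ["\u0639\u0644\u0649", "\u0639\u0644\u064A", "\u0639\u0644\u06CC"]
    keys = {search_key(v) for v in variants}
    print(len(keys))
```

Note that this fold deliberately throws away the dotted/dotless distinction,
so it is suitable for matching only, never for storing or displaying text.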

t