Proposal on consistent definition of Unicode Yeh

Dear all,

Let me propose a consistent definition of Unicode Yeh.

Remember first that today's Arabic writing system requires both the
dotted yeh and the dotless yeh, at least in final position. We all
know that Egypt is a major exception, and we are looking for a way to
reconcile these two seemingly contradictory situations.
It is clear that we need two codepoints.

Now let us recall the rule: if two graphemes (letters, in the simplest
case) look the same, then they are the same grapheme; if they look
different, then they are two different ones. [1]

My idea is to take this rule as the answer; namely, "dotless yeh"
represents the dotless yeh in any position, in any combination with
superscript/subscript hamza or superscript alif. "Dotted yeh" is the dotted
yeh in initial, final or any other position, which is naturally never
accompanied by hamza or by superscript alif.
As a result, U+06CC "Farsi Yeh", whose dots appear or disappear
depending on its position, becomes unnecessary.

This should essentially be all that Unicode defines; it is then up to
the human-computer interface to realize this rule as naturally
as possible, in accordance with human intuition.

That is, to bring this definition to life, any qualified Egyptian,
Iranian/Persian or Qur'aanic input system should work as follows:
(a) it has only one Yeh key on the keyboard;
(b) when the user types yeh, the dotless final form of yeh appears on the screen;
(c) when he/she types a space or breaks the line, it is left untouched;
(d) but if he/she types another letter after the yeh, it is converted
into dotted yeh;
so that he/she never feels that there are two yehs. Virtually, only one
yeh is provided to the user.
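Rules (a) to (d) amount to a small state machine over the text being typed. The following Python sketch models it on a plain character buffer; the class YehInputMethod and its press_yeh/press methods are hypothetical names, and using U+0649 for the dotless yeh and U+064A for the dotted yeh is an assumption. It further assumes that combining marks typed after the yeh (e.g. a fatha) do not count as "another letter", and that a yeh carrying hamza or superscript alif stays dotless, in line with the definition above.

```python
import unicodedata

DOTLESS_YEH = "\u0649"  # assumed "dotless yeh" (U+0649 ALEF MAKSURA)
DOTTED_YEH = "\u064A"   # assumed "dotted yeh" (U+064A ARABIC LETTER YEH)
# Marks after which the yeh must stay dotless (assumption from the proposal)
KEEP_DOTLESS = {"\u0654", "\u0655", "\u0670"}  # hamza above/below, superscript alef


class YehInputMethod:
    """Hypothetical input method with a single Yeh key, rules (a)-(d)."""

    def __init__(self) -> None:
        self.buffer: list[str] = []

    def press_yeh(self) -> None:
        # (a)+(b): the one Yeh key always inserts the dotless form first.
        self.buffer.append(DOTLESS_YEH)

    def press(self, ch: str) -> None:
        # (d): a following letter turns a pending dotless yeh into dotted yeh.
        # (c): spaces, line breaks and other non-letters leave it untouched.
        if unicodedata.category(ch).startswith("L"):
            self._dot_pending_yeh()
        self.buffer.append(ch)

    def _dot_pending_yeh(self) -> None:
        i = len(self.buffer) - 1
        # Walk back over combining marks attached to the preceding letter.
        while i >= 0 and unicodedata.combining(self.buffer[i]):
            if self.buffer[i] in KEEP_DOTLESS:
                return  # yeh with hamza or superscript alif stays dotless
            i -= 1
        if i >= 0 and self.buffer[i] == DOTLESS_YEH:
            self.buffer[i] = DOTTED_YEH

    def text(self) -> str:
        return "".join(self.buffer)
```

For example, pressing ب, the Yeh key and ت yields "\u0628\u064A\u062A" (بيت), whereas pressing ف, the Yeh key and a space leaves "\u0641\u0649 " (فى followed by a space). The user only ever presses one key; the choice of codepoint is made by what comes next.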

The same can be said about searching. It is the developers of search
functions who should take care of folding final dotted and dotless yeh.
Such concerns should not creep into the Unicode standard, although
careful annotation is a must in order to prevent possible confusion
and to make search engines practical enough.
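A minimal sketch of such folding in Python, assuming the search engine simply maps every yeh variant (U+0649 and U+06CC) to U+064A before comparing; fold_yeh and yeh_insensitive_find are hypothetical helper names. Folding in every position, not only word-final ones, is a simplifying choice made for this sketch.

```python
# Assumed folding table: all yeh variants share one search key (U+064A).
YEH_FOLD = str.maketrans({
    "\u0649": "\u064A",  # ALEF MAKSURA ("dotless yeh") -> ARABIC LETTER YEH
    "\u06CC": "\u064A",  # FARSI YEH -> ARABIC LETTER YEH
})


def fold_yeh(text: str) -> str:
    """Return text with every yeh variant replaced by U+064A."""
    return text.translate(YEH_FOLD)


def yeh_insensitive_find(haystack: str, needle: str) -> int:
    """Index of needle in haystack ignoring yeh dotting, or -1 if absent."""
    return fold_yeh(haystack).find(fold_yeh(needle))
```

Because each replacement is one character for one character, positions in the folded text match positions in the original, so the returned index can be used directly on the unfolded haystack.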

This solution is simple, leaves no room for logical inconsistency,
I believe, and requires the least change to the current Unicode.
But its scope is limited to modern written Arabic/Persian and to the
modern Mushaf, whose writing systems are well established.

The drawback is clear, however: its impact is not negligible. The raw
X Window System keyboard handling is not powerful enough, so input
methods are needed. Microsoft is notorious (in many ways :) for being
indifferent to providing good interfaces, especially outside the U.S.
One of the three yeh codepoints will be deprecated, and the transition
will take several years.

Well, Japanese speakers, for example, take input methods for granted,
so once people are accustomed to them, it should not be a problem. But
it must be a big step for those unfamiliar with them, who may well
complain, "Oh, it's so cumbersome!"

So if I am correct, the criterion will be "consistent definition" vs
"transition cost".

Not to forget, "Alef maksura" is a misnomer, and should be called
"dotless yeh". Aaabsolutely. [2]

To be honest, I'm a little hesitant to propose such a matter. I'm still
studying introductory Arabic and don't speak Persian, and the change
would be very influential. I admit I would feel a bit uneasy if a
standard for my own script were imposed by foreigners.

But I hope this e-mail will be of interest to you.

Best regards.

[1] Visual distinction is the absolute rule. See for an easy example,
(Originally at http://www.isu.net.sa/archive/ainc-alc/2001-July/000166.html
This message was previously referred to by Mete.)

[2] Note that some introductory materials on Arabic published in Japan call
the letter "alef maqsura". So does the Wikipedia page on "Arabic script":