
Re: Proposal on consistent definition of Unicode Yeh



Waleed, 

Thank you for your questions. In my previous post I left out some
things that should have been clarified first.

> - Is there any complain from Arab/Persian users on the usability of the 
> current implementation?

If you mean their everyday use, casual or business, I guess not.
# But I don't use Arabic in most of my daily life, so would someone
# please correct me if I am wrong.

> - Is there a shortcoming from the current specifications?
So, no in the above sense, but yes in another. Let me explain.

There was a discussion on the "general" list at arabeyes dot org a
month ago, concerning how to encode the Qur'aan, or rather a Mushaf,
under the subject:
"Questions about yeh, hamzah on yeh, alef maksura and dotless ba". [1]
More than 60 messages were posted, so I don't recommend that you read
them all. I describe the relevant points here.

# In fact, in my previous post I naively assumed that you had all
# read them. I was wrong.

The reasoning below takes the current Unicode standard at face value,
in order to expose its confusion. A summary is given afterwards.

(a) In today's Mushaf (the Qur'aan in book form, not the recited one),
yeh is dotted in initial and medial form, and dotless in final
form. So isn't it proper to encode yeh with U+06CC "arabic letter
farsi yeh"? It is defined to take and drop its dots in just the same
manner as the Qur'aanic yeh. Today's secular Egyptian yeh behaves the
same way. It is called "farsi", but isn't it really both Persian and
Arabic?
(b) You can't blame an Egyptian for encoding their language with Farsi
yeh, because it seems to implement "one yeh" correctly, but then
there's a discrepancy with other Arabs, who spell the final yeh either
dotted or undotted as the word requires.
Is the current standard really acceptable, when it has two
indistinguishable dotted yehs in ini/med form, U+064A "yeh" and "farsi
yeh", and two indistinguishable dotless yehs in iso/fin form, U+0649
"alef maksura" and "farsi yeh"?
I would say it may cause trouble.[3]
(c) In the Mushaf, a hamza below yeh appears. That yeh is dotless. With
which yeh should it be encoded? There are three candidates: U+0649
"alef maksura", U+064A "yeh", and U+06CC "farsi yeh".
(d) There's U+0626 "yeh with hamza above", and it is specified to be
equivalent to U+064A "yeh" U+0654 "hamza above". How should these two,
yeh with a superscript hamza and yeh with a subscript hamza, be
related?
(e) In the Mushaf, there are occurrences of a superscript small alef
over a dotless yeh. Again, which yeh should be used? The current
Unicode 4.1.0 says nothing about it, nor gives the slightest hint.
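
For reference, the code points named in (a)-(e) can be inspected with
Python's standard unicodedata module. This is only a sketch for
checking the character names and the canonical equivalence mentioned
in (d). The helper names are mine, the output depends on the Unicode
version bundled with your Python, and U+0655 and U+0670 are my
assumption for the marks meant by "hamza below" in (c) and
"superscript small alef" in (e); they are not named above.

```python
import unicodedata

# Code points discussed in (a)-(e). U+0655 and U+0670 are assumptions:
# they are not named in the mail, only described.
CODE_POINTS = [0x0649, 0x064A, 0x06CC, 0x0626, 0x0654, 0x0655, 0x0670]


def describe(cp):
    """Return (U+XXXX label, character name, canonical decomposition)."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.name(ch), unicodedata.decomposition(ch))


def nfd_code_points(text):
    """Return the NFD (canonically decomposed) form as U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", text)]


if __name__ == "__main__":
    for cp in CODE_POINTS:
        print(*describe(cp), sep="  ")
    # Point (d): the precomposed U+0626 decomposes to yeh + hamza above.
    print(nfd_code_points("\u0626"))
```

Note that only the "hamza above" combination has a precomposed
equivalent on U+064A; nothing comparable ties a hamza below to any of
the three yeh candidates, which is the gap raised in (c) and (d).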

To summarize, the Qur'aan, modern Egyptian, and Persian use a single
yeh throughout. They consider it just "yeh". That yeh comes both
dotted and dotless, and there's a simple rule for when it is dotted
and when it is not.
On the other hand, other modern Arabic speakers use both dotless and
dotted yeh at the end of words. Such a dotless final yeh is sometimes
called "alef maksura", although that is not correct.[2] You cannot
decide automatically, from the surrounding context alone, which final
yeh is dotted and which is not. Instead you have to know the grammar
and each word. Grammar and lexicon have nothing to do with Unicode.
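
The look-alike problem can be sketched as follows. This is only an
illustration of the behaviour described in this mail: the DOTTED table
and the function look_alikes are hypothetical names of mine, written
by hand, and are not data taken from the standard or from any library.

```python
# Whether each yeh candidate shows dots in a given positional form,
# as described in this mail (a hand-written assumption, not standard
# data). U+0649 is listed only for the forms discussed here.
DOTTED = {
    0x064A: {"isolated": True, "final": True, "initial": True, "medial": True},
    0x06CC: {"isolated": False, "final": False, "initial": True, "medial": True},
    0x0649: {"isolated": False, "final": False},
}


def look_alikes(form):
    """Group code points that show the same yeh shape in `form`.

    Only groups with more than one member are returned, i.e. the cases
    where a reader cannot tell which code point was typed.
    """
    groups = {}
    for cp, table in DOTTED.items():
        if form in table:
            key = "dotted" if table[form] else "dotless"
            groups.setdefault(key, []).append(f"U+{cp:04X}")
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


if __name__ == "__main__":
    for form in ("initial", "medial", "final", "isolated"):
        print(form, look_alikes(form))
    # Two spellings that look the same in print but compare unequal:
    print("\u0639\u0644\u0649" == "\u0639\u0644\u06cc")
```

Under these assumptions every positional form has a pair of code
points that look the same, so a search or comparison on the stored
text can fail even though the printed words are identical.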

My view is that the current specification fails to distinguish the
natural, intuitive concept of the "letter yeh" from grapheme-level
observation. U+064A claims to be the natural yeh, but it is
incomplete. U+0649 is the grapheme "dotless yeh", but it is granted
only half of what it is entitled to, under the peculiar name "alef
maksura".


In the thread referred to above [1], historical texts were also
considered. They involve further subtleties, and my proposal does not
cover them.


Regards.

Oibane
------------------------------
[1]http://lists.arabeyes.org/archives/general/2005/December/msg00006.html

[2]"Alef maksura" is the name of a class of nouns that often end in a
dotless yeh with an /aa/ sound (elongated /a/). It is wrong to use it
as the name of the final dotless yeh because: (a) Some alef maksura
nouns simply end in a plain alef letter. (b) There are other words,
verbs for example, that end in a dotless yeh with the /aa/ sound.
On the other hand, it is grammarians' terminology, and you are not
expected to know it. Quite a few people casually call the letter "alef
maksura" today. I was one of them until the discussion [1].
Pragmatically, it may be too severe to consider this usage
illegitimate.
Anyway, the name "alef maksura" for U+0649 is a source of confusion
and ought to be amended. If the name were changed to "dotless yeh" and
annotated as "... used as alef maksura in Arabic ...", it might be
acceptable as a compromise. Well, my knowledge is thin, so it is
better to wait for others' comments.

[3] For an easy example, see:
http://www.google.com/search?q=cache:r0MGbwnb908J:www.isu.net.sa/archive/ainc-alc/2001-July/000166.html
It is cited in my first post, too.