
Re: Questions about yeh, hamzah on yeh, alef maksura and dotless ba



Hello Meor,

----- Original Message ----
<<From: Meor Ridzuan Meor Yahaya <meor dot ridzuan at gmail dot com>
1. The usage of 649 in its final form should always represent alef
maqsura, so we can easily look for it. For other dotless final yeh, we
can use farsi yeh for it, or even 64A, with a Locale system attached to
it. But for now, maybe we can keep it as Farsi yeh.>>

I suggest not using 649, since it is an unnecessary character: Farsi yeh covers it. IMHO it should not have entered Unicode in the first place, but it was probably carried over to Unicode from a legacy ISO Arabic encoding. (Hopefully the name of Farsi yeh can be changed to reflect that it is a Farsi and Classical Arabic, and possibly more, yeh.)

<<2. 626 should be used. This will make it easier and more
understandable, because we know what 626 is. If we encode it as 649 +
hamza above/below, someone might mistakenly think the 649 is alef
maksura, which in this case it is definitely not.>>

I strongly suggest not using 626, but rather the separate hamza above/below codepoint. This gives better normalization of the text. Besides, you have to use a separate small alef anyway. So use both a separate hamza above/below and a separate small alef for consistency. Did I tell you this was better for normalization? :)
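A minimal sketch in Python of what replacing 626 with a separate hamza looks like. The function name `split_hamza` and the choice of Farsi yeh (U+06CC) as the default base letter are assumptions made for illustration; the base follows the recommendation further down in this message. For comparison, Unicode's own canonical decomposition of 626 uses 64A as the base.

```python
import unicodedata

YEH_WITH_HAMZA_ABOVE = "\u0626"  # precomposed "626"
ARABIC_YEH = "\u064A"            # "64A"
FARSI_YEH = "\u06CC"
HAMZA_ABOVE = "\u0654"           # combining hamza above


def split_hamza(text, base=FARSI_YEH):
    """Replace precomposed 626 with a base yeh followed by a separate hamza above.

    Hypothetical helper: the default base (Farsi yeh) is an assumption.
    """
    return text.replace(YEH_WITH_HAMZA_ABOVE, base + HAMZA_ABOVE)


# Unicode's canonical decomposition (NFD) also separates the hamza,
# but keeps 64A as the base letter.
nfd_form = unicodedata.normalize("NFD", YEH_WITH_HAMZA_ABOVE)
```

After this step the hamza is always a separate codepoint, so a search for yeh followed by hamza above does not need a special case for 626.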

<<3. Now, we are left with dotless yeh with small alef in the initial
and medial form. From previous mail, the suggestion was to use 649 +
670. Of course, visually, it is easy to tell that this is not alef
maksura, but rather a dotless yeh serving as the chair for small alef.
However, to develop an algorithm to search for it, it is not as
easy/straightforward.  I think that is why someone was suggesting  to
me to use dotless ba instead of 649. Any suggestion?>>

Dotless beh is a non-starter for this purpose. It is what it is: a dotless "beh". It is intended for an archaic, ambiguous beh/teh/theh/yeh character. The seat of small alef is not an ambiguous character; it is dotless, but it is a "yeh". The algorithm for finding a small alef on a dotless yeh chair is simply to search for the code sequence yeh+superscript_alef.

<<Anyway, to make no 1 happen, I need to have some word list initially
so that I can look for the word, and make the necessary changes.
First, I will probably change all final yeh (of course, all are dotless) to
farsi yeh ATM, then change the necessary words to use 649. After that
is done, maybe all occurrences of yeh can/should be changed to Farsi
yeh, just to make it consistent. For no 2, it should not be a problem for
me to change all. Just need to work on no 3. Maybe at the moment, I
can go ahead with dotless ba. Later, if someone can come up with a
better solution, I can change it back. This will be easy because there
is no other use of dotless ba anywhere.>>

My recommendation is to convert all yehs - alef maqsuras, yeh seats of hamza, yeh seats of small alef, regular yehs, final dotless yehs - to Farsi yeh. Searching is no problem. Here is the algorithm:

If you're looking for alef maqsura, or more properly a final dotless yeh that is pronounced like alef, look for:
fatha+farsi_yeh at the end of word, fathatan+farsi_yeh at the end of word, farsi_yeh+superscript_alef at the end of word

If you're looking for a yeh seat of hamza, look for:
farsi_yeh+hamza_above, farsi_yeh+hamza_below

If you're looking for a yeh seat of small alef, look for:
farsi_yeh+superscript_alef

If you're looking for final dotless yehs, look for:
farsi_yeh at the end of word
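
The four searches above can be sketched with regular expressions in Python. This is only a sketch under assumptions that are not part of this message: the text has already been converted so that every yeh is Farsi yeh (U+06CC), "end of word" is taken to mean "followed by whitespace or the end of the text", and the pattern names and the `find_words` helper are hypothetical.

```python
import re

FARSI_YEH = "\u06CC"
FATHA = "\u064E"
FATHATAN = "\u064B"
SUPERSCRIPT_ALEF = "\u0670"  # the "small alef"
HAMZA_ABOVE = "\u0654"
HAMZA_BELOW = "\u0655"

# Assumption: a word ends where the next character is whitespace or the text ends.
END_OF_WORD = r"(?!\S)"

# fatha+farsi_yeh, fathatan+farsi_yeh, or farsi_yeh+superscript_alef at end of word
ALEF_MAQSURA = re.compile(
    rf"(?:[{FATHA}{FATHATAN}]{FARSI_YEH}|{FARSI_YEH}{SUPERSCRIPT_ALEF}){END_OF_WORD}"
)
# farsi_yeh+hamza_above or farsi_yeh+hamza_below
HAMZA_SEAT = re.compile(rf"{FARSI_YEH}[{HAMZA_ABOVE}{HAMZA_BELOW}]")
# farsi_yeh+superscript_alef anywhere in the word
SMALL_ALEF_SEAT = re.compile(rf"{FARSI_YEH}{SUPERSCRIPT_ALEF}")
# farsi_yeh at end of word
FINAL_YEH = re.compile(rf"{FARSI_YEH}{END_OF_WORD}")


def find_words(pattern, text):
    """Return the whitespace-separated words of `text` that contain a match."""
    return [word for word in text.split() if pattern.search(word)]
```

Calling `find_words(ALEF_MAQSURA, text)` on a converted text returns the candidate words for the first search, and likewise for the other three patterns. Note that, as written, a farsi_yeh+superscript_alef at the end of a word is matched by both `ALEF_MAQSURA` and `SMALL_ALEF_SEAT`; restricting the seat search to non-final positions would be a further assumption.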

It should be pretty straightforward, thanks to the heavily vocalized Fahd/Madinah Mushaf.

Regards,
Mete