Re: Questions about yeh, hamzah on yeh, alef maksura and dotless ba



Mete Kural wrote:


I suggest not to use 649 since it is an unnecessary character - Farsi yeh covers it. IMHO it should not have entered Unicode in the first place, but it was probably carried over to Unicode from legacy ISO Arabic encoding.

I don't understand the objection, nor the assertion that it is "unnecessary". Farsi yeh and 064A have exactly the same "orthosemic" ;) semantics, in my view. Otherwise, you would have to argue that Farsi yeh has unstable semantics - it means one thing when it has dots, and when it doesn't have dots, it may mean either of two things, but you can't know which one from the encoding itself, you have to analyze the word. That's not good - the objective should be to pack as much info as possible into the encoding.


Dotless yeh (0649) is absolutely essential - it is the only way to encode a meaningless (purely surface) dotless yeh, which is an essential part of Arabic orthography - 0649 as seat of hamza and as final alef. (Whether that alef is maqsura or not is a matter of grammatical analysis, beyond the scope of an encoding design.)

(and hopefully the name of Farsi yeh can be changed
such that it is Farsi and Classical Arabic - and possibly more -
yeh).
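
For reference, the bare numbers in this thread are hexadecimal Unicode codepoints. A minimal Python sketch (the list and the `describe` helper are illustrative only, not something from the original mails) prints the formal character names, which makes the naming complaint easy to check:

```python
import unicodedata

# Codepoints mentioned in this thread (hex), listed here for illustration only.
CODEPOINTS = [0x0626, 0x0649, 0x064A, 0x0654, 0x0655, 0x066E, 0x0670, 0x06CC]

def describe(cp):
    """Return 'U+XXXX NAME' using the formal Unicode character name."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"

for cp in CODEPOINTS:
    print(describe(cp))
```

Note that 0649 is formally named ALEF MAKSURA and 06CC is formally named FARSI YEH, whatever role either plays in a given word.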

<<2. 626 should be used. This will make it easier and more understandable, because we know what 626 is. If we encode it as 649 +
hamza above/below, someone might mistakenly think the 649 is alef maksura, which in this case it definitely is not.>>


I strongly suggest not using 626 but rather the separate hamza
above/below codepoint. This is better normalization of text. Besides,
you have to use a separate small alef anyway. So use both a separate
hamza above/below and a separate small alef for consistency. Did I
tell you this was better for normalization? :)

Here I agree with you. I don't think there's a risk of confusing 649+hamza with "alef maqsura"; or rather, I think the confusion comes from Unicode's poor naming of the codepoint. But if we call it dotless yeh or the like, there can be no confusion.
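
One point worth checking when weighing 626 against 649 + hamza above: 626 decomposes canonically to 064A + 0654 (the dotted yeh), not to 0649 + 0654, so the two spellings are not canonically equivalent and normalization will not fold one into the other. A small sketch using Python's standard `unicodedata` module (the variable and helper names are mine):

```python
import unicodedata

YEH_HAMZA = "\u0626"         # precomposed yeh with hamza above
DOTTED_SEQ = "\u064A\u0654"  # yeh + combining hamza above
DOTLESS_SEQ = "\u0649\u0654" # dotless yeh (0649) + combining hamza above

def nfc(s):
    return unicodedata.normalize("NFC", s)

def nfd(s):
    return unicodedata.normalize("NFD", s)

def hexes(s):
    """List the codepoints of s as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in s]

print(hexes(nfd(YEH_HAMZA)))    # decomposition of 0626
print(hexes(nfc(DOTTED_SEQ)))   # 064A + 0654 recomposes
print(hexes(nfc(DOTLESS_SEQ)))  # 0649 + 0654 is left alone
```

So a search that should match both spellings has to treat them as equivalent explicitly; normalization alone will not do it.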



<<3. Now, we are left with dotless yeh with small alef in the initial
and medial form. From previous mail, the suggestion was to use 649 +
670. Of course, visually, it is easy to tell that this is not alef maksura, but rather a dotless yeh serving as the chair for small alef. However, developing an algorithm to search for it is not as easy or straightforward. I think that is why someone was suggesting to
me to use dotless ba instead of 649. Any suggestion?>>


Dotless beh is a non-starter for this purpose. It is what it is; it
is a dotless "beh". It is intended for an archaic ambiguous

Agreed.
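
On the search question in point 3, here is a rough sketch of how 649 + 670 could be located and split into word-final and non-final cases without changing the encoding. It assumes that "non-final" can be approximated as "the next non-mark character is a letter", which ignores joining behaviour and is only a heuristic; the function name is made up for illustration.

```python
import unicodedata

DOTLESS_YEH = "\u0649"       # ARABIC LETTER ALEF MAKSURA
SUPERSCRIPT_ALEF = "\u0670"  # ARABIC LETTER SUPERSCRIPT ALEF

def find_yeh_with_small_alef(text):
    """Return (index, is_final) for each 0649 that carries a 0670.

    Assumption: the occurrence is word-final when the next non-mark
    character is not a letter (or the text ends).
    """
    hits = []
    n = len(text)
    for i, ch in enumerate(text):
        if ch != DOTLESS_YEH:
            continue
        # Gather the combining marks attached to this base letter.
        j = i + 1
        marks = []
        while j < n and unicodedata.category(text[j]) == "Mn":
            marks.append(text[j])
            j += 1
        if SUPERSCRIPT_ALEF not in marks:
            continue
        is_final = j >= n or not unicodedata.category(text[j]).startswith("L")
        hits.append((i, is_final))
    return hits

# lam + 0649 + 0670 at the end of a word, then beh + 0649 + 0670 + noon.
sample = "\u0639\u0644\u0649\u0670 \u0628\u0649\u0670\u0646"
print(find_yeh_with_small_alef(sample))
```

Collecting all marks after the base, rather than testing only the next character, keeps the match working when another mark such as hamza above sits between 649 and 670.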

-g