
Re: Questions about yeh, hamzah on yeh, alef maksura and dotless ba



Meor Ridzuan Meor Yahaya wrote:
> Oibane,
> First of all, thanks for your comments and suggestion.
>
> I think it should be made clear that, my work is mainly for encoding
> the Quran. I think, for the first stage, I've accomplished my task,
> that is to encode the quran correctly based on visual appearance,
> complying as much as I can to Unicode standard. I do need to do some
> workaround where unicode support is lacking.

Meor, I don't think Unicode is lacking in code points. Everything you need
seems to be there. What's lacking is

1. protocol for use (e.g., how to encode the variant tanweens with the
existing set)
2. font technology to handle them (e.g. how to build a font that can handle
U+0621 HAMZA correctly, or position superscript alef correctly when preceded
by a fatha)

> Now, what I would like to accomplish is, actually to make the text
> more useful for other people to study the Quran. For this, searching
> is crucial. So, to get an accurate search results, the underlying text
> must be encoded correctly. This is where a good solution is still not
> there. For example, i think most Arabic users would use 64A for yeh,
> and maybe sometimes 649, by some users. What I understood from my
> research is, traditionally, the final yeh does not come with the dots.
> It was used mainly by non arabic speaker.

This is not correct. In fact, it is the Persian and Urdu speakers (and
writers!) that preserve the original use of YEH without final dots.

> Later on, it was somehow
> adopted by Arabic speakers. I think most Arabic speakers would
> key in 64A when searching, am I right (I am not an Arabic speaker,
> BTW)? If the text was encoded as Farsi yeh or any other code, the
> search would miss the word. So, this is just more of a practical problem
> that I'm trying to solve.

This is correct. Conventional Arabic keyboards do not encode Farsi Yeh.

> FYI, I can implement any solution without any problem. If I need to
> encode all occurrences of yeh (dotted or dotless) with one code point, I
> can do it within a few hours, together with the required font. So, I'm
> not too concerned about implementation, because it can be done quite
> easily. The only thing that worries me is alef maksura, the dotless
> yeh final form which represents alef. This is because one cannot easily
> determine which one is alef maksura and which one is yeh just by
> looking at it. I think most people here are suggesting not to
> worry about it. Just treat alef maksura like a normal dotless yeh.

In theory I concur with this position. On the other hand, people _are_ used
to typing alef maqsura when they think it is appropriate. And the problem
is, when do they think it's appropriate? Would Egyptians type /alii/
(personal name) with a maqsura to get rid of the dots? Probably yes, and
that's something Unicode did not plan for...

How about maintaining two separate codes for yeh and maqsura, but treating
them as equivalent in searching?
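
A rough sketch of what that could look like, in Python. The stored text keeps
whatever code point was typed; only the comparison folds the yeh variants to
one key. The names fold_yeh and find_all are made up for illustration, and
folding U+06CC FARSI YEH along with U+0649 is my assumption, not an
established protocol.

```python
# Illustrative search folding: the encoded text is never modified,
# only the strings being compared are folded.
# Assumption: U+0649 and U+06CC are both treated as equal to U+064A.
YEH_FOLD = str.maketrans({
    "\u0649": "\u064A",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER YEH
    "\u06CC": "\u064A",  # ARABIC LETTER FARSI YEH   -> ARABIC LETTER YEH
})


def fold_yeh(text: str) -> str:
    """Return a copy of text with the yeh variants mapped to U+064A."""
    return text.translate(YEH_FOLD)


def find_all(query: str, text: str) -> list[int]:
    """Return the start offsets of query in text, comparing folded forms.

    The folding is one code point to one code point, so the offsets are
    also valid in the original, unfolded text.
    """
    if not query:
        return []
    folded_query = fold_yeh(query)
    folded_text = fold_yeh(text)
    hits = []
    start = folded_text.find(folded_query)
    while start != -1:
        hits.append(start)
        start = folded_text.find(folded_query, start + 1)
    return hits


# Text stored with a final ALEF MAKSURA, query typed with YEH (U+064A).
stored = "\u0639\u0644\u0649"
typed = "\u0639\u0644\u064A"
print(find_all(typed, stored))
```

With this, a user who keys in 64A still finds a word that was encoded with
649 or with Farsi yeh, and the distinction survives in the text itself.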

t