
Re: Questions about yeh, hamzah on yeh, alef maksura and dotless ba



Meor Ridzuan Meor Yahaya wrote:
Mete,

I think your solution is lacking one thing: we can't tell where the alef maksura is. Other than that, I don't have any problem with it. BTW, why is it important to have normalization?



Hi again,

As I see it, normalization makes various kinds of text handling (especially search) easier. For example, if hamza is always encoded as a distinct codepoint (i.e. U+0622 through U+0626 are never used), then searching for hamza is trivial. That's good, because the seat of the hamza generally has no semantic significance; it's the hamza that counts. Searching for a particular seat is just as easy: search for the seat codepoint (U+0627, U+0648 or U+0649) followed by hamza. To find a final dotless-yeh-qua-alef, just search for U+0649 followed by a word separator.
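A minimal Python sketch of the search scheme described above. It assumes the separate hamza is U+0654 ARABIC HAMZA ABOVE (and U+0655 ARABIC HAMZA BELOW for U+0625), and it uses a hypothetical decomposition table that follows the seats named in the post, so U+0626 becomes U+0649 plus hamza. Unicode's own NFD instead decomposes U+0626 to U+064A YEH plus U+0654, so this table is a custom mapping, not standard normalization. U+0622 (alef with madda) is left alone here. The function names and the set of word separators are illustrative assumptions.

```python
import re

# Assumption: the "distinct codepoint" for hamza is U+0654 ARABIC HAMZA ABOVE;
# U+0655 ARABIC HAMZA BELOW is used for the hamza under alef (U+0625).
HAMZA_ABOVE = "\u0654"
HAMZA_BELOW = "\u0655"
HAMZAS = HAMZA_ABOVE + HAMZA_BELOW

# Hypothetical table: each precomposed hamza letter becomes seat + separate hamza.
# Unlike Unicode NFD, the yeh seat here is U+0649 (dotless), as in the post.
_DECOMPOSE = str.maketrans({
    "\u0623": "\u0627" + HAMZA_ABOVE,  # ALEF WITH HAMZA ABOVE
    "\u0625": "\u0627" + HAMZA_BELOW,  # ALEF WITH HAMZA BELOW
    "\u0624": "\u0648" + HAMZA_ABOVE,  # WAW WITH HAMZA ABOVE
    "\u0626": "\u0649" + HAMZA_ABOVE,  # YEH WITH HAMZA ABOVE
})

# Assumed word separators: whitespace, ASCII punctuation,
# Arabic comma (U+060C), semicolon (U+061B), question mark (U+061F).
_FINAL_MAKSURA = re.compile("\u0649(?=[\\s.,;:!?\u060C\u061B\u061F]|$)")


def normalize(text: str) -> str:
    """Replace precomposed hamza letters with seat + separate hamza."""
    return text.translate(_DECOMPOSE)


def find_hamza(text: str) -> list[int]:
    """Indices of every hamza, regardless of its seat."""
    return [i for i, ch in enumerate(text) if ch in HAMZAS]


def find_seated(text: str, seat: str) -> list[int]:
    """Indices of a given seat letter that carries a hamza."""
    pattern = re.escape(seat) + "[" + HAMZAS + "]"
    return [m.start() for m in re.finditer(pattern, text)]


def find_final_alef_maksura(text: str) -> list[int]:
    """Indices of U+0649 followed by a word separator or end of text."""
    return [m.start() for m in _FINAL_MAKSURA.finditer(text)]


text = normalize("\u0645\u0633\u0626\u0644\u0629 \u0639\u0644\u0649")  # مسئلة على
print(find_hamza(text))               # [3]
print(find_seated(text, "\u0649"))    # [2]
print(find_final_alef_maksura(text))  # [9]
```

The U+0649 at index 2 is not reported as a final alef maksura because it is followed by the hamza, not by a separator; that is what keeps the yeh seat and the word-final alef apart after normalization.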


My recommendation is to convert all yehs (alef maqsuras, yeh seats of hamza, yeh seats of small alef, regular yehs, final dotless yehs) to Farsi yeh. Searching is no problem. Here is the algorithm:

Maybe I'm misunderstanding Mete, but I don't see how this could work at all. Aside from the semantic problems I've mentioned in another post, Farsi yeh takes dots in its initial and medial forms, no? So how can it be the seat of a hamza or a small alif in those contexts?
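To make Meor's concern concrete, here is a hypothetical Python sketch of folding yeh variants to Farsi yeh (U+06CC). It is not Mete's algorithm, which isn't quoted here; it only covers alef maksura, regular yeh and yeh with hamza above, and it assumes U+0654 as the separate hamza. Search does become uniform, but على (ʿalā, ending in alef maksura) and علي (ʿAlī, ending in yeh) fold to the same string, so the folded text alone can't say where the alef maksura was.

```python
FARSI_YEH = "\u06CC"    # ARABIC LETTER FARSI YEH
HAMZA_ABOVE = "\u0654"  # assumed separate hamza codepoint

# Hypothetical fold: every yeh variant listed becomes Farsi yeh.
_FOLD = str.maketrans({
    "\u0649": FARSI_YEH,                # ALEF MAKSURA
    "\u064A": FARSI_YEH,                # YEH
    "\u0626": FARSI_YEH + HAMZA_ABOVE,  # YEH WITH HAMZA ABOVE
})


def fold_yehs(text: str) -> str:
    """Map alef maksura, yeh and yeh-with-hamza onto Farsi yeh."""
    return text.translate(_FOLD)


ala = "\u0639\u0644\u0649"  # على 'ala, ends in alef maksura
ali = "\u0639\u0644\u064A"  # علي 'Ali, ends in yeh
print(fold_yehs(ala) == fold_yehs(ali))  # True: the two words collide
```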

-gregg