Re: Questions about yeh, hamzah on yeh, alef maksura and dotless ba



Oibane,
First of all, thanks for your comments and suggestions.

I think it should be made clear that my work is mainly for encoding
the Quran. For the first stage, I think I have accomplished my task,
which is to encode the Quran correctly based on visual appearance,
complying as much as I can with the Unicode standard. I do need some
workarounds where Unicode support is lacking.

Now, what I would like to accomplish is to make the text more useful
for other people studying the Quran. For this, searching is crucial,
and to get accurate search results the underlying text must be
encoded correctly. This is where a good solution is still missing.
For example, I think most Arabic users would use U+064A for yeh, and
some would sometimes use U+0649. What I understood from my research
is that, traditionally, the final yeh does not come with dots. The
dotted form was used mainly by non-Arabic speakers. Later on, it was
somehow adopted by Arabic speakers as well. I think most Arabic
speakers would key in U+064A when searching, am I right? (I am not an
Arabic speaker, BTW.) If the text were encoded as Farsi yeh or any
other code point, the search would miss the word. So this is more of
a practical problem that I'm trying to solve.
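
A minimal sketch of the search problem described above, assuming the
text may contain any of U+0649, U+064A or U+06CC for yeh. The names
YEH_FOLD, fold_yeh and matches are hypothetical, and folding to U+064A
is an arbitrary choice made only for comparison; the stored text is
left untouched.

```python
# Hypothetical search folding: treat all yeh-like code points as one key.
# Assumption: U+064A is chosen as the folding target purely for matching.
YEH_FOLD = {
    0x0649: 0x064A,  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER YEH
    0x06CC: 0x064A,  # ARABIC LETTER FARSI YEH    -> ARABIC LETTER YEH
}


def fold_yeh(text: str) -> str:
    """Return a copy of text with yeh-like letters mapped to U+064A."""
    return text.translate(YEH_FOLD)


def matches(needle: str, haystack: str) -> bool:
    """Substring search that ignores which yeh code point was typed."""
    return fold_yeh(needle) in fold_yeh(haystack)
```

With this kind of folding, a query typed with U+064A would still find a
word stored with U+0649 or U+06CC, at the cost of no longer
distinguishing alef maksura from yeh during the search.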

FYI, I can implement any solution without any problem. If I need to
encode all occurrences of yeh (dotted or dotless) with one code point,
I can do it within a few hours, together with the required font. So
I'm not too concerned about implementation, because it can be done
quite easily. The only thing that worries me is alef maksura, the
dotless final form of yeh which represents alef. This is because one
cannot easily determine which one is alef maksura and which one is yeh
just by looking at it. I think most people here are suggesting that I
not worry about it and just treat alef maksura like a normal dotless
yeh.
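
As a rough illustration only: the sketch below does not decide whether
a final letter is alef maksura or yeh (that needs knowledge of the
word), it merely lists the words whose last base letter is yeh-like so
they could be reviewed by hand. The names YEH_LIKE and final_yeh_words
are hypothetical, and skipping every character in a Unicode mark
category is an assumption about how the diacritics are encoded.

```python
import unicodedata

# Code points treated as "yeh-like" for this sketch (an assumption).
YEH_LIKE = {"\u0649", "\u064A", "\u06CC"}


def final_yeh_words(text: str) -> list[str]:
    """Return whitespace-separated words whose last base letter is yeh-like.

    Combining marks (Unicode categories Mn, Mc, Me), such as harakat or
    the superscript alef U+0670, are ignored when finding the last letter.
    """
    found = []
    for word in text.split():
        base = [ch for ch in word
                if not unicodedata.category(ch).startswith("M")]
        if base and base[-1] in YEH_LIKE:
            found.append(word)
    return found
```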

Regards.

On 12/29/05, Oibane <pflm52td at w6 dot dion dot ne dot jp> wrote:
> Hello, there.
>
> Let's remember that the desirable yeh is absent in Unicode now.
> Thus, the problem is not which one to choose among *them*, i.e.
> U+0649 ("alif maqsura"), U+064A ("yeh"), and U+06CC ("farsi yeh"),
> but to choose the desirable *behavior* from what they have
> (or would "attribute" be the better word? I guess you know what I
> mean) in order to add the appropriate modification to the Unicode
> standard. Along this line, a Farsi Yeh-like one looked most
> promising. At least, it should be modified to lose the dots when
> accompanied by hamza above/below or small alif, right?
>
> There still remains some ambiguity, though. First, I admit it is
> partly due to my lack of knowledge of classical Arabic. Now:
>
> 1. Thomas Milo proposed a "less ambitious approach", where a single
> "yeh" is used throughout, and it should drop its dots in the final
> position under a Qur'anic locale. If this is to be adopted, I do not
> understand the following point: today's usage sometimes requires both
> the dotted and undotted yeh in final position, so one yeh does not
> suffice. Is this solution limited to meeting Meor's need?
>
> 2. On the above "farsi yeh loses its dots with hamza/small alif"
> suggestion: it looks natural for today's texts, where the orthography
> is, I believe, well established. Now what about classical materials
> (not limited to the Qur'aan)? In the relevant era, is it always the
> case that hamza or small alif are actually written with dotless yeh,
> while initial/medial yeh representing the "y" consonant has two dots?
> Are there no occurrences of "unwritten hamza" with dotless yeh? (If I
> remember correctly, hamza was invented later by borrowing the head of
> `ain.)
> Is Tom's dots codepoint necessary in this case, too?
>
> What I can suggest to Meor is to be at ease for the time being and
> not to hurry. The final solution is yet to come, since there is no
> codepoint for dots nor a unified true yeh in the current Unicode
> standard. If you keep your policy clear and consistent, it is easy to
> filter the text later. (And I think yours so far is practically good.
> Any policy is OK, but yours, visual correspondence, is
> straightforward to understand.)
>
> And to your first question: is there any clear criterion for a final
> yeh to be dotted or not? I don't think you have received a clear-cut
> answer yet. For contemporary writing, which allows final dotted yeh,
> no: you have to understand each word. For the Qur'aan, yes, your
> guess is correct.
>
> Now, since I'm far from being an expert, I propose another bold
> solution. (I suppose it must have been considered in the early days
> of Unicode, but I don't know. I know it can never be merged into
> Unicode.)
> The totality of the encoding elements would be the initial, medial,
> final and isolated forms of letters. They form the true graphemes.
> I dare not call them presentation forms under this definition.
> Today, "letters" are considered elemental, and shaped forms are
> used behind the user interface. What I propose is to consider letters
> virtual, or transparent. They get bound to keys, and texts are
> encoded with actual shape elements. If bare letters are included,
> the text is illegal.
> It then forces yeh-hamza to be dotless, since there's no such thing
> as "dotted-yeh-hamza". At the key binding level, overlap of dotless
> yeh + hamza and dotted yeh + hamza is allowed. They are encoded
> identically. Joiner and non-joiner are also virtual and merely key
> binding candidates, etc.
>
>
> By the way, does anyone know this? I know "alif saghiirah" means
> small alif. Is there a direct translation of "dagger alif" in Arabic?
> I once guessed that, since it resembles the dagger sign that indicates
> a footnote, the term "dagger alif" was coined by a European
> orientalist...
>
> Thank you all. Good day.
>
> "Oibane"
> pflm52td at wsitta.dion.ne.jp
> # sitta is 6