Re: Questions about yeh, hamzah on yeh, alef maksura and dotless ba
- To: "General Arabization Discussion" <general at arabeyes dot org>
- Subject: Re: Questions about yeh, hamzah on yeh, alef maksura and dotless ba
- From: "Thomas Milo" <t dot milo at chello dot nl>
- Date: Sat, 31 Dec 2005 11:53:22 +0100
Oibane wrote:
>> 1. Thomas Milo proposed as a "less ambitious approach", where unique
>> "yeh" is used throughout, and it should drop dots at the final
>> position under Qur'anic locale. If this is to be adopted, I could not
>> understand the following point: today's usage sometimes requires both
>> the dotted and undotted yeh in final position, so one yeh does not
>> suffice. Is this solution limited to meeting Meor's needs?
If the "less ambitious approach" maintains the difference between the
Unicode code points YEH and ALEF MAQSURA, then the Qur'anic locale only
conditions the behaviour of one of them: YEH becomes a virtual FARSI YEH.
This solution will yield the correct visual effect, although using
different codes for the same visual letter, i.e., for the SAME LETTER, is
IMHO corrupt.
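A minimal sketch of that approach, assuming a hypothetical display step with a `quranic_locale` flag (neither the function nor the flag comes from any real locale API): the stored text keeps YEH and ALEF MAQSURA distinct, and only the copy handed to the renderer swaps YEH for FARSI YEH, which Naskh-style fonts typically draw dotted in initial/medial position and dotless in final/isolated position.

```python
# Code points involved (standard Unicode values).
YEH = "\u064A"           # ARABIC LETTER YEH
FARSI_YEH = "\u06CC"     # ARABIC LETTER FARSI YEH: dotless when final/isolated
ALEF_MAKSURA = "\u0649"  # ARABIC LETTER ALEF MAKSURA: left untouched


def display_text(text: str, quranic_locale: bool) -> str:
    """Return the string passed to the renderer (hypothetical helper).

    The stored text is never modified; under the assumed Qur'anic locale,
    every YEH is shown as a "virtual FARSI YEH".
    """
    if not quranic_locale:
        return text
    return text.replace(YEH, FARSI_YEH)
```

Note that the visual result is obtained by displaying a different code point for the same letter, which is exactly the objection raised above.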
>> 2. On the above "farsi yeh loses points with hamza/small alif"
>> suggestion. It looks natural for today's texts, where the
>> orthography is I believe well-established. Now what about the
>> situation of classic materials (not limited to Qur'aan)? In the
>> relevant era, is it always the case that hamza or small alif are
>> actually written with dotless yeh, while
>> ini/mid yeh, which represents the consonant "y", has two dots? Are there no
>> occurrences of "unwritten hamza" with dotless yeh? (If I remember
>> correctly, hamza was invented later by beheading `ain.)
>> Are Tom's dot code points necessary in this case, too?
Superscript alef in yeh is, to my knowledge, restricted to Qur'anic Arabic.
As for YEH-HAMZA or YEH-DOTS, in early Arabic they are completely
interchangeable. The oldest texts have no marking at all, or dots (written
as small stripes).
>> What I can suggest to Meor is to be at ease for the time being, not
>> in haste. The final solution is yet to come, since there's no
>> codepoint of dots nor unified true yeh in the current Unicode
>> standard.
I fully agree.
>> If your policy is clear and consistent, it is easy to
>> filter the text later. (And I think yours so far is practically
>> good. Any policy is OK, but yours, visual coincidence, is
>> understood straightforwardly.)
Correct.
>> And to your first question: is there any clear criterion for final
>> yeh to be dotted or not? I don't think you have received a
>> clear-cut answer yet. For contemporary writing, which allows final dotted
>> yeh, no. You should understand each word. For Qur'aan, yes, your
>> guess is correct.
Dots on yaa' were traditionally ornamental, not distinctive. This tradition
survives to this day in Qur'an orthography.
>> Now since I'm far from being expert, I propose another bold solution:
>> (I suppose it must have been taken into account in early days of
>> Unicode, but I don't know. I know it can never be merged into
>> Unicode.) Let the encoding elements consist entirely of the
>> ini/mid/fin/isol forms of letters. They form true graphemes.
Allow me to correct this. What one can see are allographs. Their collective
abstraction is the grapheme (analogous to phonemes, which one doesn't hear
but perceives, and allophones, the actual objective sounds). You propose to
encode the allographs. This is not trivial at all, because in traditional
script (IMHO mandatory for Qur'anic use), whether typeset or handwritten,
graphemes have far more allographs than on a simple typewriter.
>> I dare not call them representation forms in this definition.
Representation forms is exactly what they are.
>> Today, "letters" are considered to be elemental, and shaped forms are
>> used behind user interface.
In other words, graphemes are encoded, allographs are designed and
programmed into fonts.
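Unicode does in fact carry some positional allographs as compatibility characters (the Arabic Presentation Forms-B block). As a small illustration using only Python's standard `unicodedata` module, compatibility normalization (NFKC) folds the four positional forms of yeh back to the single grapheme-level letter U+064A; `to_graphemes` is just an illustrative name.

```python
import unicodedata

# YEH isolated, final, initial and medial presentation forms (U+FEF1..U+FEF4).
YEH_FORMS = "\uFEF1\uFEF2\uFEF3\uFEF4"


def to_graphemes(text: str) -> str:
    """Fold presentation-form allographs back to their base letters (NFKC)."""
    return unicodedata.normalize("NFKC", text)


for ch in YEH_FORMS:
    base = to_graphemes(ch)
    # Each line pairs an allograph's name with the code point it folds to.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> U+{ord(base):04X}")
```

The decomposition data itself records the relationship: `unicodedata.decomposition("\uFEF2")` gives `<final> 064A`, i.e. "the final form of YEH".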
>> What I propose is to consider letters to
>> be virtual, or transparent. They get bound to keys, and texts are
>> encoded with actual shape elements. Including bare letters would be
>> illegal.
This approach would cause differently shaped Qur'ans (calligraphic vs
simplified, with or without ligatures, complete or limited ligatures, etc.,
etc.) each to become encoded differently.
>> It then forces yeh-hamza to be dotless, since there's no such thing as
>> "dotted-yeh-hamza". At the key binding level, overlap of dotless yeh +
>> hamza and dotted yeh + hamza is allowed. They are encoded identically.
>> Joiner and non-joiner are also virtual and merely key binding
>> candidates, etc etc...
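To make concrete what encoding "actual shape elements" would involve, here is a rough Python sketch that picks each letter's positional form from its neighbours and stores the matching presentation-form character. The joining table is a hypothetical simplification (a handful of right-joining letters, everything else treated as dual-joining; hamza, marks, ZWJ/ZWNJ and ligatures are ignored), and `positional_forms`/`encode_shapes` are illustrative names, not an existing API.

```python
import unicodedata

# Hypothetical, simplified joining table: these letters join only to the
# preceding letter; every other letter is treated as dual-joining.
RIGHT_JOINING = set("\u0622\u0623\u0624\u0625\u0627\u0629\u062F\u0630\u0631\u0632\u0648")


def positional_forms(word: str) -> list[str]:
    """Label each letter ISOLATED/INITIAL/MEDIAL/FINAL from its neighbours."""
    labels = []
    for i, ch in enumerate(word):
        joins_prev = i > 0 and word[i - 1] not in RIGHT_JOINING
        joins_next = i < len(word) - 1 and ch not in RIGHT_JOINING
        if joins_prev and joins_next:
            labels.append("MEDIAL")
        elif joins_next:
            labels.append("INITIAL")
        elif joins_prev:
            labels.append("FINAL")
        else:
            labels.append("ISOLATED")
    return labels


def encode_shapes(word: str) -> str:
    """Store each letter as its presentation form, e.g. 'ARABIC LETTER YEH FINAL FORM'."""
    out = []
    for ch, form in zip(word, positional_forms(word)):
        try:
            out.append(unicodedata.lookup(f"{unicodedata.name(ch)} {form} FORM"))
        except KeyError:  # no presentation form with that name: keep the base letter
            out.append(ch)
    return "".join(out)
```

For example, `encode_shapes("\u0628\u064A\u062A")` stores BEH initial, YEH medial and TEH final. Only the basic positional forms are reachable this way; the many further calligraphic allographs of traditional script would mostly have no code point of their own.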
I like your "out-of-the-box" approach. Obviously something has got to be
done about the Unicode inconsistencies.
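One concrete instance, offered as an illustration rather than as the specific inconsistency meant here: Unicode gives YEH WITH HAMZA ABOVE (U+0626), which is drawn without dots, a canonical decomposition into the dotted YEH (U+064A) plus HAMZA ABOVE (U+0654), while dotless ALEF MAKSURA plus HAMZA ABOVE does not compose to it. This can be checked with Python's standard `unicodedata` module:

```python
import unicodedata

YEH_HAMZA = "\u0626"  # ARABIC LETTER YEH WITH HAMZA ABOVE, rendered dotless

# Canonical decomposition: dotted YEH (064A) + HAMZA ABOVE (0654).
print(unicodedata.decomposition(YEH_HAMZA))                        # 064A 0654
print(unicodedata.normalize("NFC", "\u064A\u0654") == YEH_HAMZA)   # True

# Dotless ALEF MAKSURA + HAMZA ABOVE stays as two code points under NFC.
print(unicodedata.normalize("NFC", "\u0649\u0654") == YEH_HAMZA)   # False
```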
>> By the way, does anyone know this? I know "alif saghiirah" means the
>> small alif. Is there a direct translation of "dagger alif" in Arabic?
Alef qusayra (or rather, maqsura!)
>> I once guessed that since it resembles the dagger sign, which indicates a
>> footnote, the term "dagger alif" was coined by a European
>> orientalist...
So what?
:-)