Re: Questions about yeh, hamzah on yeh, alef maksura and dotless ba

To: General Arabization Discussion <general at arabeyes dot org>
Subject: Re: Questions about yeh, hamzah on yeh, alef maksura and dotless ba
From: Mohammed Yousif <mhdyousif at gmx dot net>
Date: Mon, 2 Jan 2006 00:33:31 +0200
User-agent: KMail/1.9
On Sunday 01 January 2006 19:26, Thomas Milo wrote:
>
> > But if you mean encoding a Theh for example as Plate + Three Dots and
> > encoding a Beh using the same codepoint Plate + One Dot, then I
> > disagree completely.
>
> Yet, this is exactly what I mean. It brings the encoding closer to how
> older texts and ancient mus-hafs in particular are encoded. 


The fact that old Masahef didn't have dots doesn't mean that they didn't
differentiate between the letters. They did indeed.
And while this wasn't clear from the text itself, people knew that this
specific "plate" is a Beh and that second "plate" is a Teh and so forth.

> I once attested 
> an instance where the "plate" (or BEH archigrapheme) is marked with both
> two stripes (=dots) above and three stripes (=dots) above. As a result, the
> phrase /fa athaabahumu l-laahu/ could also be read as /fa ataahumu l-laahu/
> (since only one archigrapheme was - doubly - marked with a total of five
> stripes, two in red, three in black). Both readings turned out to be
> attested in the /mucjam al-qira'aati l-qur'aaniyyä/ published in Kuwait
> in 1986. Without separate "plate" and "dot pattern" such observations could
> not be encoded for accurate printing, exchanging between scholars, or
> searching.
>

The variation here needs to stay even after encoding using a computer.



> BTW, nobody is expected to be typing all these details. It is meant for
> scholarly encoding of Qur'an and other manuscripts. What I propose should
> happen on the level of digital text representation, as an alternative or
> supplement to conventional encoding. Whenever possible, decomposed
> characters should be treated as the equivalents of their precomposed
> counterparts. In terms of user interface, I designed a simple conversion
> mechanism from composed to decomposed encoding. From there, simple
> backspacing already suffices to remove dots.
>

In that case, I see no harm done.
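To make the composed/decomposed equivalence concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: the `BEH-PLATE` label, the `Dotted` tuple and the three-entry table are hypothetical and are not part of Unicode or of the actual conversion mechanism mentioned above.

```python
from typing import NamedTuple


class Dotted(NamedTuple):
    plate: str     # hypothetical archigrapheme label
    dots: int      # number of dots in the pattern
    position: str  # "above" or "below"


# Hypothetical decomposition table for three letters sharing one plate.
DECOMPOSE = {
    "\u0628": Dotted("BEH-PLATE", 1, "below"),  # Beh
    "\u062A": Dotted("BEH-PLATE", 2, "above"),  # Teh
    "\u062B": Dotted("BEH-PLATE", 3, "above"),  # Theh
}
COMPOSE = {value: key for key, value in DECOMPOSE.items()}


def decompose(text):
    """Replace each known precomposed letter with its (plate, dots) form."""
    return [DECOMPOSE.get(ch, ch) for ch in text]


def compose(items):
    """Inverse of decompose() for patterns that exist in the table."""
    return "".join(COMPOSE[i] if isinstance(i, Dotted) else i for i in items)


def skeleton(text):
    """Drop the dot patterns, keeping only plates and other characters."""
    return [i.plate if isinstance(i, Dotted) else i for i in decompose(text)]


word = "\u0628\u062A\u062B"
print(compose(decompose(word)) == word)            # round trip is lossless
print(skeleton("\u062A") == skeleton("\u062B"))    # Teh and Theh share a plate
```

In this sketch the letters stay distinct in the decomposed form (the dot pattern travels with the plate), and removing dots is an explicit, separate step.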


> > Now I understand what you mean. You want to be able to use one Hamza
> > codepoint for both the standalone Hamza and the HamzaAbove/Below mark.
>
> Well, no. When this automatic combination was first suggested in this list,
> I considered it an intriguing idea. At second thought, I concluded that it
> is a non-starter and sent my example /stay'asuw/ above. However, when such
> an amphibious hamza (see below) would be encoded as U+0621, a combining
> mechanism could still work with (farsi/maqsura) YEH and U+0654 HAMZA ABOVE.
>
> > Well, not only Hamza possesses  this behavior but also more Arabic
> > letters.
> >
> > To give an example, the Hamza situation here is exactly like the Seen
> > situation.
> >
> > Just to be clear:
> >  Hamza U+0621                HamzaAbove U+0654
> >  Seen    U+0633                SeenAbove    U+06DC
>
> I agree - I never contested that.
>
> > Hamza can come standalone and Seen can come standalone, in that case
> > the effect is adding one more letter to the word (Hamza or Seen) and
> > the Hamza or Seen becomes a part of the spelling of the word.
> > The word باء (Beh,Alef,Hamza) for example is three letters long, the
> > Hamza is counted because it is a separate letter here that doesn't
> > affect the separate letter Alef.
> > And the word شيء (Sheen,Yeh,Hamza). Three letters, the Hamza is
> > a separate letter from the Yeh and has no effect on the Yeh. And the
> > word is spelled using the three letters' names (Sheen,Yeh,Hamza).
>
> Absolutely right. We must maintain standalone U+0621 and superscript hamza
> U+0654.
>

You are contradicting yourself here.
Hamza, Seen and Alef/SmallAlef all possess this same behavior and all of them
must be dealt with in the same way.
The job of HamzaAbove is the same as the job of the SmallAlefAbove and the
same as the SeenAbove.
See below.


> > Also, the word مسيطرون (Meem,Seen,Yeh,Tah,Reh,Waw,Noon) is seven
> > letters.
> > The Seen here is counted because it's a separate letter. It doesn't
> > affect any other letter in the word.
> >
> > But Hamza and Seen can also come "attached" to other letters acting
> > as a mark. In that case the effect isn't adding one more letter to
> > the word but only affecting the way one might think about the letter
> > which they are attached to. In this case (let me call them HamzaAbove
> > and SeenAbove), their meaning can be thought of as an alert to the
> > reader that "Beware! the letter under HamzaAbove/SeenAbove wasn't
> > really that letter, it was a Hamza/Seen that has been replaced and
> > you should pronounce them using the Hamza/Seen sound not the sound of
> > the underlying letter".
>
> In other words, they count as superscript corrections. This is well-known
> by Arabic linguists. From here I am deleting the additional examples - you
> made your point.
>

"Superscript corrections", okay. But to correct what?
SeenAbove corrects to Seen.
SmallYehAbove corrects to Yeh, but Yeh is sometimes replaced by another
form, SmallYeh, because the whole letter was missing.
SmallAlefAbove corrects to Alef, but Alef is sometimes (especially in the Madd
context) replaced by another, equivalent form, which is SmallAlef.
HamzaAbove corrects to Hamza (which was "missing" too from the Mushaf, or
to be more accurate didn't exist).

I think you didn't get the whole point (did you see all the examples?).
I'm trying to say that the Seen in the example above is similar to the Hamza
and the Alef (which includes the regular one and the SmallAlef) and the Yeh
(which includes the regular one and the SmallYeh). When they get replaced,
a mark is added (a superscript correction, if you must). That mark has to
be a different codepoint from what it corrects to.
That is:
 - a codepoint for Hamza and a codepoint for the corrector HamzaAbove
 - a codepoint for Seen and a codepoint for the corrector SeenAbove
 - a codepoint for Yeh and a codepoint for the corrector YehAbove
 - a codepoint for SmallYeh and a codepoint for the corrector YehAbove
 - a codepoint for Alef and a codepoint for the corrector AlefAbove
 - a codepoint for SmallAlef and a codepoint for the corrector AlefAbove
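
For the two pairs that Unicode already encodes separately (Hamza/HamzaAbove and Seen/SeenAbove), the letter-versus-mark distinction can be inspected with Python's standard `unicodedata` module. This is just a quick check, not an argument in itself; the helper name `describe` is made up for the example.

```python
import unicodedata

# (standalone letter, combining "corrector" mark)
PAIRS = [
    ("\u0621", "\u0654"),  # Hamza / HamzaAbove
    ("\u0633", "\u06DC"),  # Seen  / SeenAbove
]


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for letter, mark in PAIRS:
    print(describe(letter))
    print(describe(mark))
```

The letters come back with general category `Lo` (a letter) and the marks with `Mn` (a nonspacing mark), which is the property software uses to tell a standalone letter from an attached mark.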


> > More letters fall under the same domain:
> >  - Yeh/SmallYeh and YehAbove (YehAbove can be seen above Alef in Warsh
> >     Mushaf, Maghribi style. Here the Alef acts as a seat for the Yeh).
> >  - Alef/SmallAlef and AlefAbove (AlefAbove can be seen above Yeh,Waw).
> >
> >
> > Anyway, I remember you proposed a workaround for using one codepoint
> > for both SmallAlef and SmallAlefAbove. You can also use the same
> > workaround for using one codepoint for both Hamza and
> > HamzaAbove/Below.
>
> For small alef - yes, but not for hamza: I believe we are agreed on hamza.
> You simply misunderstood my comment about /stay'asuw/.
>

That's the problem. You don't see the relation between Hamza and SmallAlef.

Yes, we are agreed on Hamza. And that's why I'm trying to compare SmallAlef
with Hamza, they share common characteristics and they need to be dealt with
in the same way.

The only difference is that Hamza has only one form, Hamza, while Alef and
Yeh have additional, equivalent forms: SmallAlef and SmallYeh.

One should think of SmallAlef as being exactly another form of Alef that is
used to add the missing Alef while at the same time remaining distinguishable
from other Alef letters that weren't missing.

The relation between Hamza and SmallAlef is thus the same as the relation
between Hamza and Alef.

> > The workaround depends on the fact that modern Masahef are fully
> > marked. The idea is that if Alef/SmallAlef comes after a letter, that
> > letter is certainly marked with a haraka or something. But if
> > SmallAlefAbove comes after a letter, there will be no marks between
> > the SmallAlefAbove and that letter because the SmallAlefAbove is
> > "attached" to that letter, which acts as a seat, and of course the
> > marks for that letter come after the SmallAlefAbove mark.
>
> This is not a workaround, but efficient and accurate use of existing
> Unicode points. 

It might be efficient because it removes the need for one more codepoint,
but it is wrong, let alone accurate: it uses a single codepoint to encode two
different elements that have different properties and different uses, and that
were identified as different by those who invented them in the first place.


> In my analysis, there is only one small alef. It is 
> attached to the previous rasm element.

I now understand why you want to use the same codepoint for both of them.

This is simply incorrect. I'm guessing your analysis is based on
observations and intuition, which cannot be trusted, especially when we have
Rasm science that tells us accurately what is SmallAlef and what is
SmallAlefAbove.

SmallAlef is not and was never attached to the previous rasm element, this
is because it replaces the missing letter Alef which is "of course" a separate
letter not attached to the previous element.

The attachment of SmallAlef to the previous letter is meaningless.
It's _exactly_ like attaching the letter Alef to the letter Meem in the 
word ماء (Meem,Alef,Hamza) for example.

As a simple proof, take the word السماوات (Alef,Lam,Seen,Meem,Alef,Waw,Alef,Teh).
Can you consider the Alef that comes after the Meem as being an attachment
to the Meem? Can you identify the Alef after the Waw as being attached to the
Waw? Of course not. They are separate letters; the Alefs don't affect the
Meem or the Waw (but please note that both the Meem and the Waw here
must always be Maftooh, "having a fatha", in a fully marked text).

Now, if you removed the two Alefs and put another form of the letter,
SmallAlef, in their place, would you consider the SmallAlef as an attachment
to the Meem or the Waw? Again, no. As for the fatha on the Meem and Waw, this
is the _result_ of the SmallAlef being a Madd letter, not the _reason_ why
SmallAlef is not attached to the previous letter.


> If a fatha is placed between the 
> rasm element and the small alef, an offset to the left occurs, if necessary
> forcing its own horizontal spacing. The moment the fatha is removed, small
> alef retakes its normal position. In this way the Osmanli and modern Arabic
> mus-hafs can be encoded with maximal compatibility. For instance:
>
> Cairo mus-haf       هَـٰذَا
> Osmanli mus-haf:     هٰذَا
>

Good catch, that is one place where a fatha doesn't precede a SmallAlef, and
where your workaround fails miserably.
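
A small Python sketch of the fatha-trigger rule, applied to the two spellings quoted above. The assumptions are mine: both spellings are typed with the single code point U+0670 for the small alef, the Cairo spelling carries a tatweel (U+0640) that the rule skips over, and the function name is made up.

```python
FATHA = "\u064E"
TATWEEL = "\u0640"
SMALL_ALEF = "\u0670"  # ARABIC LETTER SUPERSCRIPT ALEF, the one shared code point


def reading_of_small_alef(text, index):
    """Apply the fatha-trigger rule to the small alef at text[index]."""
    j = index - 1
    while j >= 0 and text[j] == TATWEEL:  # assumption: tatweel is transparent
        j -= 1
    if j >= 0 and text[j] == FATHA:
        return "standalone"  # rule says: spaced letter (SmallAlef)
    return "attached"        # rule says: mark on the preceding letter


# Heh, fatha, tatweel, small alef, Thal, fatha, Alef
cairo = "\u0647\u064E\u0640\u0670\u0630\u064E\u0627"
# Heh, small alef, Thal, fatha, Alef
osmanli = "\u0647\u0670\u0630\u064E\u0627"

for name, word in (("Cairo", cairo), ("Osmanli", osmanli)):
    print(name, reading_of_small_alef(word, word.index(SMALL_ALEF)))
```

The rule gives two different readings for what is the same word, because the only thing that changed is whether the fatha was written.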

Again, you seem to only depend on observations.
Let me first analyze the word you mentioned scientifically not only
empirically:
The word هذا  (Heh,Thal,Alef) is not a single word, it's two words.
The two words are "ها" (Heh, Alef) and "ذا" (Thal,Alef) which together make up
a useful Arabic phrase.
Notice that the first word has an _Alef_ which was removed when the two words
were tied together and replaced by a SmallAlef.
Of course, that Alef is NOT attached LOGICALLY to the Heh, and neither is its
replacement SmallAlef.
So the spelling of the word became (Heh,SmallAlef,Thal,Alef).

Now, logically speaking, the SmallAlef is a separate letter between the Heh
and the Thal not a mark above the Heh.

Returning to your example:

> Cairo mus-haf       هَـٰذَا
> Osmanli mus-haf:     هٰذَا

The symbol used in BOTH is SmallAlef NOT the mark SmallAlefAbove.
In the first case, it's drawn normally.
But Arabic calligraphy is very flexible, and calligraphers will do anything
to make their lines more beautiful. In the second case the fatha is not
there because it is clear that the Heh is Maftooh: the SmallAlef replaces
an Alef only in the Madd context. That is one thing.

The other thing is that the calligrapher took the liberty of drawing the
SmallAlef where he thought it looked more attractive, because he
was sure that it could not be confused with the "superscript corrector" mark
SmallAlefAbove: Heh can _NEVER_ replace an Alef and as such cannot be
a seat for the Alef. The reader would like this look and wouldn't be confused.

-------------------------------------------------------------------------------
And BTW, the word you gave, هذا, is a word that calligraphers like
to customize, and I think you specifically chose it because of that (it's
customized, and gives the feeling that SmallAlef is attached to the Heh,
not standalone, which I guess you thought might support your argument).
Also, this word has the missing Alef even in modern spelling in day-to-day
use of Arabic.
As such I would like to see the word أولئك from the same Mushaf, and I bet
that the calligrapher positioned the SmallAlef differently than in the word
you gave, هذا. I bet that it is in its normal position to the left of the Lam,
as in most Masahef. This is because the calligrapher cannot customize this
word as much because the Lam is, well, a bit long ;-)
You can find the word أولئك in Sura 2, Verse 160 for example.
And this proves my point.
-------------------------------------------------------------------------------

And while we are at it, we don't encode the Qur'an text itself. We cannot do
this because it can be written in different ways that not only look different
but also differ in terms of letters and marks (although the meaning doesn't
change, that is the beauty of Arabic).

Instead, we encode a single Mushaf, and that specific Mushaf doesn't have to
be encoding-compatible with other Masahef.

> Incidentally, the amphibious behaviour of small alef can already be found in
> Maghribi mus-hafs. But spacing of small alef between letters or placing them
> on tatweels appears for the first time in the typeset Cairo Mushaf. Since
> then it occurs exclusively in fully vowelled Qur'an texts such as the
> Medina editions.
>

Again, the position where a SmallAlef is placed is irrelevant as long as it
cannot be confused with the mark SmallAlefAbove.

That is to say, SmallAlefAbove and SmallAlef can take any acceptable position
based on the calligrapher's preference, provided that the position can't
confuse the reader.

> > However, I strongly reject this type of workaround because:
> >  - They make no distinction between the concept of a Standalone
> >     letter that has nothing to do with the other letters in the word
> >     and the concept of a combining mark associated with another
> >     letter.
> >  - They depend on the text being _accurately_ fully marked
> >     which is not the case in most existing texts. And as a
> >     consequence, it would be impossible for the reader or software to
> >     know the meaning of the given character (wouldn't be able to tell
> >     the difference between a letter and a vowel mark) and as such
> >     would make searching and other text processing tasks very hard
> >     and inaccurate.
>
> This is where everybody experiences the worst problems when trying to
> encode Arabic with Unicode.
>
> IMHO there is a category missing between STANDALONE letter and COMBINING
> mark. What's missing is the category of Arabic AMPHIBIOUS characters.
> Amphi ("between") bious ("two") should be taken in the literal sense:
> hamza, small yeh, small waw, and possibly a few more miniatures, follow
> discontinuous letters (reh, waw, etc) and final letters on the base line
> with their own spacing, but between two connected letters (with or without
> tatweel), they become "amphibious": they are positioned between the
> surrounding connecting letters (with lam-alef as an extreme case!), not
> above them, carrying their own vowel or madda when necessary.
>

That's a rendering detail not related to encoding.
I think there are many ways to solve this problem, one of them is introducing
a Medial Shaping behavior but anyway this is offtopic.

BTW: What I meant by Standalone is that it's standalone _logically_, not
     necessarily visually. That is, the letter logically has no relation with
     the previous/next element.
     And by saying "attached", I meant attached _logically_. That is,
     it affects the letter it's attached to and is associated with it.

>
> Your proposed independent small alef could be encoded separately as such an
> amphibious Arabic character. So far I believe you would agree. Where we
> differ is that I claim that this particular behaviour of small alef is -
> without exception - triggered by a preceding fatha, as I described above.

I think this is another fundamental thing that makes you want to encode
SmallAlef and SmallAlefAbove as one codepoint.
You see the difference between SmallAlef and SmallAlefAbove as a visual one
(namely, a shift in position), which is wrong. On the contrary, they may be in
the same position.
The difference between them is a logical one:
 - SmallAlef is a replacement for the letter Alef, it can be used in its place.
 - SmallAlefAbove is NOT a replacement for the letter Alef, it cannot be
    used in its place. It's only a mark that, as you said, corrects the
    pronunciation of the letter to which it's attached.

 - SmallAlef DOES affect the spelling of the containing word. That is, it's
   spelled like any other letter in the word.
   Example: صلو*ت (with the * meaning SmallAlef) is spelled:
    Sad,Lam,Waw,Alef,Teh
 - SmallAlefAbove doesn't affect the spelling of a given word. That is, its
   name is not announced when spelling the word, instead the name of
   the letter to which it's attached is spelled.
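
The spelling test can be sketched in Python as follows, using the word السماوات discussed earlier. Everything specific here is an assumption for illustration: U+E000 is a hypothetical private-use stand-in for a standalone SmallAlef (Unicode has no such code point), U+0670 plays the role of SmallAlefAbove, the second word (Sad, Lam, Waw with an AlefAbove on it, Teh Marbuta) is my own choice of example, and the letter names in the table are just labels.

```python
SMALL_ALEF = "\uE000"        # HYPOTHETICAL private-use code point: standalone SmallAlef
SMALL_ALEF_ABOVE = "\u0670"  # ARABIC LETTER SUPERSCRIPT ALEF, used here only as the mark

# Names announced when spelling; marks are deliberately absent from the table.
LETTER_NAMES = {
    "\u0627": "Alef", "\u0644": "Lam", "\u0633": "Seen", "\u0645": "Meem",
    "\u0648": "Waw", "\u062A": "Teh", "\u0635": "Sad", "\u0629": "Teh Marbuta",
    SMALL_ALEF: "Alef",  # another form of Alef, so it is spelled as Alef
}


def spell(word):
    """List the letter names of a word, skipping anything that is only a mark."""
    return [LETTER_NAMES[ch] for ch in word if ch in LETTER_NAMES]


plain = "\u0627\u0644\u0633\u0645\u0627\u0648\u0627\u062A"
with_small = "\u0627\u0644\u0633\u0645" + SMALL_ALEF + "\u0648" + SMALL_ALEF + "\u062A"
with_mark = "\u0635\u0644\u0648" + SMALL_ALEF_ABOVE + "\u0629"

print(spell(plain) == spell(with_small))  # SmallAlef is spelled like Alef
print(spell(with_mark))                   # the mark adds no letter name
```

With two separate code points the spelling falls out of a plain table lookup; with one shared code point the lookup would first have to guess which of the two roles each occurrence plays.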

> So 
> I propose to add this amphibious behaviour to the existing code point for
> small alef instead, in order to maintain full compatibility with modern
> conventional spelling (and Osmanli spelling of Arabic).
>

Compatibility is not an issue, as I showed above with the word هذا.
Nothing can prevent you from encoding them the same; the difference
is at the rendering level, not the encoding level (the position of the
SmallAlef is only shifted a little in your example of an Osmanli Mushaf).

BTW: My private Mushaf also has SmallAlef shifted so far to the right that it
     looks like it's attached to the previous letter too.


I don't deny that it comes after a fatha. I'm saying that the fatha is
attached to the previous letter and the SmallAlef (although it comes after it)
has no relation to the previous letter and is not attached to it.

Actually, fatha-comes-before-SmallAlef is only an observation, not
a rule. It happens to be correct because any letter before
Alef Madd must be Maftooh (having a fatha in a fully marked text), and
since SmallAlef is just another form of Alef Madd, the letter before
it must also be Maftooh.

This is not specific to Alef Madd; it also applies to Yeh Madd, for example.
The letter before Yeh Madd must always be Maksoor (having a kasra in
a fully marked text), and since SmallYeh is just another form
of Yeh Madd, the letter before it must also be Maksoor, meaning that
SmallYeh in a fully marked text has a kasra preceding it _without_
exceptions. Would you also use this workaround for it and thus
save another codepoint?

And please note, that your workaround can be generalized. A SmallAlefAbove
_cannot_ be preceded by _any_ mark not only fatha. That is, a SmallAlefAbove
_always_ has its seat letter as the preceding character.
Also, a YehAbove always has its seat letter as the preceding character.
The same goes for Hamza, HamzaAbove always has its seat letter as the 
preceding character without exceptions.

In short, if one thinks this way and depends only on these observations
without taking into account the various properties and usages of these
entities, then these codepoints should not exist either, for the same reasons:
 - U+06E6 SmallYeh:
    YehAbove always has the seat letter directly preceding it. So one
    might as well use U+06E7 for both SmallYeh and YehAbove.
    When U+06E7 is preceded by any mark or haraka (it just happens to be 
    always a Kasra because SmallYeh is always in the Madd context), the
    Standalone SmallYeh should be triggered otherwise it's a YehAbove.
    The same goes for Yeh U+064A.

 - U+0633 Seen:
    SeenAbove always has the seat letter directly preceding it. So one
    might as well use U+06DC for both Seen and SeenAbove.
    When U+06DC is preceded by any mark or haraka, the Standalone Seen
    should be triggered otherwise it's a SeenAbove.

 - U+0621 Hamza:
    HamzaAbove always has the seat letter directly preceding it. So one
    might as well use U+0654 for both Hamza and HamzaAbove.
    When U+0654 is preceded by any mark or haraka, the Standalone Hamza
    should be triggered otherwise it's a HamzaAbove.

and even more.
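
Taken literally, the generalized scheme can be sketched in Python like this. It is an illustration only: `merge` and `split` are made-up names, the rule is reduced to "the preceding character is a nonspacing mark", and the fully marked spelling of the word is one plausible vocalization that I typed in myself.

```python
import unicodedata

# Real Unicode pairs from the list above: standalone letter -> combining mark.
LETTER_TO_MARK = {"\u0621": "\u0654", "\u0633": "\u06DC"}
MARK_TO_LETTER = {mark: letter for letter, mark in LETTER_TO_MARK.items()}


def merge(text):
    """Encode every standalone letter with its mark's code point."""
    return "".join(LETTER_TO_MARK.get(ch, ch) for ch in text)


def split(text):
    """Read the shared code point as a letter only when it follows a mark."""
    out = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else ""
        follows_mark = prev != "" and unicodedata.category(prev) == "Mn"
        out.append(MARK_TO_LETTER[ch] if ch in MARK_TO_LETTER and follows_mark else ch)
    return "".join(out)


# Meem, damma, Seen, fatha, Yeh, sukun, Tah, kasra, Reh, damma, Waw, Noon, fatha
marked = "\u0645\u064F\u0633\u064E\u064A\u0652\u0637\u0650\u0631\u064F\u0648\u0646\u064E"
# The same word with no marks at all
unmarked = "\u0645\u0633\u064A\u0637\u0631\u0648\u0646"

print(split(merge(marked)) == marked)      # the Seen is recovered
print(split(merge(unmarked)) == unmarked)  # the Seen is read as a mark on the Meem
```

The round trip only works when the text is fully and accurately marked; as soon as the haraka before the Seen is missing, the standalone letter is indistinguishable from SeenAbove.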

So, to boil down my criticism of your workaround to one question:
Why don't you encode SmallYeh and SmallYehAbove using the same codepoint?

Using your words, that would be:
> this particular behaviour of small *yeh* is -
> without exception - triggered by a preceding *kasra*

That is, without exception, all occurrences of SmallYeh in the Mushaf have a
preceding kasra. So shall we encode it using the same codepoint as the
mark SmallYehAbove? I don't think so, and the same goes for SmallAlef.

> >
> > I have given up using an OpenType font in my custom Qur'an application.
> > This is because I'm not forced to use OpenType fonts or any other
> > font for that matter, since in a custom app I have control over how
> > text is being drawn.
>
> I couldn't agree more - I am doing exactly the same. But I hope to feed
> back my experience to the Unicode consortium - whether they like it or not
> :-) After all, Unicode is the only way towards interchangeability and
> searchability on the internet.
>
> Let's keep on pioneering!
>

Pioneering? :-)

Well, I only do this for the benefit of the Qur'an.

-- 
Mohammed Yousif
Egypt

"قال قائل منهم إني كان لي قرين. يقول أءنك لمن المصدقين. أءذا متنا وكنا تراباً 
وعظاماً أءنا لمدينون. قال هل أنتم مطلعون. فاطلع فرءاه في سواء الجحيم. قال
تالله إن كدت لتردين. ولولا نعمة ربي لكنت من المحضرين"  (من القرءان الكريم)