
Re: Quranic Proposal



On Friday 11 June 2004 16:10, Thomas Milo wrote:
> Dear Mohammed,
>
> Here is a more elaborate version of my comment. The real version needs to be
> a PDF. BTW, do you know whether I can attach PDFs in mail to this list?
>

  Nadim?

>
> There is already 06E2 arabic small high meem isolated form that combines
> with preceding fatha and dhamma (with 06ED arabic small low meem isolated
> form should be considered a positional variant of 06E2 for rendering
> kasra+small meem); 

 I will try to summarize this in points.

 Point 1: The Unicode Standard already has a character for "fatha"  U+064E
               yet, it also defines "fathatan" U+064B and the same applies for
               all types of tanween.
               So EITHER this is correct, in which case we request the same
               thing for all tanween types (it would be both incomplete and
               inconsistent to define "fathatan" but not "fatha with meem" or
               "sequential fathatan"),
               OR including "fathatan" was a mistake, in which case we still
               request the same for all types of tanween because we have to
               live with that mistake.
               In short, add all types of tanween or remove all of them; having
               some types of tanween and not the rest is both inconsistent and
               misleading.

 Point 2: The existing characters (for example U+06E2 and U+064E)
               have the property of transparency.
               The expected behavior from any unicode-compliant rendering
               engine is that whenever it finds a regular character followed
               by two transparent characters, it draws the regular character
               then the first transparent character on top of it and lastly it
               draws the second transparent character on _top_ of the first
               transparent character.
               This is good if, for example, the two transparent characters
               are U+06DB and U+06DA.
               But in our case if the two transparent characters were U+06E2
               and U+064E, they would be drawn on _top_ of each other and the
               result looks really different from the proposed glyph "Arabic
               Fatha With Meem".
               The proposed glyph "Arabic Fatha With Meem" as you can see
                in the hard-copy of the Qur'an is drawn with the fatha and
                meem not on top of each other but rather overlapping in a
               special way. So it's not correct to say that the meem is on top
               of the fatha, nor is it correct to say that the meem is to
               the left of the fatha.
               You could argue that this can be done using some tricks, in this
               manner:
                  * With 'font technology', by setting the meem slightly to
                     the right.
                  * Then adding a hack to the rendering engine to make it
                     render the sequence fatha+meem horizontally rather
                     than the standard behavior of rendering them vertically.
               But then what if I typed a regular word with a small high meem?
               The font will return it positioned slightly to the right,
               making it look as if it sits on top of the previous character,
               not the intended one!
               And anyway these tricks are bad enough to be rejected.
               In short, the rendering engine will not be able to render the
               glyph of the requested character using the combination
               you mentioned (the result will be a fatha with a small meem on
               top of it, which has nothing to do with the real glyph).
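
  The "transparent" property discussed in Point 2 can be inspected with a
  minimal sketch like the one below. It assumes Python's standard
  `unicodedata` module and uses the general category Mn (nonspacing mark)
  as an approximation of what this mail calls a transparent character; the
  helper name `describe` is made up for illustration.

```python
import unicodedata

def describe(ch):
    """Return (code point, name, general category, canonical combining class)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch),
            unicodedata.category(ch), unicodedata.combining(ch))

# The four marks named in Point 2: fatha, small high meem,
# and the two marks given as a case where stacking is fine.
marks = ["\u064E", "\u06E2", "\u06DB", "\u06DA"]
for m in marks:
    print(describe(m))

# All four are nonspacing marks (Mn), so a generic renderer attaches each
# of them to the preceding base letter instead of giving it its own slot.
all_nonspacing = all(unicodedata.category(m) == "Mn" for m in marks)
print(all_nonspacing)
```

  Nothing in these properties says how two marks on the same base should
  overlap, which is why the special fatha-with-meem shape has to come from
  somewhere other than the character properties.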

> the other tanween variation can be proven to be a 
> horizontally arranged repetition of the vowel signs,

  Point 1: Most of the glyphs of tanween characters are not two
               glyphs arranged horizontally, I'm giving you two examples
               for this:
                  * The regular dammatan (This one particularly looks _very_
                    different from two horizontally arranged "damma"'s) ــٌــ
                  * The sequential fathatan is two _overlapping_ lines, not
                     horizontally arranged, nor vertically arranged but
                     overlapping.

  Point 2: The standard expected behavior for rendering transparent characters
                is to render them _vertically_  and it would break everything
                else if we just request from rendering engines that they
                arrange the transparent characters horizontally.
                You could argue by saying that a special 'hack' can be added
                to do that only for the sequence fatha-fatha but then that
                breaks the regular fathatan tanween completely and makes it
                impossible to get.

  Point 3:  (and the most important)
               If I type fatha then fatha, how can the rendering engine know
               which tanween type I want (regular fathatan or sequential
               fathatan)?
               You could argue that we can instead request a
               character described as "a fatha to be used instead of the regular
               fatha when the user wants to type sequential fathatan, not
               the regular fathatan", but this is a little odd for three
               reasons:
                 * Its name (whatever it will be) would either be the
                   three-line description above or some really misleading and
                   confusing name (I can think of fathaS or something).
                 * It would be hard for a typist to know the difference between
                   fathaS and fatha, and he/she would be very prone to
                   errors in that area anyway.
                 * We would still need 3 characters for fathaS, kasraS and
                   dammaS, so it's really much cleaner and more appropriate
                   to just add the requested characters.
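
  The identification problem in Point 3 can be illustrated with a small
  sketch, again assuming Python's standard `unicodedata` module. It only
  shows what the existing data says about U+064B and about a doubled
  U+064E; the variable names are made up for illustration.

```python
import unicodedata

FATHA = "\u064E"     # ARABIC FATHA
FATHATAN = "\u064B"  # ARABIC FATHATAN

# FATHATAN carries no decomposition mapping: it is an atomic character,
# not defined as "fatha followed by fatha".
fathatan_decomposition = unicodedata.decomposition(FATHATAN)
print(repr(fathatan_decomposition))

# Two fathas stay two fathas under every normalization form, so the
# sequence never turns into FATHATAN on its own.
double_fatha = FATHA + FATHA
forms = {f: unicodedata.normalize(f, double_fatha)
         for f in ("NFC", "NFD", "NFKC", "NFKD")}
double_fatha_stays_distinct = all(v == double_fatha for v in forms.values())
print(double_fatha_stays_distinct)
```

  So plain text can tell "fathatan" from "fatha, fatha", but it has nothing
  that tells a regular fathatan from a sequential one; that choice would be
  left entirely to the font or the renderer.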


> ergo there is no need 
> for new code points.

  Please see above and comment if you don't agree so I can
  back it up with more facts.

> I can send you scans of such calligraphy, if you need
> any proof.
>

  I already have those but, as I said above, the issue is not only
  about rendering the glyphs of those characters but also about
  identifying them.

  I am an Arab BTW.

> The special effect of a slightly offset group of two fatha or kasra signs
> is a typographical innovation of the 1924 Egyptian Qur'an that can easily
> be handled by font technology. Our own DecoType ACE (Arabic Calligraphic
> Engine) is already enabled to handle these effects correctly. For OpenType
> it is also a simple glyph adjustment of the substitution tables.
>

  That is what we really don't want (most vendors already do that).
  Having the font do some tricks to get around limitations in the Unicode
  Standard: that is what I'm talking about.
  If that's what we want, then we shouldn't have made this proposal in
  the first place.
  We want the Unicode Standard to overcome its limitations regarding the
  Qur'an instead of getting around that by using fancy tricks on the other
  side (the rendering engine and the font technology).

> > whereas special positioning of superscript alif
> > as well as trailing alifs falls in the domain of script
> > rendering or font technology.
>
> The present proposal correctly describes the cases with trailing alifs
> ligatures and since it proposes to add them to the block of Presentation
> Forms (FD40 - FD43), we should all agree that they fall in the domain of
> typography.

  Not sure what you mean here, but if you mean that they are not needed
  then I have to completely disagree: the range FD40-FD43 is used extensively.

  Examples of their "very important" uses:
    U+FB50 and U+FB51 (Alef Wasla) are used so extensively in the Qur'an
    that I can say this is the most frequently used Alef in the Qur'an.

    U+FDFA is used in all Hadith/Sunna/Tafsir books and in most Islamic
    books (even in some printings of the Qur'an which contain a book in the
    margin that describes the circumstances/interpretation of every verse).
    A typical page of those books contains about 10 occurrences of U+FDFA.

    U+FDFB is also used in most Islamic books.

    U+FDFD is used 113 times in the Qur'an (the basmala of each chapter of
    the Qur'an is handled artistically by any Qur'an calligrapher, i.e.
    differently from regular text).


  We proposed to add the proposed characters 9, 10, 11 and 12 there because
  two characters with a similar purpose (tanween) are already in that
  table:
     U+FD3C  (Arabic Ligature Alef With Fathatan Final Form)
     U+FD3D  (Arabic Ligature Alef With Fathatan Isolated Form)
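
  For reference, what the standard already records about these two
  characters can be listed with a short sketch, assuming Python's standard
  `unicodedata` module (the helper name `fold` is made up for
  illustration):

```python
import unicodedata

# Print the name and the compatibility decomposition of the two existing
# alef-with-fathatan presentation forms.
for cp in (0xFD3C, 0xFD3D):
    ch = chr(cp)
    print(f"U+{cp:04X}", unicodedata.name(ch), "|", unicodedata.decomposition(ch))

def fold(text):
    """Apply compatibility normalization (NFKC) to the text."""
    return unicodedata.normalize("NFKC", text)

# Both forms fold to the plain sequence ALEF (U+0627) + FATHATAN (U+064B).
print([f"{ord(c):04X}" for c in fold("\uFD3D")])
```

  Note that after such folding the tanween sits on the alef again, so the
  "slightly to the right" placement survives only while the presentation
  form itself is kept in the text.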

 Those (FD40 - FD43) along with FD3C and FD3D are needed because a lot of
 characters in the Qur'an have the tanween a little to the right of the alef
 instead of directly on top of it.
 You could argue by saying that the tanween can be put before the alef to get
 that but this is not the right way because:
   + This is confusing because the tanween is associated with the Alef not
       with the previous character (i.e. needs the letters to be ordered in
       a meaningless way).
   + The tanween must be at a fixed Y-axis position (just a little below
       the top of the Alef), but if the previous character is, for example,
       a low-height character (let's say a Beh) then the tanween would be
       placed on top of the Beh, well below the needed Y-axis
       position. Even worse, if the previous character has some harakat
       before the tanween, the tanween may end up too high for the required
       Y-axis position.

> Such glyphs do not belong in the Unicode Standard, the
> inclusion of the Presentation Forms was a political compromise never meant
> to be implemented.
>

  Regardless of the reasons or the intent of that, this range is already there
  and contains very important characters and is used extensively.

  In short, Arabic Presentation Forms-A is necessary and is used, hence
  it is not fair to ignore/remove it just because it was a political
  compromise never meant to be implemented.

> The special positioning of superscript alif that this proposal requests to
> be encoded as a character is in fact only visible in the metal typesetting
> produced for the King Fuad Qur'an and the handwritten clone of this
> typeface used for the King Fahd Qur'an. 

  The fact is that most printings of the Qur'an use both forms of that
  superscript alef (at least 90% of the printings, those widely used by Arabs
  in Egypt, Saudi Arabia, Kuwait, United Arab Emirates, The occupied lands,
  and in some North African  countries as well).
  Most of the other 10% of the printings are used only for educational
  purposes and are mostly of interest to scholars.

  BTW: A typical Muslim wouldn't know/care whether the hard-copy of the
            Qur'an he/she is using is the King (X) printing or not,
            because these are only different printings, not different books
            or different versions.

  Guys, please open your hard-copy of the Qur'an, look up
  the word "الثمرات" in the second chapter, "Al-Baqara", verse
  number 22, and tell Mr. Milo that the word is identical to the
  one in the proposal (Sample 8.1 in the proposal) so that
  he can be sure that all of the Islamic world is using the same
  text as in the proposal.

> The same spelling, when written in 
> older qur'ans falls in line with the rules of real Islamic calligraphy and
> does not behave in the way the present proposal considers standard.

  I'm not sure where you got this information, but I can confirm that
  there is a great deal of confusion here.
  You cannot simply say "real Islamic calligraphy", because the Qur'an
  wasn't written down at all in the beginning, and when they started to
  collect it from those Muslims who knew it and to verify that the text
  was correct, they wrote it in an arbitrary style that cannot even be used
  today (Arabic letters didn't even have dots at that time, so a teh,
  a beh and a theh, for example, looked exactly the same).

  The present proposal is indeed defining the standard printing methods widely
  used everywhere in the world (Did I say more than 90%?).
  You may want to check with the most important Qur'an printing organization,
  The Qur'an Complex, located in Saudi Arabia, at qurancomplex.com.
  You can ask them about this issue and any other issue that may be of
  interest to you relating to Qur'an printing.
  The Qur'an Complex prints the Qur'an and distributes it all over the world.
  In Egypt, for example, if you enter a mosque you will see that most
  (if not all) of the hard-copies of the Qur'an in its library were printed by
  The Qur'an Complex.
  So you can say "with great certainty" that The Qur'an Complex printing
  method is the standard.
  Also, this is really not an issue at all these days, since most Qur'an
  printing organizations now use the same notations/styles/glyphs.
  If you are still unsure I can scan different printings of the Qur'an and
  send you:
    1- Their last pages, which contain the meaning and interpretation
       of the various signs/glyphs.
    2- A sample page to show you that they are IDENTICAL.

  (That of course concerns the Hafs reading; the other readings are
   a matter of changing the glyphs and adding/removing voice signs
   here and there as appropriate for that reading; "it's a reading, after all".)


> The 
> combination of fatha with superscript alif does not make the superscript
> alif a new letter, grapheme or encodable character. It just causes a
> typesetting problem that can be solved by font technology or by using
> calligraphic madda (Persian: keshideh). Linguistically, the graphemic load
> of this superscript alif does not differ, it is just a contextual
> variation.
>

  We didn't ask for a new character "superscript alef and fatha".
  We are asking for a stand-alone "Superscript Alef".

> Superscript alif in the contemporary Arabic standard Qur'an is used in the
> following three manners:

  Let me first explain the nature/interpretation of what unicode calls
  a superscript alef.

  Arabic has three "Madd Letters", each of which is
  associated with a 'haraka'. They are:
   + Alef: associated with a fatha
   + Waw: associated with a damma
   + Yeh: associated with a kasra

 They indicate a longer pronunciation when used with a haraka, compared
 to using the haraka alone.

 For example:
  Meem+Fatha should be read like this "Meem" + "Alef" ---> "Ma"
  Meem+Fatha+Alef should be read like this "Meem" + "Alef" + "Alef" ---> "Maa"
  Meem+Damma should be read like this "Meem" + "Waw" ---> "Mo"
  Meem+Damma+Waw should be read like this "Meem" + "Waw" + "Waw" ---> "Moo"
  Meem+Kasra should be read like this "Meem" + "Yeh" ---> "Me"
  Meem+Kasra+Yeh should be read like this "Meem" + "Yeh" + "Yeh" ---> "Mee"

 (PS: the length of each pronunciation varies depending on a number of
  factors that are outside the scope of this explanation.)

 In the Qur'an, most Madd letters (along with some non-Madd letters) are
 missing from the "Rasm Othmani" and thus in the printings, small letters
 are used in the place of the missing letters.

 They are used as follows:
   Madd Letters:
     + A small Alef is used instead of Alef.  (superscript alef in unicode)
     + A small Waw is used instead of Waw.
     + A small Yeh is used instead of Yeh.
   Other Letters:
     + A small Noon is used instead of Noon.
     + A small Seen is used instead of Seen.


> 1. stand-alone superscript alif on waw occurs on 
> only eight words (all of them borrowings from Syriac), always with a fatha
> on the preceding syllable: حَیَوٰة، ربَوٰا۟، زَكَوٰة، صَلَوٰة، غَدَوٰة،
> مِـشْكَوٰة، مَنَوٰة، نجَوٰة

  The missing letter here is Alef (note the fatha before it), so it's replaced
  with a small Alef (which Unicode calls 'superscript alef'), but it's not
  on the waw at all, nor is it on the previous letter; its place is
  between the waw and the previous letter (which is the place of the
  missing letter).
  For example the word Hayaa (Life):   حَیَوٰة
  There was an Alef here but it's replaced, so the original word is: حَیَاوة
  Clearly, it doesn't make sense at all to put the small alef on the waw.
  I can give you scans from a hard-copy of the Qur'an to show you that
  the small alef is neither on the waw nor on the previous letter but between
  them.

  The obvious solution to encode the word: حَیَوٰة
  is to put the small alef on a tatweel (This is how it is done in the
  hard-copy of the Qur'an) to be like that: حَیَـٰوة
  and this is the correct look of it (looks exactly as the hard-copy of the
  Qur'an)

  In this case, Unicode perfectly handles this and there are no problems here.
  

> 2. stand-alone superscript alif on unmarked 
> yaa' (yaa' without dots, or "alif maqsuura" which according to the latest
> version of the Unicode Standard must be shaped for both non-final position
> and final position), always with a fatha on the preceding syllable:
>
>     non-final:
>     فَسَوَّىٰهُنَّ Q2:29، مِیكَىٰلَ Q2:29،  ٱشْتَرَىـٰهُ Q2:29
>     final:
>     عَلَیٰ

  Again, no problems here.

> 3. In all other cases superscript alif is combined with fatha.
>

  So, what?
  The fact that the superscript alef is combined with fatha has nothing
  to do with the problem we are requesting a character for.

  Take sample 8.1 from the proposal: the small alef here, as you can see,
  is on neither the Reh nor the Teh; it stands on its own.
  The original word was:  ٱلثَّمَرَات
  If we remove the alef and replace it with a small alef, it should look like
  the sample but with the current situation in Unicode using the existing
  character U+0670 it's rendered completely wrong:    ٱلثَّمَرَٰت
  See, U+0670 (as a transparent character) is drawn on top of the fatha
  which in turn is drawn on top of the Reh. This is completely wrong, the
  small alef here is a separate character that should be between the
  Reh and the Teh not on top of the Reh.
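
  Why U+0670 ends up on the Reh can be shown with a rough sketch that
  groups a string into base letters and the nonspacing marks that follow
  them. It assumes Python's standard `unicodedata` module; the helpers
  `clusters` and `show` are made up for illustration, and the grouping is a
  simplification of what a real shaping engine does. The second string
  carries the small alef on a tatweel, as in the earlier waw example.

```python
import unicodedata

def clusters(text):
    """Split text into base characters, each followed by its nonspacing marks."""
    out = []
    for ch in text:
        if out and unicodedata.category(ch) == "Mn":
            out[-1] += ch      # a mark attaches to the preceding base
        else:
            out.append(ch)     # any other character starts a new cluster
    return out

def show(text):
    """Render each cluster as a string of hexadecimal code points."""
    return [" ".join(f"{ord(c):04X}" for c in cl) for cl in clusters(text)]

# alef wasla, lam, theh+shadda+fatha, meem+fatha, reh+fatha
PREFIX = "\u0671\u0644\u062B\u0651\u064E\u0645\u064E\u0631\u064E"
SUPERSCRIPT_ALEF = "\u0670"
TATWEEL = "\u0640"
TEH = "\u062A"

attached = PREFIX + SUPERSCRIPT_ALEF + TEH            # U+0670 directly after reh+fatha
carried = PREFIX + TATWEEL + SUPERSCRIPT_ALEF + TEH   # U+0670 carried by a tatweel

print(show(attached))
print(show(carried))
```

  In the first string the small alef belongs to the Reh cluster; in the
  second it has a slot of its own, but only because an extra tatweel was
  inserted to carry it.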


> As for the comparison of non-spacing superscript alif and the proposed
> spacing superscript alif with non-spacing and spacing small  yaa' (U+06E6
> and U+06E7), this is only correct from an engineering point of view. Such
> an approach does not take into the equation the linguistic or graphemic
> load of small high yaa'.
>

  What's the definition of "the linguistic or graphemic load of small high
  yaa"?

  Could you please stick to words that all of us can understand and forget
  about theoretical/philosophical views for now?

> U+06E7 is used to annotate a full letter yaa' when it is missing from the
> rasm, e.g.: Q2:61 ٱلنَّبِیِّــۧنَ

  And the same goes for U+0670, it's used to annotate a full letter
  'alef' when it's missing from the rasm, e.g.:  ٱلۡكِتَـٰبُ
  its original word is:  ٱلۡكِتَابُ which means "the book"

  Note here that Yeh is a Madd letter as I explained earlier
  and so is Alef and Waw so the word you provided
  should be read "Annabeen".
  Note the two 'e' letters, the small Yeh here denotes another
  Madd letter (so, it's long pronunciation as you call it).

> U+06E6 is not a contextual variation of U+06E7, but a word-final only
> trailing small yaa' 

 The fact that every Qur'an scholar knows and which is clearly explained
 in the few last pages of every hard-copy of the Qur'an is that the small
 Yeh is read using the same pronunciation length regardless of its
 position in the word (final or middle, doesn't matter, it's still the same
 length). It only depends on some other factors like whether a Madda is
 on the top of that small letter or not.

 That's the same for the three Madd letters (Alef, Waw and Yeh).
 So, it's not appropriate to say that U+06E7 and U+06E6 are different
 in pronunciation; both are needed only because one is placed
 at a regular position and the other is placed high (as a transparent
 character).

> which is used to mark the long pronunciation of short 
> kasra of the pronominal suffix {-h} in cases where the preceding syllable
> has a short vowel. 
  
  All of the three Madd letters have this property.
  That's why they are called "Madd Letters".
  "Madd" in Arabic means "Lengthening", that is, the
  pronunciation is being lengthened whenever one of
  those three letters is found.

> To mark prolongation of the short damma in final
> position under similar conditions, U+06E5 small waw is used in the same
> manner. Examples: Q2:22 بِهِۦ
> Q2:16 حَوْلَهُۥ
>

  BTW: The small Waw is also used in the same way as U+06E7,
           but because a small high transparent character already
           exists (damma), that one is used and there is no need for a
           redundant character parallel to U+06E7.

  Now that I have explained this, you should be able to see the close
  relationship between Alef, Waw and Yeh.

  In short:
    + Some letters are replaced by smaller versions of them.
    + Among these letters are the Madd letters (Alef, Waw and Yeh),
       and in that case they also affect the length of pronunciation
       *regardless* of their position in the word (final or non-final,
        it doesn't matter; the pronunciation is still lengthened because
        they are "Madd Letters").

  In that light, you should note that the proposed character is indeed
  needed and is very close to U+06E6 (both are small, both are madd
  and, finally, both are stand-alone)
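
  The asymmetry between the small letters can be checked with a short
  sketch, assuming Python's standard `unicodedata` module (the helper
  `is_nonspacing` is made up for illustration): the small yeh exists both
  as a stand-alone character and as a high mark, while the superscript
  alef exists only as a mark.

```python
import unicodedata

def is_nonspacing(ch):
    """True if the character is a nonspacing mark (general category Mn)."""
    return unicodedata.category(ch) == "Mn"

small_letters = {
    "small waw": "\u06E5",
    "small yeh": "\u06E6",
    "small high yeh": "\u06E7",
    "superscript alef": "\u0670",
}

# Print the code point, general category and mark status of each character.
for label, ch in small_letters.items():
    print(f"U+{ord(ch):04X}", label, unicodedata.category(ch), is_nonspacing(ch))
```

  U+06E5 and U+06E6 are reported as letters with their own position in the
  line, whereas U+06E7 and U+0670 are marks on the preceding base.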

  If you are still not convinced then please show us how we can encode
  the sample provided using the existing character U+0670.

> I hope I can convince you 
> that 06E1 ARABIC SMALL HIGH DOTLESS HEAD OF KHAH is a redundant glyph
> variant of 0652 ARABIC SUKUN. This form of Sukun (as it is clearly called
> in the supplement of the Qur'an that you are referring to) only occurs in
> the Standard Arabic Qur'an where 06DF ARABIC SMALL HIGH ROUNDED ZERO looks
> like a normal, rounded Sukun. In other words, this is a font issue, not a
> character issue. Sukun is sukun, whether it looks like a chicken or an egg
> (i.e., whether it looks like ra's khah or small heh).
>

  Please note that the Qur'an is not used alone in most cases.
  It's mixed with Arabic text all the time (Tafsir books as an example)
  or even quotations from the Qur'an.
  The Qur'an is a *part* of Arabic and shouldn't be considered
  a derivative of it at all.
  Thus a Qur'anic font should be used as an Arabic font as well, not only
  that but also existing Arabic fonts *should* have all the necessary
  glyphs to render the Qur'an.

  If we are to remove one of those two characters then which glyph should
  be mapped to that character? the regular sukun? the head of khah?
  If we mapped to it the regular sukun, then the font is not usable for
  Qur'an (which is a *part* of Arabic) and hence is not a proper Arabic
  font.

  If we mapped to it the head of khah, then the font is not usable for
  non-Qur'anic text and hence (again) is not a proper Arabic font.

  My point is that the Qur'an is not "a different looking Arabic" but
  rather "proper Arabic that should look the same as any other Arabic
  text but has more symbols".

> > As I noted above, I don't like the idea of delegating everything to fonts
> > as this is not the right thing to do.
>
>  As you can see, I fully agree with you regarding the position of font
> technology relative to encoding; we only need to synchronise our analysis.
>

  Not sure what you mean; you are saying that most of the
  things in the proposal should be handled by fonts, and I still can't see
  a reason for this. (Why not fix the beast instead? That's what
  we are trying to do.)

> > I can send you a "well designed" font that can display the Qur'an
> > perfectly using only ASCII characters but this is not good at all.
>
> Thank you. I don't think I can bear the sight of such fonts. I prefer real
> Qur'anic script.

 You got my point then.

>
> Thomas

  Please let me know if you need any samples/proofs for any of the facts
  I mentioned (especially those regarding Arabic grammar) so I can
  provide you with them.

  Thanks,

-- 
Mohammed Yousif
Egypt