Re: Quranic Proposal
- To: General Arabization Discussion <general at arabeyes dot org>
- Subject: Re: Quranic Proposal
- From: Mohammed Yousif <mhdyousif at gmx dot net>
- Date: Fri, 11 Jun 2004 22:13:14 +0300
- User-agent: KMail/1.6.1
On Friday 11 June 2004 16:10, Thomas Milo wrote:
> Dear Mohammed,
>
> Here is a more elaborate version of my comment. The real version needs to be
> a PDF. BTW, do you know whether I can attach PDFs in mail to this list?
>
Nadim?
>
> There is already 06E2 arabic small high meem isolated form that combines
> with preceding fatha and dhamma (with 06ED arabic small low meem isolated
> form should be considered a positional variant of 06E2 for rendering
> kasra+small meem);
I will try to summarize this in points.
Point 1: The Unicode Standard already has a character for "fatha" U+064E
yet, it also defines "fathatan" U+064B and the same applies for
all types of tanween.
So EITHER this is correct, and in that case we request the same
thing for all tanween types (it would be both incomplete and
inconsistent to define "fathatan" but neither "fatha with meem" nor
"sequential fathatan"),
OR including "fathatan" was a mistake, and in that case we still
request the same for all types of tanween because we have to
live with that mistake.
In short, add all types of tanween or remove all of them; having
some types of tanween and not the rest is certainly inconsistent
and misleading.
Point 2: The existing characters (for example U+06E2 and U+064E)
have the property of transparency.
The expected behavior from any unicode-compliant rendering
engine is that whenever it finds a regular character followed
by two transparent characters, it draws the regular character
then the first transparent character on top of it and lastly it
draws the second transparent character on _top_ of the first
transparent character.
This is good if, for example, the two transparent characters
are U+06DB and U+06DA.
But in our case if the two transparent characters were U+06E2
and U+064E, they would be drawn on _top_ of each other and the
result looks really different from the proposed glyph "Arabic
Fatha With Meem".
The proposed glyph "Arabic Fatha With Meem" as you can see
in the hard-copy of the Qur'an is drawn with the fatha and
meem not on top of each other but rather overlapping in a
special way. So it's not correct to say that the meem is on top
of the fatha, nor is it correct to say that the meem is to
the left of the fatha.
You could argue that this can be done using some tricks, in this
manner:
* With 'font technology', by setting the meem to be a
little to the right.
* Then adding a hack to the rendering engine to make it
render the sequence fatha+meem horizontally rather
than the standard behavior of rendering them vertically.
But then what if I typed a regular word with a small high meem?
The font will return it positioned a little to the right,
making it look as if it sits on top of the previous character,
not the intended one!
And anyway these tricks are bad enough to reject.
In short, the rendering engine will not be able to render the
glyph of the requested character using the combination
you mentioned (the result will be a fatha with a small meem on top
of it, which has nothing to do with the real glyph).
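For anyone who wants to check the character properties behind this point, here is a minimal sketch using Python's standard unicodedata module. The choice of Python and the helper name mark_info are assumptions made for illustration only; nothing in the proposal depends on them.

```python
import unicodedata

def mark_info(cp):
    """Return (name, general category, canonical combining class) for a code point."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch)

# Fatha, small high meem, small low meem, and the two marks named as a good case.
for cp in (0x064E, 0x06E2, 0x06ED, 0x06DA, 0x06DB):
    name, cat, ccc = mark_info(cp)
    print(f"U+{cp:04X} {name}: category={cat}, combining class={ccc}")
```

All five are reported as category Mn (nonspacing mark), which is the "transparent" behaviour described above. The combining class is only a number used for ordering marks; as far as I can tell, none of these properties can express "overlapping in a special way".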
> the other tanween variation can be proven to be a
> horizontally arranged repetition of the vowel signs,
Point 1: Most of the glyphs of tanween characters are not two
glyphs arranged horizontally, I'm giving you two examples
for this:
* The regular dammatan (This one particularly looks _very_
different from two horizontally arranged "damma"'s) ــٌــ
* The sequential fathatan is two _overlapping_ lines, not
horizontally arranged, nor vertically arranged but
overlapping.
Point 2: The standard expected behavior for rendering transparent characters
is to render them _vertically_ and it would break everything
else if we just request from rendering engines that they
arrange the transparent characters horizontally.
You could argue by saying that a special 'hack' can be added
to do that only for the sequence fatha-fatha but then that
breaks the regular fathatan tanween completely and makes it
impossible to get.
Point 3: (and the most important)
If I type fatha then fatha, how can the rendering engine know
what tanween type I want (regular fathatan or sequential
fathatan)?
You could argue by saying that we can instead request a
character called "a fatha to be used instead of the regular
fatha when the user wants to type sequential fathatan not
the regular fathatan", but this one is a little funny for three
reasons:
* Its name (whatever it will be) would either be the previously
mentioned three-line name or another really misleading and
confusing name (I can think of fathaS or something).
* It would be hard for a typist to know the difference between
fathaS and fatha and he/she will anyway be very prone to
errors in that area.
* We would still need 3 characters for fathaS, kasraS and
dammaS, so it's really much cleaner/nicer/appropriate/
better to just add the requested characters.
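The ambiguity in Point 3 can be checked against the normalization data. The sketch below (Python's unicodedata again; the helper same_text and the use of Beh as a carrier letter are assumptions for illustration) tests whether a letter with two fathas is treated as equivalent to the same letter with one fathatan.

```python
import unicodedata

BEH, FATHA, FATHATAN = "\u0628", "\u064E", "\u064B"

two_fathas = BEH + FATHA + FATHA    # fatha typed twice
one_fathatan = BEH + FATHATAN       # the encoded tanween character

def same_text(a, b):
    """True if a and b become identical under NFC or NFKC normalization."""
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in ("NFC", "NFKC")
    )

print(same_text(two_fathas, one_fathatan))       # False
print(repr(unicodedata.decomposition(FATHATAN))) # ''
```

The two spellings are unrelated as far as the standard data is concerned, and fathatan has no decomposition into two fathas, so a renderer that receives fatha+fatha has nothing to tell it which tanween was intended.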
> ergo there is no need
> for new code points.
Please see above and comment if you don't agree so I can
back it up with more facts.
> I can send you scans of such calligraphy, if you need
> any proof.
>
I already have those but, as explained above, the issue is not only
about rendering the glyphs of those characters but also about
identifying them.
I am an Arab BTW.
> The special effect of a slightly offset group of two fatha or kasra signs
> is a typographical innovation of the 1924 Egyptian Qur'an that can easily
> be handled by font technology. Our own DecoType ACE (Arabic Calligraphic
> Engine) is already enabled to handle these effects correctly. For OpenType
> it is also a simple glyph adjustment of the substitution tables.
>
That is what we really don't want (most vendors already do that).
Having the font do some tricks to get around limitations in the Unicode
Standard is exactly what I'm talking about.
If that's what we want, then we shouldn't have made this proposal in
the first place.
We want the Unicode Standard to overcome its limitations regarding the
Qur'an instead of getting around that by using fancy tricks on the other
side (the rendering engine and the font technology).
> > whereas special positioning of superscript alif
> > as well as trailing alifs falls in the domain of script
> > rendering or font technology.
>
> The present proposal correctly describes the cases with trailing alifs
> ligatures and since it proposes to add them to the block of Presentation
> Forms (FD40 - FD43), we should all agree that they fall in the domain of
> typography.
Not sure what you mean here, but if you mean that they are not needed
then I have to completely disagree: the range FD40-FD43 is used extensively.
Examples of their "very important" uses:
U+FB50 and U+FB51 (Alef Wasla) are used _very_ extensively in the Qur'an,
so much so that I can say this is the most frequently used Alef in the Qur'an.
U+FDFA is used in all Hadith/Sunna/Tafsir books and in most Islamic
books (even in some printings of the Qur'an which contain a book in the
margin that describes the circumstances/interpretation of every verse).
A typical page of those books contains about 10 occurrences of U+FDFA.
U+FDFB is also used in most Islamic books.
U+FDFD is used 113 times in the Qur'an (the basmala of each chapter of
the Qur'an is handled artistically by any Qur'an calligrapher, i.e.
differently from regular text).
We proposed to add the proposed characters 9, 10, 11 and 12 there because
two characters with a similar purpose (tanween) are already
in the table:
U+FD3C (Arabic Ligature Alef With Fathatan Final Form)
U+FD3D (Arabic Ligature Alef With Fathatan Isolated Form)
Those (FD40 - FD43) along with FD3C and FD3D are needed because a lot of
characters in the Qur'an have the tanween a little to the right of the alef
instead of directly on top of it.
You could argue by saying that the tanween can be put before the alef to get
that but this is not the right way because:
+ This is confusing because the tanween is associated with the Alef not
with the previous character (i.e. needs the letters to be ordered in
a meaningless way).
+ The tanween must be at a fixed Y-axis position (just a little below
the top of the Alef) but if the previous character, for example, was
a low-height character (let's say a Beh) then the tanween would be
placed on top of the Beh, still much lower than the needed Y-axis
position. Even worse, if the previous character has some harakat
before the tanween, the tanween may be too high to meet the required
Y-axis position.
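As a quick check of how the two existing characters are defined, the sketch below prints their names and compatibility decompositions with Python's unicodedata module (the tool is my assumption for illustration, not something the proposal relies on).

```python
import unicodedata

for cp in (0xFD3C, 0xFD3D):
    ch = chr(cp)
    # NFKD expands a presentation form into the plain characters it stands for.
    expanded = unicodedata.normalize("NFKD", ch)
    print(f"U+{cp:04X} {unicodedata.name(ch)}")
    print("  decomposition:", unicodedata.decomposition(ch))
    print("  NFKD:", " ".join(f"U+{ord(c):04X}" for c in expanded))
```

Both expand to U+0627 (Alef) followed by U+064B (fathatan), so in the existing data the tanween is stored after the Alef it belongs to, not before it.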
> Such glyphs do not belong in the Unicode Standard, the
> inclusion of the Presentation Forms was a political compromise never meant
> to be implemented.
>
Regardless of the reasons or the intent of that, this range is already there
and contains very important characters and is used extensively.
In short, Arabic Presentation Forms-A is necessary and is used, hence
it is not fair to ignore/remove it just because it was a political
compromise never meant to be implemented.
> The special positioning of superscript alif that this proposal requests to
> be encoded as a character is in fact only visible in the metal typesetting
> produced for the King Fuad Qur'an and the handwritten clone of this
> typeface used for the King Fahd Qur'an.
The fact is that most printings of the Qur'an use both forms of that
superscript alef (at least 90% of the printings, those widely used by Arabs
in Egypt, Saudi Arabia, Kuwait, United Arab Emirates, The occupied lands,
and in some North African countries as well).
Most of the other 10% of the printings are used only for educational
purposes and are mostly of interest to scholars.
BTW: A typical Muslim wouldn't know/care whether the hard-copy of the
Qur'an he/she is using is the King (X) printing or not,
because these are only different printings, not different books nor
different versions.
Guys, please open your hard-copy of the Qur'an and search
for the word "الثمرات" in the second chapter "Al-Baqara", verse
number 22, and tell Mr. Milo that the word is identical to the
one in the proposal (Sample 8.1 in the proposal) so that
he can be sure that all of the Islamic world is using the same
text as in the proposal.
> The same spelling, when written in
> older qur'ans falls in line with the rules of real Islamic calligraphy and
> does not behave in the way the present proposal considers standard.
I'm not sure where you got this information, but I can confirm that
there is a great deal of confusion here.
You cannot simply say "real Islamic calligraphy" because the Qur'an
wasn't written at all in the beginning, and when they started to collect it
from those Muslims who knew it and to verify it to make sure the text
was correct, they wrote it using an arbitrary style that cannot even be used
today (Arabic letters didn't even have dots at that time, so a teh,
a beh and a theh, for example, looked exactly the same).
The present proposal is indeed defining the standard printing methods widely
used everywhere in the world (Did I say more than 90%?).
You may want to check with the most important Qur'an printing organization,
The Qur'an Complex, located in Saudi Arabia, at qurancomplex.com.
You can ask them about this issue and any other issue that may be of
interest to you relating to Qur'an printing.
The Qur'an Complex prints the Qur'an and distributes it all over the world.
In Egypt, for example, if you enter a mosque you will see that most
(if not all) of the hard-copies of the Qur'an in its library were printed by
The Qur'an Complex.
So you can say "with great certainty" that The Qur'an Complex printing
method is the standard.
Also, this thing is really not an issue at all these days since most Qur'an
printing organizations now use the same notations/styles/glyphs.
If you are still unsure I can scan different printings of the Qur'an and
send you:
1- Their last pages, which contain the meaning and interpretation
of the various signs/glyphs.
2- A sample page to show you that they are IDENTICAL.
(That of course concerns the Hafs reading; the other readings are
a matter of changing the glyphs and adding/removing voice signs
here and there as appropriate for that reading, "it's a reading after all".)
> The
> combination of fatha with superscript alif does not make the superscript
> alif a new letter, grapheme or encodable character. It just causes a
> typesetting problem that can be solved by font technology or by using
> calligraphic madda (Persian: keshideh). Linguistically, the graphemic load
> of this superscript alif does not differ, it is just a contextual
> variation.
>
We didn't ask for a new character "superscript alef and fatha".
We are asking for a standalone "Superscript Alef".
> Superscript alif in the contemporary Arabic standard Qur'an is used in the
> following three manners:
Let me first explain the nature/interpretation of what unicode calls
a superscript alef.
Arabic has three letters called "Madd Letters", each of which is
associated with a 'haraka'. They are:
+ Alef: associated with a fatha
+ Waw: associated with a damma
+ Yeh: associated with a kasra
They indicate a longer pronunciation when used with a haraka, compared
to using the haraka alone.
For example:
Meem+Fatha should be read like this "Meem" + "Alef" ---> "Ma"
Meem+Fatha+Alef should be read like this "Meem" + "Alef" + "Alef" ---> "Maa"
Meem+Damma should be read like this "Meem" + "Waw" ---> "Mo"
Meem+Damma+Waw should be read like this "Meem" + "Waw" + "Waw" ---> "Moo"
Meem+Kasra should be read like this "Meem" + "Yeh" ---> "Me"
Meem+Kasra+Yeh should be read like this "Meem" + "Yeh" + "Yeh" ---> "Mee"
(PS: the length of each pronunciation varies depending on a number of
factors that are outside the scope of this explanation.)
In the Qur'an, most Madd letters (along with some non-Madd letters) are
missing from the "Rasm Othmani" and thus in the printings, small letters
are used in the place of the missing letters.
They are used as follows:
Madd Letters:
+ A small Alef is used instead of Alef. (superscript alef in unicode)
+ A small Waw is used instead of Waw.
+ A small Yeh is used instead of Yeh.
Other Letters:
+ A small Noon is used instead of Noon.
+ A small Seen is used instead of Seen.
> 1. stand-alone superscript alif on waw occurs on
> only eight words (all of them borrowings from Syriac), always with a fatha
> on the preceding syllable: حَیَوٰة، ربَوٰا۟، زَكَوٰة، صَلَوٰة، غَدَوٰة،
> مِـشْكَوٰة، مَنَوٰة، نجَوٰة
The missing letter here is Alef (note the fatha before it) so it's replaced
with a small Alef (which Unicode calls 'superscript alef'), but it's not
on the waw at all, nor is it on the previous letter; its place is
between the waw and the previous letter (which is the place of the
missing letter).
For example the word Hayaa (Life): حَیَوٰة
There was an Alef here but it's replaced, so the original word is: حَیَاوة
Clearly, it doesn't make sense at all to put the small alef on the waw.
I can give you scans from a hard-copy of the Qur'an to show you that
the small alef is neither on the waw nor on the previous letter but between
them.
The obvious solution to encode the word: حَیَوٰة
is to put the small alef on a tatweel (This is how it is done in the
hard-copy of the Qur'an) to be like that: حَیَـٰوة
and this is the correct look of it (it looks exactly like the hard-copy of the
Qur'an).
In this case, Unicode perfectly handles this and there are no problems here.
> 2. stand-alone superscript alif on unmarked
> yaa' (yaa' without dots, or "alif maqsuura" which according to the latest
> version of the Unicode Standard must be shaped for both non-final position
> and final position), always with a fatha on the preceding syllable:
>
> non-final:
> فَسَوَّىٰهُنَّ Q2:29، مِیكَىٰلَ Q2:29، ٱشْتَرَىـٰهُ Q2:29
> final:
> عَلَیٰ
Again, no problems here.
> 3. In all other cases superscript alif is combined with fatha.
>
So what?
The fact that the superscript alef is combined with fatha has nothing
to do with the problem we are requesting a character for.
Take sample 8.1 from the proposal: the small alef here, as you see,
is not on the Reh nor on the Teh, it's on its own.
The original word was: ٱلثَّمَرَات
If we remove the alef and replace it with a small alef, it should look like
the sample but with the current situation in Unicode using the existing
character U+0670 it's rendered completely wrong: ٱلثَّمَرَٰت
See, U+0670 (as a transparent character) is drawn on top of the fatha
which in turn is drawn on top of the Reh. This is completely wrong, the
small alef here is a separate character that should be between the
Reh and the Teh not on top of the Reh.
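To make the attachment explicit, here is a small sketch that groups each base letter with the nonspacing marks that follow it. The code point sequence in WORD is my assumed encoding of the word as it has to be typed today, and the helper clusters is a hypothetical name; the grouping rule is a simplification of what a real rendering engine does.

```python
import unicodedata

# Assumed encoding: alef wasla, lam, theh+shadda+fatha, meem+fatha,
# reh+fatha+superscript alef (U+0670), teh.
WORD = "\u0671\u0644\u062B\u0651\u064E\u0645\u064E\u0631\u064E\u0670\u062A"

def clusters(text):
    """Group each base character with the nonspacing marks (category Mn) after it."""
    groups = []
    for ch in text:
        if unicodedata.category(ch) == "Mn" and groups:
            groups[-1].append(ch)   # a mark attaches to the preceding base
        else:
            groups.append([ch])     # anything else starts a new cluster
    return groups

for group in clusters(WORD):
    print(" + ".join(f"U+{ord(c):04X}" for c in group))
```

The fifth cluster comes out as U+0631 + U+064E + U+0670: the superscript alef belongs to the Reh and gets stacked with its fatha, and there is no way in this sequence to give it a position of its own between the Reh and the Teh.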
> As for the comparison of non-spacing superscript alif and the proposed
> spacing superscript alif with non-spacing and spacing small yaa' (U+06E6
> and U+06E7), this is only correct from an engineering point of view. Such
> an approach does not take into the equation the linguistic or graphemic
> load of small high yaa'.
>
What's the definition of "the linguistic or graphemic load of small high
yaa"?
Could you please stick to words that all of us can understand and forget
about theoretical/philosophical views for now?
> U+06E7 is used to annotate a full letter yaa' when it is missing from the
> rasm, e.g.: Q2:61 ٱلنَّبِیِّــۧنَ
And the same goes for U+0670, it's used to annotate a full letter
'alef' when it's missing from the rasm, e.g.: ٱلۡكِتَـٰبُ
its original word is: ٱلۡكِتَابُ which means "the book"
Note here that Yeh is a Madd letter, as I explained earlier,
and so are Alef and Waw, so the word you provided
should be read "Annabeen".
Note the two 'e' letters: the small Yeh here denotes another
Madd letter (so it's a long pronunciation, as you call it).
> U+06E6 is not a contextual variation of U+06E7, but a word-final only
> trailing small yaa'
The fact, which every Qur'an scholar knows and which is clearly explained
in the last few pages of every hard-copy of the Qur'an, is that the small
Yeh is read using the same pronunciation length regardless of its
position in the word (final or middle, doesn't matter, it's still the same
length). It only depends on some other factors, like whether a Madda is
on top of that small letter or not.
That's the same for the three Madd letters (Alef, Waw and Yeh).
So, it's not appropriate to say that U+06E7 and U+06E6 are different
in pronunciation; it's just that both are needed because one is placed
at a regular position and the other is placed high (as a transparent
character).
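The spacing versus transparent distinction can be read straight from the character database. A minimal sketch with Python's unicodedata module follows; the helper is_spacing and the short labels are assumptions for illustration, and treating "not category Mn" as "spacing" is a simplification.

```python
import unicodedata

SMALL_LETTERS = {
    0x0670: "superscript alef",
    0x06E5: "small waw",
    0x06E6: "small yeh",
    0x06E7: "small high yeh",
}

def is_spacing(cp):
    """True if the character takes its own position instead of stacking on a base."""
    return unicodedata.category(chr(cp)) != "Mn"

for cp, label in SMALL_LETTERS.items():
    kind = "spacing" if is_spacing(cp) else "nonspacing (transparent)"
    print(f"U+{cp:04X} {label}: {kind}")
```

Small yeh exists in both a spacing form (U+06E6) and a nonspacing form (U+06E7), while the superscript alef exists only as the nonspacing U+0670.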
> which is used to mark the long pronunciation of short
> kasra of the pronominal suffix {-h} in cases where the preceding syllable
> has a short vowel.
All of the three Madd letters have this property.
That's why they are called "Madd Letters".
"Madd" in Arabic means "Lengthening", that is, the
pronunciation is being lengthened whenever one of
those three letters is found.
> To mark prolongation of the short damma in final
> position under similar conditions, U+06E5 small waw is used in the same
> manner. Examples: Q2:22 بِهِۦ
> Q2:16 حَوْلَهُۥ
>
BTW: The small Waw is also used in the same way as U+06E7,
but because a small high transparent character already
exists (damma), that one is used and there is no need to make a redundant
character parallel to U+06E7.
Now that I have explained this, you should be able to see the close relationship
between Alef, Waw and Yeh.
In short:
+ Some letters are replaced by smaller versions of them.
+ Among these letters are the Madd letters (Alef, Waw and Yeh),
and in that case they also affect the length of pronunciation
*regardless* of their position in the word (final or non-final,
it doesn't matter, the pronunciation is still lengthened because
they are "Madd Letters").
In that light, you should note that the proposed character is indeed
needed and is very close to U+06E6 (both are small, both are madd
and, finally, both are stand-alone).
If you are still not convinced then please show us how we can encode
the sample provided using the existing character U+0670.
> I hope I can convince you
> that 06E1 ARABIC SMALL HIGH DOTLESS HEAD OF KHAH is a redundant glyph
> variant of 0652 ARABIC SUKUN. This form of Sukun (as it is clearly called
> in the supplement of the Qur'an that you are referring to) only occurs in
> the Standard Arabic Qur'an where 06DF ARABIC SMALL HIGH ROUNDED ZERO looks
> like a normal, rounded Sukun. In other words, this is a font issue, not a
> character issue. Sukun is sukun, whether it looks like a chicken or an egg
> (i.e., whether it looks like ra's khah or small heh).
>
Please note that the Qur'an is not used alone in most cases.
It's mixed with Arabic text all the time (Tafsir books as an example)
or even quotations from the Qur'an.
The Qur'an is a *part* of Arabic and shouldn't be considered
a derivative of it at all.
Thus a Qur'anic font should be used as an Arabic font as well, not only
that but also existing Arabic fonts *should* have all the necessary
glyphs to render the Qur'an.
If we are to remove one of those two characters, then which glyph should
be mapped to that character? The regular sukun? The head of khah?
If we map the regular sukun to it, then the font is not usable for the
Qur'an (which is a *part* of Arabic) and hence is not a proper Arabic
font.
If we map the head of khah to it, then the font is not usable for
non-Qur'anic text and hence (again) is not a proper Arabic font.
My point is that the Qur'an is not "a different looking Arabic" but
rather "proper Arabic that should look the same as any other Arabic
text but has more symbols".
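For reference, the three characters under discussion can be listed with Python's unicodedata module (an illustration only; the tool is my assumption). The sketch prints each name and any decomposition that would tie one character to another.

```python
import unicodedata

# Regular sukun, small high rounded zero, small high dotless head of khah.
for cp in (0x0652, 0x06DF, 0x06E1):
    ch = chr(cp)
    decomp = unicodedata.decomposition(ch) or "none"
    print(f"U+{cp:04X} {unicodedata.name(ch)} (decomposition: {decomp})")
```

None of the three has a decomposition, so the data treats them as separate characters, and a font can carry a distinct glyph for each one and serve both Qur'anic and regular Arabic text.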
> > As I noted above, I don't like the idea of delegating everything to fonts
> > as this is not the right thing to do.
>
> As you can see, I fully agree with you regarding the position of font
> technology relative to encoding; we only need to synchronise our analysis.
>
Not sure what you mean: you are saying that most of the
things in the proposal should be handled by fonts, and I still can't see
a reason for this. (Why not fix the beast instead? That's what
we are trying to do.)
> > I can send you a "well designed" font that can display the Qur'an
> > perfectly using only ASCII characters but this is not good at all.
>
> Thank you. I don't think I can bear the sight of such fonts. I prefer real
> Qur'anic script.
You got my point then.
>
> Thomas
Please let me know if you need any samples/proofs for any of the facts
I mentioned (especially those regarding Arabic grammar) so I can
provide you with them.
Thanks,
--
Mohammed Yousif
Egypt