Re: Tanween variants and Unicode
- To: General Arabization Discussion <general at arabeyes dot org>
- Subject: Re: Tanween variants and Unicode
- From: Meor Ridzuan Meor Yahaya <meor dot ridzuan at gmail dot com>
- Date: Fri, 26 Aug 2005 11:11:58 +0800
So, it seems my timezone is totally different from everyone else's ...
Just to clarify, the use of small letters in the Madinah Mushaf is
different from what is used by others. You can see a sample at
http://www.quranpak.com/sample1.htm . The superscript/small alef is
used to denote the "A" sound of 2 harakat, without the fatha. Take the
word "ala": in the sample it is spelled out as "ain fatha lam
superscriptalef alefmaksura", whereas in the Madinah Mushaf it is "ain
fatha lam fatha alefmaksura superscriptalef". So the superscript alef
in the Madinah Mushaf is not really a superscript alef; it is a small
alef. Likewise, in the Madinah Mushaf the superscript/small waw in
sura 17, aya 7 actually has the same function as the other small waw
at the end of a word; it is just that the "missing" waw for that word
occurs in the middle of the word.
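To make the two spellings of "ala" concrete, here is a rough sketch in
Python that writes each one as a sequence of codepoints and prints the
Unicode name of every character. The mapping from the names used above
to codepoints (ain = U+0639, fatha = U+064E, lam = U+0644,
superscriptalef = U+0670, alefmaksura = U+0649) is my assumption of
what is meant, not something taken from either mushaf's encoding.

```python
import unicodedata

# Assumed codepoints: ain, fatha, lam, superscript alef, alef maksura.
SAMPLE = "\u0639\u064E\u0644\u0670\u0649"
# Assumed codepoints: ain, fatha, lam, fatha, alef maksura, superscript alef.
MADINAH = "\u0639\u064E\u0644\u064E\u0649\u0670"


def describe(text):
    """Return one 'U+XXXX NAME' string per character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for label, text in (("sample", SAMPLE), ("madinah", MADINAH)):
    print(label)
    for line in describe(text):
        print("   " + line)
```

The only differences are the extra fatha on the lam and the position of
U+0670, which is exactly where the two conventions disagree about what
that character stands for.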
Unicode support for these characters/symbols is confusing at best.
For the small waw, there is only a small waw, not a superscript waw.
For the small yeh, there are both: the spacing glyph and the
superscript one. Worst of all is the alef, which is named superscript
alef but described as a vowel mark, so I have no idea what it means or
what it is supposed to represent. Take the SIL font, for example: it
has the small waw as a non-spacing glyph, contrary to the Unicode
description. This is just one example of how misleading the document
really is.
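The properties in question can be inspected with Python's standard
unicodedata module. This is only a sketch: the choice of the four
codepoints below is my reading of which characters are meant, and the
values printed come from whatever Unicode version ships with the
interpreter, which may differ from the version of the standard under
discussion here. A general category of Mn means a non-spacing mark and
Lm means a (spacing) modifier letter.

```python
import unicodedata

# Assumed codepoints for the characters discussed above.
CHARS = ["\u0670", "\u06E5", "\u06E6", "\u06E7"]


def properties(ch):
    """Return (name, general category, combining class) for ch."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


for ch in CHARS:
    name, category, ccc = properties(ch)
    print(f"U+{ord(ch):04X} {name}: category={category}, ccc={ccc}")
```

A font that draws a character with category Lm as a non-spacing glyph
is therefore doing something other than what the character properties
say.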
So, if you ask me, the best option is for Unicode either to change the
glyph/character properties as proposed by Yousif, or to add a few more
codepoints for the "missing" glyphs. The second approach can probably
be adopted faster.
And about the tanween/assimilated tanween: well, Mete, I can tell you
that the only standard technology today that can support it is
OpenType. True, OpenType support is available on Microsoft platforms,
Linux, OS X, and maybe some other platforms, but I think that is not
enough. First, OpenType support on those platforms is still
"problematic". If you look at the GNOME Bugzilla, Pango added some
workarounds to make it compatible with Uniscribe, but had to deviate
from the standard document to do so. Also, tools to produce OpenType
fonts are not widely available. The only good tool that I know of is
MS VOLT. Even so, I personally think MS VOLT is buggy software, and I
do have proof of it. The other tool is FontLab. Try creating GSUB and
GPOS tables with FontLab, and I think you will go crazy. Maybe Gregg
knows better.
Another important point about technology: Pocket PC 2003 does not have
full OpenType support. I'm not sure about Palm, but I doubt it does.
So we can't display the text on those platforms. I think these
platforms are very important for displaying the Quran, since they are
the most convenient for everyone. (I would really like to get one
specifically for reading the Quran.)
On a side note, I've just started to understand how Visual TrueType
works (sort of). My problem was that I started with an Arabic font,
but all of the documentation/samples/terminology are tailored to Latin
fonts. Yesterday I decided to take the Bitstream Vera font, remove the
hints, and start hinting the font by going through the document. I
finally understand something!! However, I'm still not sure how I can
apply those concepts/methods to an Arabic font. I'm thinking of
merging the Bitstream font with my font to get a complete font. The
license seems to permit such modification, but I'm not sure of the
implications of doing that.
Regards.
On 8/26/05, Mete Kural <metek at touchtonecorp dot com> wrote:
> Hello Nadim,
>
> I think I didn't communicate myself efficiently. I am not proposing that we should use a <tanween+modifier> sequence for tanween with small meem and assimilated tanween just to save the hassle of proposing six extra new codepoints to Unicode (although it would truly be quite a hassle to try to propose six new codepoints). It is because using a <tanween+modifier> sequence preserves the text's graphemic integrity better and results in a cleaner encoding. A fathatan is a fathatan, regardless of whether its pronunciation changes slightly. An assimilated fathatan or a fathatan with small meem is still a fathatan, in fact it is just as much fathatan as any other fathatan. For hundreds of years all of these fathatans were written the same exact way. In more recent times scribes have decided to write these two kinds of fathatans slightly differently to cue the un-educated reciter to pronounce correctly. For that reason the logical way to encode this is the <fathatan+modifier> sequence in order to preserve the fathatan codepoint. Using a separate codepoint will break this graphemic integrity.
>
> In Unicode Arabic there are several instances where certain codepoints break this kind of graphemic integrity. Some of these were added because that was the way it was in legacy Arabic codeblocks that were prepared a long time ago by corporations that wanted to localize their software into Arabic the cheapest and quickest way. Not much scholarly advice was sought. Your argument is that we can compromise from the graphemic integrity yet another time in order to allow legacy font technologies to render these tanween variants. My opinion is that it is better not to introduce yet another blunder into Unicode Arabic in order to support the legacy. We have different biases. Your bias is towards legacy support, my bias is towards graphemic integrity. This analysis doesn't resolve our differences but at least we can identify them better.
>
> Kind regards,
> Mete
>