Re: Quran data and issues in encoding the Quran in unicode



Salaam Abdulhaq,

On the point of so-called "sequential tanween" I am a little undecided. First of all, it is perfectly clear that what we call a sequential fathatan, a sequential dammatan, and a sequential kasratan are without doubt a fathatan, a dammatan, and a kasratan respectively. As far as I know (those who know, please confirm this), the 1924 Egyptian printing of the Quran was the first to use variant glyphs for fathatan, dammatan, and kasratan in the cases where the noon is not pronounced. So these sequential tanweens were not a part of Arabic until the 20th century. Regardless, since most of the Qurans printed in the world today employ these sequential tanweens, Unicode has to accommodate them somehow, and they need to be covered by the Arabic code page. But since these sequential tanweens are essentially no more than tanweens, I would be inclined to encode them as regular tanweens (fathatan, dammatan, or kasratan) in order to preserve the graphemic integrity of the text. To trigger the sequential behaviour, a special codepoint could be added that would be placed right after the respective tanween codepoint. This would be the ideal state of things.
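To make the idea concrete, here is a minimal sketch of what "regular tanween plus a trigger codepoint" could look like in practice. The three tanween codepoints (U+064B, U+064C, U+064D) are the real Unicode ones; the trigger is purely hypothetical, and I use the Private Use Area codepoint U+E000 as a stand-in because no such character is assigned. The function names are mine, for illustration only.

```python
# Real Unicode codepoints for the three tanween marks.
FATHATAN = "\u064B"
DAMMATAN = "\u064C"
KASRATAN = "\u064D"
TANWEENS = {FATHATAN, DAMMATAN, KASRATAN}

# ASSUMPTION: hypothetical "sequential" trigger. U+E000 is a Private Use
# Area placeholder; no such trigger codepoint is assigned in Unicode.
SEQ_TRIGGER = "\uE000"


def mark_sequential(text, index):
    """Insert the trigger right after the tanween found at `index`."""
    if text[index] not in TANWEENS:
        raise ValueError("character at index is not a tanween")
    return text[:index + 1] + SEQ_TRIGGER + text[index + 1:]


def strip_sequential(text):
    """Drop every trigger, leaving the plain graphemic text."""
    return text.replace(SEQ_TRIGGER, "")


def sequential_positions(text):
    """Indices of tanween marks immediately followed by the trigger."""
    return [
        i for i, ch in enumerate(text[:-1])
        if ch in TANWEENS and text[i + 1] == SEQ_TRIGGER
    ]
```

The point of the design is that strip_sequential() gives back exactly the text one would have typed with ordinary tanween, so searching and comparison work on the plain graphemes, while a renderer that understands the trigger can pick the variant glyph.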

That said, if three separate codepoints were added for sequential fathatan, sequential dammatan, and sequential kasratan, this would not be totally inconsistent with how the Arabic codepage has been evolving. To me the Unicode Arabic codepage as a whole is a hybrid of grapheme (character) based encoding and some glyph based encoding, which is ugly. So I guess adding the three separate sequential tanween codepoints would not be inconsistent with the current ugly state of things in the Arabic codepage, but I would prefer a cleaner method such as a special codepoint that triggers sequential behaviour.
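The difference between the two schemes can be sketched in a few lines. Everything marked hypothetical below is an assumption for illustration: U+E000 stands in for a trigger codepoint and U+E001 to U+E003 stand in for three dedicated sequential tanween codepoints (all Private Use Area placeholders, none of them assigned). Only U+064B, U+064C, and U+064D are real.

```python
# Real Unicode tanween codepoints.
FATHATAN, DAMMATAN, KASRATAN = "\u064B", "\u064C", "\u064D"

# ASSUMPTION: hypothetical Private Use Area placeholders, not assigned
# characters. One trigger, and three dedicated "sequential" codepoints.
SEQ_TRIGGER = "\uE000"
SEQ_FATHATAN, SEQ_DAMMATAN, SEQ_KASRATAN = "\uE001", "\uE002", "\uE003"

TO_BASE = {
    SEQ_FATHATAN: FATHATAN,
    SEQ_DAMMATAN: DAMMATAN,
    SEQ_KASRATAN: KASRATAN,
}
TO_DEDICATED = {base: seq for seq, base in TO_BASE.items()}


def dedicated_to_trigger(text):
    """Rewrite dedicated sequential codepoints as base tanween + trigger."""
    return "".join(
        TO_BASE[ch] + SEQ_TRIGGER if ch in TO_BASE else ch
        for ch in text
    )


def trigger_to_dedicated(text):
    """Rewrite base tanween + trigger as a dedicated sequential codepoint."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in TO_DEDICATED and text[i + 1:i + 2] == SEQ_TRIGGER:
            out.append(TO_DEDICATED[ch])
            i += 2  # consume the tanween and its trigger together
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With dedicated codepoints, a plain search for fathatan misses the sequential ones unless the text is first folded through a table like TO_BASE; with the trigger scheme the regular fathatan is always present in the text, which is what I mean by preserving graphemic integrity.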

Eventually the Arabic codepage needs to evolve to at least allow a clean encoding of the Quran, although it seems that even if that happens, the uglier method of encoding will always remain available to whoever chooses to use it.

Kind regards,
Mete

---------- Original Message ----------------------------------
From: Abdulhaq Lynch <al-arabeyes at alinsyria dot fsnet dot co dot uk>
Reply-To: Development Discussions <developer at arabeyes dot org>
Date:  Mon, 20 Jun 2005 22:49:29 +0100

>I don't agree with some basic points about all this. As I understand it 
>Unicode wants to move from what has become a glyph-based coding over to a 
>semantic-based encoding, and allow the font technology to worry about the 
>glyphs. Fine.
>
>However, Thomas et al. seem to be determined to pursue the opposite in terms 
>of tanween and tajweed related marks. Tanween is semantically totally 
>different to a fatha or one fatha followed by another. It is a semantic 
>character of its own and deserves codes of its own. Iqlaab, ikhfaa, madd etc 
>are tajweed marks that govern the pronunciation of the Arabic and each 
>carries a full semantic load. It should be possible to encode these 
>semantically loaded objects in any textual representation of the Quran. If the 
>Unicode consortium is not interested in encoding one of the most common books 
>in the world then a further code standard must be developed on top of the 
>Unicode one. Hacks like placing two characters next to each other to 
>'inspire' the font renderer to display a third semantically different 
>character just don't cut it.
>
>If Unicode really is about semantics and not glyphs, then let's have that then 
>please and give us a code-point per semantic load. If instead we have to hack 
>around with glued-together glyphs to try and indicate missing meaning, then 
>we should look elsewhere.
>
>If anyone is interested (on the safe assumption that Unicode is not interested 
>in that) then perhaps we could discuss such a code extension here.
>
>Abdulhaq
>
>On Friday 17 June 2005 09:29, Thomas Milo wrote:
>> Hi Mete, Meor,
>>
>> Just a quick reaction: U+0640 TATWEEL does not represent a character (or
>> rather, grapheme) but a unit of typography (i.e., a glyph). It should never
>> have been part of the Arabic code block in the first place. If you prefer
>> a sequence like Fatha-SmalAlifAbove (in regular Unicode) to print with a
>> supporting Tatweel, consider building a substitution in your OTF.
>>
>> A second point is the use of tanween followed by SmalMeemAbove and
>> SmalMeemBelow. This is non-standard use of the small meems, plain and
>> simple. The fact that the obvious encoding with sequential single harakat
>> and single harakat+small meem is not supported correctly by Microsoft's
>> Uniscribe does not justify the use of illegal encoding. It would be better
>> to report a bug to MS typography or, if you don't like to be the prisoner
>> of a third party's proprietary solutions, develop your own OTF parser (which
>> is what we do).
>>
>> BTW, Mete did not mention Decotype's Naskh as a font that handles Qur'anic
>> Arabic, because it is not yet published. In this project we consider the
>> Unicode points SmalMeemAbove and SmalMeemBelow a mistake: they are
>> contextual variants of SmallMeem (where - a single! - kasra pulls it below
>> the script line). Therefore we treat them as identical. However, for
>> compatibility's sake we could add a few front-end substitutions to convert
>> your private encoding to our (private?) encoding (which I believe you could
>> have done to bypass the MS Uniscribe constraints).
>>
>> Regards,
>>
>> t
>>
>> Mete Kural wrote:
>> > Hello Meor,
>> >
>> > Please find my suggested encodings and explanations below.
>> >
>> >> About the small alef, personally I would like to encode it using a
>> >> tatweel + superscript alef for medial position, and a space +
>> >> superscript alef for isolated position. The reason is that the
>> >> sequence will work on most, if not all, existing fonts. You might argue
>> >> that we don't need a tatweel for medial position, but without it, you
>> >> will encounter another problem under Windows. The same goes for small
>> >> noon and yeh, which I think are better encoded with a tatweel. For small
>> >> waw, I agree with Mr Milo.
>> >
>> > First of all, I would suggest that you not steer the project in a
>> > way to accommodate the variety of Arabic fonts that are available
>> > today which do not implement the Unicode Arabic spec adequately. More
>> > than 90% of the Arabic fonts out there ignore implementing
>> > considerable sections of the Unicode Arabic specs. Only a handful of
>> > fonts come close to rendering the Quran correctly (at least rendering
>> > what can be encoded of the Quran with the current Unicode spec,
>> > excepting the missing needed Quranic characters). Take notice that I
>> > say come close to rendering the Quran correctly. I wouldn't be
>> > surprised if there are only a handful of fonts in the world today that
>> > in fact do render correctly what can be encoded of the Quran with the
>> > current Unicode spec. In fact the only four that I know are
>> > Microsoft's Arabic Typesetting, your Arabeyes.org Meor font, and
>> > SIL's Scheherazade (even these two have problems with small alef I
>> > think) and Lateef fonts
>> > (http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=ArabicFonts).
>>
>> > There may be other solutions that are not yet delivered to the market
>> > or I haven't heard of. So I think trying to accommodate the encoding
>> > of the text to render the small alef correctly with other fonts that
>> > aren't suitable for rendering the Quran is unnecessary. The
>> > compromise made on the consistency of the encoding is not worth it to
>> > try to accommodate these unsuitable fonts. You already have a
>> > challenge to accommodate the Gnome and Uniscribe rendering
>> > engines; trying to accommodate some incomplete fonts in addition
>> > to that would leave you with a less than desirable encoding quality. This
>> > is why I would recommend you not to use a tatweel before the small
>> > alef.
>
>

--
Mete Kural
Touchtone Corporation
714-755-2810
--