
Re: Tanween variants and Unicode



Hello Nadim,

I think I didn't express myself clearly. I am not proposing a <tanween+modifier> sequence for tanween with small meem and assimilated tanween just to save the hassle of proposing six new codepoints to Unicode (although proposing six new codepoints would truly be quite a hassle). I am proposing it because a <tanween+modifier> sequence preserves the text's graphemic integrity better and results in a cleaner encoding. A fathatan is a fathatan, regardless of whether its pronunciation changes slightly. An assimilated fathatan or a fathatan with small meem is still a fathatan; in fact, it is just as much a fathatan as any other. For hundreds of years all of these fathatans were written exactly the same way. In more recent times scribes have started writing these two kinds of fathatan slightly differently to cue the untrained reciter to pronounce them correctly. For that reason the logical way to encode them is the <fathatan+modifier> sequence, which preserves the fathatan codepoint. Using a separate codepoint would break this graphemic integrity.
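
To make the difference concrete, here is a minimal Python sketch of the two encoding options. Only U+064B (ARABIC FATHATAN) and U+0628 (ARABIC LETTER BEH) are real Unicode assignments here; the modifier and the separate "assimilated fathatan" are hypothetical placeholders that I have parked in the Private Use Area purely for illustration, and they are not proposed values.

```python
# Real Unicode codepoints.
BEH = "\u0628"        # ARABIC LETTER BEH
FATHATAN = "\u064B"   # ARABIC FATHATAN

# Hypothetical placeholders (Private Use Area), for illustration only.
HYPOTHETICAL_MODIFIER = "\uE000"              # "assimilated" modifier mark
HYPOTHETICAL_ASSIMILATED_FATHATAN = "\uE001"  # separate-codepoint alternative


def count_fathatan(text: str) -> int:
    """Count occurrences of the plain fathatan codepoint in text."""
    return text.count(FATHATAN)


def strip_modifier(text: str) -> str:
    """Drop the hypothetical modifier, leaving the underlying fathatan."""
    return text.replace(HYPOTHETICAL_MODIFIER, "")


plain = BEH + FATHATAN
sequence_encoded = BEH + FATHATAN + HYPOTHETICAL_MODIFIER
separate_encoded = BEH + HYPOTHETICAL_ASSIMILATED_FATHATAN

# With the sequence, the fathatan is still in the text as a fathatan.
print(count_fathatan(sequence_encoded))
# With a separate codepoint, a search for fathatan no longer sees it.
print(count_fathatan(separate_encoded))
# Removing the modifier gives back the traditional spelling.
print(strip_modifier(sequence_encoded) == plain)
```

Under these assumptions, a search or comparison that looks for the fathatan codepoint keeps working on the sequence-encoded text, while the separately encoded text needs extra knowledge to recognize that it contains a fathatan at all.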

In Unicode Arabic there are several instances where codepoints break this kind of graphemic integrity. Some of them were added because that was how things were done in the legacy Arabic codeblocks, which were prepared a long time ago by corporations that wanted to localize their software into Arabic as cheaply and quickly as possible. Not much scholarly advice was sought. Your argument is that we can compromise on graphemic integrity yet another time in order to let legacy font technologies render these tanween variants. My opinion is that it is better not to introduce yet another blunder into Unicode Arabic for the sake of legacy support. We have different biases: yours is towards legacy support, mine is towards graphemic integrity. This analysis doesn't resolve our differences, but at least it lets us identify them better.

Kind regards,
Mete

---------- Original Message ----------------------------------
From: Nadim Shaikli <shaikli at yahoo dot com>
Reply-To: General Arabization Discussion <general at arabeyes dot org>
Date:  Thu, 25 Aug 2005 15:12:21 -0700 (PDT)

>--- Mete Kural <metek at touchtonecorp dot com> wrote:
>> Looks like this company [quranpak.com] is doing what many others such
>> as Harf, etc are doing; using their own non-standard encoding scheme.
>> It might be partially based on Unicode but it's surely not Unicode
>> since Unicode does not yet support all the features necessary for
>> Quran printing. They've done a good job mashallah.
>
>This is partially why I'm saying let's give the various missing
>characters/glyphs their own entries in the character code tables.
>
>What happens now is that various vendors want to encode a character
>say the assimilated tanween (I hope Gregg is happy :-) and simply
>end up randomly picking an unused location which doesn't necessarily
>equate to what another vendor is using.  I'm not talking about display,
>I'm simply noting that it would be best to leave it to the end-user
>to pick and choose what characters/glyphs he/she would like to utilize.
>
>> The argument that older font technologies are incapable of rendering the
>> sequence correctly is not something that interests me personally. To give an
>
>It might not interest you yet you should not impede others from being
>innovative in case they want to solve this problem in a different manner.
>The argument that a character should not exist due to the fact that there
>are other means (notably advanced font technology) to get the job done is
>not something everyone would buy into.  Unicode is filled with examples
>that would argue against this stance and saying "they made a mistake and
>we can't correct it now due to legacy" is a cop-out.  Simply put we need
>to add 5-6 new characters and leave it be.  At that point everyone will
>be happy - the people into font technology can proceed to do what they'd
>like and those using older/different methods can have a unified/standard
>means to denote data.
>
>> Modern Qur'anic orthography is similarly complex compared to ordinary
>> Arabic text because of the many marks that are added to the text.
>> You won't get away from rendering this kind of orthography without
>> modern font technology anyways. This technology is currently available
>> on Windows, Mac, Linux, OpenBSD, you name it. Why do we need to make
>> sure that Madinah Mushaf's Qur'anic orthography renders with legacy
>> font technologies?
>
>That's up to me (and all developers and users) to decide.  Unicode is not
>a rendering specification and it should NOT dictate how I am to proceed
>to do what I'd like to do - as such, I simply need the various characters
>to be given their own code-point (within 0600-06FF or FE70-FEFF) and I'd
>happily disappear.  You might have a very particular means to come up with
>the results and you might be very justified in your current thinking but
>why exclude others in pursuing other options either in addressing this
>using older technologies and/or pursuing future alternatives.  The characters
>(or glyphs - depending on how you name them) exist and they need to be
>accounted for.
>
>Salam.
>
> - Nadim

--
Mete Kural
Touchtone Corporation
714-755-2810
--