
Re: Quran data and issues in encoding the Quran in Unicode



Hello again Meor,

I have one more addition, about ya-aadamu in 2:33. The hamza in ya-aadamu in 2:33 (and in the 3-4 other instances of ya-aadamu in the Quran) should be encoded as 0621 (for now, until the new chairless hamza is added to Unicode), not as {tatweel 0640} + {hamza above 0654}. This is exactly the same aadam that appears two verses earlier, in 2:31: "wa allama aadama al-asmaa...". There, 0621 is used for the hamza of aadam; in 2:33, aadam is simply prefixed with ya-. Adding the prefix should not change the hamza of aadam from 0621 to 0654 on a 0640 tatweel. The hamza should remain 0621; otherwise we break the graphemic integrity of the text, which we want to avoid. In fact, the next verse, 2:34, has "li-aadama", and you have used the same 0621 hamza there. If aadamu on its own uses 0621 and li-aadama uses 0621, then ya-aadamu should use 0621 too.
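
A rough way to check this consistency in the data file is sketched below in Python. It is a sketch under stated assumptions, not a tool from the project: the helper names hamza_encodings and use_standalone_hamza are hypothetical, and the spellings are simplified (short vowels kept, the small alef and maddah after the vocative ya- omitted). It shows that the aadam stem found in li-aadama is not found in ya-aadamu while the tatweel + hamza above pair is used, and that it is found once that pair is replaced by 0621.

```python
# Code points involved in the ya-aadamu question.
HAMZA = "\u0621"        # ARABIC LETTER HAMZA (stand-in for the proposed non-disconnecting hamza)
TATWEEL = "\u0640"      # ARABIC TATWEEL
HAMZA_ABOVE = "\u0654"  # ARABIC HAMZA ABOVE


def hamza_encodings(word):
    """List, left to right, how each hamza in `word` is encoded."""
    found = []
    i = 0
    while i < len(word):
        if word.startswith(TATWEEL + HAMZA_ABOVE, i):
            found.append("0640+0654")
            i += 2  # skip both code points of the pair
            continue
        if word[i] == HAMZA:
            found.append("0621")
        elif word[i] == HAMZA_ABOVE:
            found.append("0654")
        i += 1
    return found


def use_standalone_hamza(word):
    """Replace every tatweel + hamza above pair with U+0621.

    Meant only for words such as ya-aadamu; other words may legitimately
    write the hamza on a tatweel, so do not apply this to the whole text.
    """
    return word.replace(TATWEEL + HAMZA_ABOVE, HAMZA)


# Simplified spellings for illustration only.
AADAM_STEM = HAMZA + "\u064E\u0627\u062F\u064E\u0645"  # hamza, fatha, alef, dal, fatha, meem
LI_AADAMA = "\u0644\u0650" + AADAM_STEM + "\u064E"     # lam, kasra + stem + fatha
YA_AADAMU = "\u064A\u064E" + TATWEEL + HAMZA_ABOVE + "\u064E\u0627\u062F\u064E\u0645\u064F"

print(hamza_encodings(YA_AADAMU))                    # ['0640+0654']
print(AADAM_STEM in LI_AADAMA)                       # True
print(AADAM_STEM in YA_AADAMU)                       # False: the stem is encoded differently
print(AADAM_STEM in use_standalone_hamza(YA_AADAMU)) # True after the change
```

The same check could be run over every word in the file that contains the aadam stem, to flag any instance whose hamza is not 0621.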

Kind regards,
Mete

---------- Original Message ----------------------------------
From: "Mete Kural" <metek at touchtonecorp dot com>
Reply-To: metek at touchtonecorp dot com,Development Discussions <developer at arabeyes dot org>
Date:  Thu, 16 Jun 2005 09:01:31 -0700

>Hello Meor,
>
>A long time ago there was a big discussion about submitting a proposal to Unicode; Thomas Milo was involved as well. You can find it in the archives. I don't want to start another big discussion; I just want to comment on your issues document at http://www.pakistanopensource.org/projects/quran/files/issues-01.pdf and suggest what I think is a good way to resolve these problems. Almost everything I say here was already mentioned in the emails you can find in the archives.
>
>1) Sequential Tanween: The most practical solution that was formulated for this is to use two fathas (064E 064E) consecutively, with no other Unicode characters in between, for sequential fathatan; two kasras (0650 0650) consecutively for sequential kasratan; and two dammas (064F 064F) consecutively for sequential dammatan. Then design your font so that it substitutes the corresponding glyph whenever it encounters one of these character sequences.
>
>2) Small Letters: You mention problems with the positioning of small alef, small waw, and small seen, and with whether they disconnect adjacent connecting characters. The Unicode specification may be unclear on this detail. My suggestion is to design your font so that it positions each small letter according to its context and never disconnects adjacent characters, since that never happens in contemporary Qur'an printings. This issue should not require any change to the Unicode specification, but a request could be made to define the properties of small alef, small waw, and small seen more clearly. For instance, the note "actually a vowel sign, despite the name" in the code chart for 0670 is misleading.
>
>3) Hamza: The good old hamza... Unfortunately, in Unicode the chairless hamza 0621 is defined as a character that disconnects adjacent connecting characters, and we can no longer fix this by redefining 0621, because that would break backward compatibility with Farsi and possibly other languages that use hamza as a disconnecting character. As we know, no hamza ever breaks the connection between adjacent connecting characters in the Quran, so 0621 as currently defined should not be used for encoding the Quran. The solution is to propose a new chairless hamza character that is defined not to break the connection. Until such a character is added to Unicode, just use 0621 as if it were the proposed non-disconnecting chairless hamza.
>
>4) Ligature lam alef wasla, lam hamza alef: This issue does not require any change to Unicode other than the one already proposed in number 3. Both are font issues. The unique positioning of the hamza in the lam hamza alef of Sura 2 verse 4 should be handled by smart font technology. Please note that the code point used for the hamza in this word, bi-l-aakhirati, should be the new non-disconnecting chairless hamza proposed above, not 0654 hamza above. For now, you can use 0621 and do a mass replacement later, when the new character is added to Unicode.
>
>Kind regards,
>Mete
>
>---------- Original Message ----------------------------------
>From: Meor Ridzuan Meor Yahaya <meor dot ridzuan at gmail dot com>
>Reply-To: Meor Ridzuan Meor Yahaya <meor dot ridzuan at gmail dot com>,Development Discussions <developer at arabeyes dot org>
>Date:  Thu, 16 Jun 2005 11:30:32 +0800
>
>>First of all, I would like to announce that I've created new Quran
>>Unicode data, complete with diacritic marks according to the Madinah
>>Mushaf. The file has not been verified yet, so volunteers are
>>welcome.
>>
>>Second, I remember that there was previously an initiative to submit a
>>proposal to Unicode. What happened to that initiative? I've compiled
>>my own list of issues regarding encoding and displaying the Quran in
>>Unicode. I would appreciate comments and feedback on the documents from
>>fellow Arabeyes members, especially those who are experts in Quranic rasm.
>>
>>The documents mentioned above can be downloaded from
>>http://www.pakistanopensource.org/projects/quran/ .
>>
>>Regards.
>>
>>
>
>--
>Mete Kural
>Touchtone Corporation
>714-755-2810
>--
>
>
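
As a rough illustration of point 1 in the quoted message: in a real font the substitution would be done by the font's own glyph substitution rules (for example an OpenType GSUB lookup), not by application code. The Python sketch below only mirrors the matching rule so that the data file can be scanned for these sequences. The function name and the labels are hypothetical.

```python
# Code points of the three short-vowel signs.
FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"

# Hypothetical labels for the glyphs a Quran font would substitute.
SEQUENTIAL_TANWEEN = {
    FATHA + FATHA: "sequential fathatan",
    DAMMA + DAMMA: "sequential dammatan",
    KASRA + KASRA: "sequential kasratan",
}


def find_sequential_tanween(text):
    """Return (index, label) for each doubled vowel with nothing in between."""
    hits = []
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair in SEQUENTIAL_TANWEEN:
            hits.append((i, SEQUENTIAL_TANWEEN[pair]))
            i += 2  # consume both marks so a run of three is not counted twice
        else:
            i += 1
    return hits


print(find_sequential_tanween("\u0628\u064E\u064E"))        # [(1, 'sequential fathatan')]
print(find_sequential_tanween("\u0628\u064B"))              # [] (ordinary fathatan, 064B)
print(find_sequential_tanween("\u0628\u064E\u0651\u064E"))  # [] (shadda in between)
```

Because the match requires the two marks to be adjacent, an intervening shadda or any other mark means the sequence is not treated as sequential tanween, which is the "no other Unicode characters in between" condition from point 1.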

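Points 3 and 4 rely on using 0621 as a placeholder and doing a mass replacement later. A minimal Python sketch of that later step follows, assuming every 0621 in the data file stands for the proposed non-disconnecting hamza. The function name is hypothetical, and since the real code point of the new character does not exist yet, a Private Use Area code point (E000) is used purely as a stand-in.

```python
HAMZA = "\u0621"  # placeholder used in the data file for now


def migrate_hamza(text, new_hamza):
    """Swap every placeholder U+0621 for `new_hamza`; return (new_text, count)."""
    count = text.count(HAMZA)
    return text.replace(HAMZA, new_hamza), count


# U+E000 (Private Use Area) is only a stand-in for the future character.
NEW_HAMZA = "\uE000"
verse = "\u0621\u064E\u0627\u062F\u064E\u0645\u064E"  # simplified "aadama"
updated, n = migrate_hamza(verse, NEW_HAMZA)
print(n)                       # 1
print(updated[0] == NEW_HAMZA) # True
```

Returning the count makes it easy to compare the number of replacements against the number of hamzas expected in the text before committing the change.
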