
Re: Quran data and issues in encoding the Quran in unicode



Hello Meor,

A long time ago there was a big discussion about submitting a proposal to Unicode, and Thomas Milo was involved as well. You can find it in the archives. I don't want to start another big discussion; I just want to comment on your issues document at http://www.pakistanopensource.org/projects/quran/files/issues-01.pdf and suggest what I think is a good way to resolve each issue. Pretty much everything I say here was already mentioned in the emails you can find in the archives.

1) Sequential Tanween: The best and most practical approach formulated for this is to encode sequential fathatan as two consecutive fathas (with no other Unicode characters in between), sequential kasratan as two consecutive kasras, and sequential dammatan as two consecutive dammas, and to design your font so that it substitutes the corresponding glyph whenever it encounters one of these character sequences.

2) Small Letters: You mention problems with the positioning of small alef, small wow, and small seen, and with whether they disconnect adjacent connecting characters. The Unicode specification may be unclear on this detail. My suggestion is to design your font so that it positions each small letter according to its context and never disconnects the adjacent characters, since such a disconnection never occurs in contemporary Qur'an printings. This issue should not require any change to the Unicode spec, but a request could perhaps be made to define the properties of small alef, small wow, and small seen more clearly. For instance, the note "actually a vowel sign, despite the name" in the code chart for 0670 is misleading.

3) Hamza: The good old hamza... Unfortunately, Unicode defines the chairless hamza 0621 as a character that disconnects adjacent connecting characters. We can no longer fix this by redefining 0621, because that would break backwards compatibility with Farsi and possibly some other languages that use hamza as a disconnecting character. As we know, no hamza ever breaks the connection between adjacent connecting characters in the Quran, so character 0621 as it is defined today should not be used for encoding the Quran. The solution is to propose a new chairless hamza character that is defined not to break the connection. Until such a character is added to Unicode, just use 0621 as a stand-in for the proposed non-disconnecting chairless hamza.

4) Ligature lam alef wasla, lam hamza alef: This issue does not require any change to Unicode other than the one already proposed in number 3. Both are font issues. The unique positioning of hamza in the lam hamza alef in Sura 2 verse 4 should be accomplished by smart font technology. Please note that the codepoint used for the hamza in this word, bi-l-aakhirati, should be the new non-disconnecting chairless hamza proposed above, not 0654 hamza above. For now, you can use 0621 and mass replace it later when the new character is added to Unicode.
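
To make points 1 and 4 a bit more concrete, here is a rough Python sketch for checking and converting the text data (not the font side). It is only an illustration under my own assumptions: the function names are made up, and since the new hamza does not exist in Unicode yet, the replacement codepoint is passed in as a parameter and the usage lines use a Private Use Area codepoint (U+E000) purely as a hypothetical stand-in.

```python
import re

# Short vowel marks as encoded in Unicode.
FATHA = "\u064E"
DAMMA = "\u064F"
KASRA = "\u0650"
HAMZA = "\u0621"  # chairless hamza, used for now as a placeholder

# Matches the same short vowel twice in a row with nothing in between,
# which is the convention suggested for sequential tanween.
DOUBLED_VOWEL = re.compile(f"([{FATHA}{DAMMA}{KASRA}])\\1")

TANWEEN_NAMES = {FATHA: "fathatan", DAMMA: "dammatan", KASRA: "kasratan"}


def find_sequential_tanween(text):
    """Return (index, name) for every doubled short vowel found in text."""
    return [
        (match.start(), TANWEEN_NAMES[match.group(1)])
        for match in DOUBLED_VOWEL.finditer(text)
    ]


def replace_placeholder_hamza(text, new_hamza):
    """Mass replace U+0621 with whatever codepoint is eventually assigned.

    new_hamza is an assumption supplied by the caller; no such
    character is defined in Unicode at the time of writing.
    """
    return text.replace(HAMZA, new_hamza)


# Usage: beh + fatha + fatha + alef, then a hamza swapped for a
# hypothetical Private Use Area stand-in.
sample = "\u0628" + FATHA + FATHA + "\u0627"
print(find_sequential_tanween(sample))
print(replace_placeholder_hamza("\u0621\u0628", "\uE000") == "\uE000\u0628")
```

Keeping the replacement codepoint as a parameter means the data can stay in 0621 today and be converted in one pass once a real codepoint is known.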

Kind regards,
Mete

---------- Original Message ----------------------------------
From: Meor Ridzuan Meor Yahaya <meor dot ridzuan at gmail dot com>
Reply-To: Meor Ridzuan Meor Yahaya <meor dot ridzuan at gmail dot com>,Development Discussions <developer at arabeyes dot org>
Date:  Thu, 16 Jun 2005 11:30:32 +0800

>First of all, I would like to announce that I've created a new Quran
>Unicode data file, complete with diacritic marks according to the Madinah
>Mushaf. The file has not been verified yet, so volunteers are
>welcome.
>
>Second, I remember that some time ago there was an initiative to submit a
>proposal to Unicode. What has happened to that initiative? I've compiled
>my own issues regarding encoding and displaying the Quran in Unicode.
>I would appreciate comments and feedback on the documents from fellow
>Arabeyes members, especially those who are experts in Quranic Rasm.
>
>The above-mentioned documents can be downloaded from
>http://www.pakistanopensource.org/projects/quran/ .
>
>Regards.
>
>

--
Mete Kural
Touchtone Corporation
714-755-2810
--