
Re: Proposal for the Basis of a Codepoint Extension to Unicode for the Encoding of the Quranic Manuscripts



Abdulhaq Lynch wrote:
This is a working document to enable a consensus to be established regarding a private use area to extend the Unicode Arabic specification in order to support encoding the Quran in a clear, simple, and complete way.

This document is not complete but details basic steps for moving forward.


Nice work. A few suggestions:

a. Ignore Unicode. Focus on the needs of your community. Get the theory right first and you'll be able to generate proposals for Unicode later if you think it useful.

b. Focus on semantic categories, not "characters", and don't bias "representation" towards "glyphs" or visual representations in general. For example, your proposal for "ikhfaa" is something that hadn't occurred to me. If you're only interested in producing a visual representation of text, then arguably it isn't needed. But what if you want to generate an audio representation? Or if you just want to analyze the encoded text? Then it seems to be pretty useful.

c. Your proposal rightly diverges from Unicode. So why stop at new specialized semantic categories? Fix what's broken in Unicode. For example, Unicode's idea of tanween is pretty bad, IMO. If I could design it over again I would have a single tanween character to be added after the vowel signs. The compound hamza "characters" in Unicode should be decomposed too, IMO. Textual analysis would be much easier then. Then of course there's the bidi fallacy in all its ridiculous glory. There are lots of ways to better capture the semantics of Arabic text, but the Unicode bunch is unlikely to ever approve of such an approach.
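To make the decomposition point concrete: Unicode already gives the compound hamza letters canonical decompositions (U+0623 ALEF WITH HAMZA ABOVE decomposes to ALEF + combining HAMZA ABOVE), but the tanween signs are atomic, with no mapping onto vowel + nasalization mark. Here is a small Python sketch of what a decomposed-tanween scheme might look like; the TANWEEN codepoint below is hypothetical (I've parked it in the PUA for illustration), not anything Unicode defines:

```python
import unicodedata

# The compound hamza letters already carry canonical decompositions:
# U+0623 ALEF WITH HAMZA ABOVE -> U+0627 ALEF + U+0654 HAMZA ABOVE
assert unicodedata.normalize("NFD", "\u0623") == "\u0627\u0654"

# The tanween signs, however, are atomic: U+064B FATHATAN has no
# decomposition into U+064E FATHA plus a separate tanween mark.
assert unicodedata.decomposition("\u064B") == ""

# A hypothetical private-use codepoint standing in for a single
# tanween sign, as suggested above. NOT part of Unicode.
TANWEEN = "\uE000"

def decompose_tanween(text: str) -> str:
    """Map the three atomic tanween signs onto vowel + TANWEEN."""
    table = {
        "\u064B": "\u064E" + TANWEEN,  # fathatan -> fatha + tanween
        "\u064C": "\u064F" + TANWEEN,  # dammatan -> damma + tanween
        "\u064D": "\u0650" + TANWEEN,  # kasratan -> kasra + tanween
    }
    return "".join(table.get(ch, ch) for ch in text)

# Finding every nasalized ending now reduces to scanning for one
# codepoint, rather than three:
sample = "\u0628\u064B"  # BEH + FATHATAN
assert TANWEEN in decompose_tanween(sample)
```

The gain for textual analysis is that a query like "all tanween, regardless of vowel" becomes a single-codepoint match instead of a three-way alternation.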

d. You don't need higher-level grammars like XML. My own opinion is that the primary goal of an encoding design should be to migrate intelligence out of the application and into the text, subject to the syntactic constraints of a plain text encoding. So long as you can give a clear and concise definition of a particular semantic category, it is a good candidate for encoding as plain text.

I once came across a relevant message from none other than Richard Stallman. It was on a list for gcc development, in response to a question about conformance to the ISO definition of C. RMS' response was simply that standards are merely recommendations, and that the needs of the community take precedence. Which seems very wise to me; Unicode is so riddled with problems it is bound to be superseded some day, so blindly following it even where it doesn't meet the needs of one's community seems questionable.

keep up the good work,

-gregg