
Re: Proposal for the Basis of a Codepoint Extension to Unicode for the Encoding of the Quranic Manuscripts



Abdulhaq Lynch wrote:
The thing is that the contemporary Qur'an printings are almost completely
render-able today with Unicode using a character-based (not glyph-based)
encoding scheme; only a few more codepoints need to be added, that's it.
The XML elements and other such high-level semantics we are talking about
address what is beyond the rendering, i.e. text analysis. So the rendering
problem is almost solved, IMHO.


Hi Mete

one problem is that the rendering problem has been 'almost solved' for a long time now.

Hi,

Also, keep in mind that rendering is not the only purpose of text encoding. Machine manipulation of the text is equally important. We want to search, sort, and so on, based on the "natural" semantics of written Arabic.

The most fundamental problem with Unicode is precisely that it is optimized for certain classes of language. It's a surface encoding, which works great for languages like English that have a surface orthography. But for a language like Arabic, with a more complex relation between orthography and lexical structure, such an encoding design falls far short of what could be done. The restriction of Unicode to visual abstract semantics represents a subtle (and no doubt unintentional) bias. That's why I recommend disregarding Unicode and designing from the ground up to satisfy the needs of the Arabic-speaking community.
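
To make the search point concrete, here is a minimal sketch in Python, using only the standard unicodedata module. The helper name strip_marks and the sample word are illustrative assumptions, not part of any proposal in this thread. It shows that a plain code-point search for an unvocalized word misses the vocalized spelling, so any "natural" matching has to be layered on top of the encoding:

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Hypothetical helper: drop nonspacing marks (category Mn), e.g. harakat."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# kaf, teh, beh, each followed by FATHA (U+064E): a vocalized spelling
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"
# the same consonant skeleton with no marks
bare = "\u0643\u062A\u0628"

print(bare in vocalized)               # False: the marks interrupt the match
print(bare in strip_marks(vocalized))  # True: match on the consonant skeleton
```

Note that this is a blunt instrument: after NFD, stripping every Mn character also removes a composed hamza or madda, which may or may not be what a given search wants.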

-gregg