
Re: Proposal for the Basis of a Codepoint Extension to Unicode for the Encoding of the Quranic Manuscripts



---------- Original Message ----------------------------------
From: Abdulhaq Lynch <al-arabeyes at alinsyria dot fsnet dot co dot uk>
>I don't see why we should battle with an encoding that was invented when there 
>was no clear separation between semantic characters and glyphs, and by 
>someone who didn't even under those circumstances think the whole thing 
>through. I also suspect that they did not understand the science of tajweed 
>but simply had a look at a couple of mashafs and made certain incorrect 
>assumptions about the glyphs they saw.
>
>Adding some new codepoints has the great benefit of totally separating tajweed 
>marks (which are nothing to do with grammar by the way) from the actual text, 
>making searching trivial. It allows the rendering application to apply 
>whatever local rules apply for that mark (a meem here, a circle there, two 
>staggered or horizontal fathas, etc.).

When I referred to the tajweed rules as aspects of grammar, I meant that they are aspects of grammar in the same way that i`raab is an aspect of grammar. Basically, with the Unicode Arabic block it is almost possible to encode the Quran cleanly from a graphemic perspective. The kind of higher-level encoding scheme you are referring to, however, gets too involved in the specifics of the Arabic language rather than the Arabic script: it amounts to language encoding rather than script encoding. Since Unicode is intended for encoding scripts, not languages, this kind of high-level encoding would be outside the scope of Unicode anyway. Nonetheless, the Private Use Area could be used for such a project; a small illustrative sketch follows the links below. Please see:
http://www.unicode.org/standard/supported.html
"The Unicode Character Standard primarily encodes scripts rather than languages. That is, where more than one language shares a set of symbols that have a historically related derivation, the union of the set of symbols of each such language is unified into a single collection identified as a single script."
Also http://www.unicode.org/faq/basic_q.html#17
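
To make the graphemic point above concrete, here is a minimal sketch in Python. It is purely illustrative, not a proposal: the fragment, the choice of marks, and the remark about the Private Use Area range are my own assumptions. It just shows that text encoded with the existing Arabic block can be searched trivially by stripping the combining marks.

import unicodedata

# "bismi" at the graphemic level: base letters interleaved with harakat
# BEH U+0628, KASRA U+0650, SEEN U+0633, SUKUN U+0652, MEEM U+0645, KASRA U+0650
bismi = "\u0628\u0650\u0633\u0652\u0645\u0650"

# A hypothetical project-specific tajweed mark would live in the Private Use
# Area (U+E000..U+F8FF) rather than overloading existing codepoints.

def strip_marks(text):
    # Drop non-spacing marks (harakat, Quranic annotation signs), keeping
    # only the base letters, so plain-text searching stays trivial.
    return "".join(c for c in text if unicodedata.category(c) != "Mn")

print(strip_marks(bismi))                      # BEH SEEN MEEM
print("\u0633\u0645" in strip_marks(bismi))    # search for SEEN+MEEM -> True
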

>I agree about the XML too, in fact that was my first thought, but the other 
>great benefit of the new codepoints is that the text stream can be passed 
>directly to an OpenType renderer without processing XML.

I really think XML is the way to go for language encoding. Script-encoded text would then be marked up with higher-level, language-specific XML tags that give the text its language-specific semantics. That seems to be where the text industry is heading: XML over Unicode, with Unicode for script-level encoding and XML for the higher-level semantics. So I would be biased towards the XML option.
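
As a rough sketch of that layering (the aya and tajweed element names and the rule value are invented here for illustration, not part of any existing schema), the script layer stays plain Unicode while the semantic layer sits in XML around it:

import xml.etree.ElementTree as ET

# Script layer: plain Unicode Arabic text.  Semantic layer: XML markup
# wrapping spans of that text with language-specific information.
aya = ET.Element("aya", attrib={"sura": "1", "number": "1"})
aya.text = "\u0628\u0650\u0633\u0652\u0645\u0650 "              # "bismi "
span = ET.SubElement(aya, "tajweed", attrib={"rule": "ikhfa"})  # invented rule tag
span.text = "\u0645\u0650\u0646\u0652"                          # the annotated span
span.tail = " ..."

# Tools that only care about the script layer can ignore the markup entirely:
print("".join(aya.itertext()))                  # bare Unicode stream
print(ET.tostring(aya, encoding="unicode"))     # full XML with the semantic layer
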

Kind regards,
Mete

--
Mete Kural
Touchtone Corporation
714-755-2810
--