Hello Gregg,
---------- Original Message ----------------------------------
From: Gregg Reynolds <gar at arabink dot com>
d. You don't need higher-level grammars like XML. My own opinion is
that the primary goal of an encoding design should be to migrate
intelligence out of the application and into the text, subject to the
syntactic constraints of a plain text encoding. So long as you can give
a clear and concise definition of a particular semantic category, it is
a good candidate for encoding as plain text.
Well, I think Unicode is useful as it is. Unicode encodes the Arabic script rather than the Arabic language, so IMHO the kinds of things we are asking for here belong to a higher-level grammar. The problem we are currently trying to address for the Quran has already been addressed, or is being addressed, for the Bible by OSIS (http://www.bibletechnologies.net/). In the OSIS XML specification, morphemes and other units of a word can be encoded using specific XML elements. Unicode remains the encoding model at the script level, and XML is overlaid on it to encode the higher-level semantics of the text. So basically we're talking about XML over Unicode, and I think that is the way to go.

Inventing our own character encoding model outside of Unicode is not going to receive support except from a handful of people. I believe in using standards that are already available, especially when they "can" address the needs of the community. With XML over Unicode, what we are discussing here