
Re: Proposal for the Basis of a Codepoint Extension to Unicode for the Encoding of the Quranic Manuscripts



On Wednesday 22 June 2005 11:19, Mete Kural wrote:
> Salaam Abdulhaq,
wa `alaykum assalaam Mete

>
> When you say advanced encoding and primitive encoding, are you referring to
> an Arabic language and grammar aware high level encoding, and a
> Unicode-based graphemic encoding respectively? As far as Unicode is
> concerned only script-based encoding can be done. We can strive to make
> this script-based encoding graphemically as consistent as possible. Any
> higher level Arabic grammar-aware encoding as I have suggested before
> should be done with a markup language such as XML, not at the character
> encoding level. Inventing our own Arabic language-specific character
> encoding model outside of Unicode is simply not pragmatic, nor do I think it
> is theoretically better than using XML to capture Arabic
> *language*-specific semantics over Arabic *script*-specific Unicode text.
> Unicode is the unquestioned standard for character encoding that enjoys
> tremendous support from both corporate and governmental organizations
> worldwide. Straying away from Unicode is not a wise choice.
>

Good question. I feel at this stage that we are only talking about what is 
already being encoded: that is, the Uthmanic text, the added and overriding 
letters, vowels, shaddas and sukoons, and the tajweed marks (small meem, 
variations in tanween scripting, etc.). (Someone please correct me if I've 
missed something out.)

Where I think we are at cross-purposes is that I want to abstract the tajweed 
marks (known to you as the small meem, variations in tanween, madd signs, 
stop/pause signs, etc.), make it clearer what they are, and encode them by 
their semantic meaning rather than by the particular glyph that was/is used 
in Egypt/Saudi Arabia during the last century. Because these things have 
grammar-like names such as iqlaab, you sense that they don't belong in the 
encoding; but they are already there, only named as glyphs (in the case of 
iqlaab, the small meem). The alternative to my suggestion is to keep adding 
lots of extra glyphs for each scriptic variation (avoiding the word 
"scriptural" because it has other connotations) found in the muslim world 
over the last 1400 years.
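To make the contrast concrete, here is a small sketch of the semantic approach: the text carries one abstract mark per tajweed rule, and each regional convention supplies its own glyph for it. All the mark names and glyph names below are invented for illustration; none of them are real Unicode characters or codepoints.

```python
# Sketch: semantic tajweed codes resolved to regional glyph conventions.
# Every name here (IQLAAB, "small-meem", etc.) is hypothetical.

# The encoded text would carry abstract, semantic marks...
SEMANTIC_MARKS = {"IQLAAB", "MADD", "IKHFA"}

# ...and each regional tradition would map them to its own glyphs,
# instead of a separate codepoint existing for each regional glyph.
GLYPH_TABLES = {
    "egyptian": {"IQLAAB": "small-meem",
                 "MADD": "wavy-madd",
                 "IKHFA": "bare-letter"},
    "indopak":  {"IQLAAB": "small-meem-alt",
                 "MADD": "flat-madd",
                 "IKHFA": "dotted-letter"},
}

def render(marks, tradition):
    """Resolve a sequence of semantic marks to one tradition's glyphs."""
    table = GLYPH_TABLES[tradition]
    return [table[m] for m in marks]

# The same underlying text renders differently per tradition:
print(render(["IQLAAB", "MADD"], "egyptian"))
print(render(["IQLAAB", "MADD"], "indopak"))
```

The point of the sketch is only that the glyph variation lives in the rendering table, not in the encoded text, so new regional conventions do not require new characters.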

The extra structure above that, such as aayaat, ajzaa', other qiraa'aat, 
etc., would be XML-based. I agree with you that any morphological analysis 
etc. would also be XML-based.
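As a rough illustration of that layering, structural units like aayaat could be carried in XML wrapping ordinary Unicode text. The element and attribute names below are made up for the sketch (no such schema is being proposed here); Python's standard xml.etree is used just to show the idea:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: surah/aayah structure lives in XML, while the
# text content itself stays plain Unicode. Element names are invented.
doc = ET.fromstring(
    "<surah n='1'>"
    "<aayah n='1'>bismillah...</aayah>"
    "<aayah n='2'>alhamdu...</aayah>"
    "</surah>"
)

# Structure is recoverable without any special characters in the text:
for aayah in doc.findall("aayah"):
    print(aayah.get("n"), aayah.text)
```

The character encoding then only has to cover the script itself; everything above the script level stays in the markup layer.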

A quick question for you: is the small meem, indicating that the nuun is 
pronounced as a meem, part of the *script*, given that it does not change 
the meaning of the word in any sense whatsoever? If we were to encode a 
local English dialect and include special marks to indicate local 
pronunciation idiosyncrasies, would those marks qualify for inclusion in 
Unicode?

wassalaam
abdulhaq