Re: Proposal for the Basis of a Codepoint Extension to Unicode for the Encoding of the Quranic Manuscripts
- To: "General Arabization Discussion" <general at arabeyes dot org>
- Subject: Re: Proposal for the Basis of a Codepoint Extension to Unicode for the Encoding of the Quranic Manuscripts
- From: "Thomas Milo" <t dot milo at chello dot nl>
- Date: Wed, 22 Jun 2005 15:47:02 +0200
Gregg Reynolds wrote:
> Thomas Milo wrote:
>> Gregg Reynolds wrote:
>> The qur'anic assimilation of the second one is not yet supported, but it
>> will read like this:
>> khushubu- m:usannadätu-n (DMG: ḫušubu- m:usannadätu-n)
>> As you can see, initial compensatory shaddä is treated differently
>> from morphological shaddä.
> Yes; this is an example where a very useful codepoint is unlikely to
> be endorsed by Unicode. We could use two shaddas, one phonotactic
> and one lexical. I think there might even be a third case but I
> can't think of it at the moment.
I was not suggesting this as a potential codepoint. I see no graphemic
difference between the two uses of shadda. My reversible transcription
algorithm inserts alif-wasla before any initial consonant cluster, including
the [mm-] of /m:usannadätu-n/. Consequently, this connecting cluster must be
marked in a different way, so I borrowed a conventional sign that also
happened to be ASCII (another constraint). I added a comment because I knew
it would intrigue you.
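To make the constraint concrete, here is a toy sketch in Python. It is not the transcription algorithm described above; the vowel set, the `'` stand-in for alif-wasla and the `transcribe` function are hypothetical simplifications. It only shows why, once every word-initial cluster triggers alif-wasla, an assimilated geminate such as [mm-] needs its own ASCII mark.

```python
# Toy model of the alif-wasla rule. NOT the author's algorithm: the
# transcription conventions here are hypothetical simplifications.
#   - vowels are a, i, u, ä; everything else counts as a consonant
#   - a word-initial consonant cluster (e.g. "mm-") gets an alif-wasla
#     prefix, written here as "'"
#   - "C:" marks an assimilated (connecting) initial geminate, e.g. "m:",
#     and must NOT trigger alif-wasla insertion

VOWELS = set("aiuä")
ALIF_WASLA = "'"  # hypothetical ASCII stand-in for alif-wasla


def starts_with_cluster(word: str) -> bool:
    """True if the first two characters are both consonants."""
    return len(word) >= 2 and word[0] not in VOWELS and word[1] not in VOWELS


def transcribe(word: str) -> str:
    """Prefix alif-wasla to a word-initial cluster unless it is marked with ':'."""
    if len(word) >= 2 and word[1] == ":":
        # Compensatory gemination from assimilation: the cluster connects to
        # the preceding word, so no alif-wasla is inserted.
        return word
    if starts_with_cluster(word):
        return ALIF_WASLA + word
    return word


for w in ("kitäbun", "mmusannadätun", "m:usannadätun"):
    print(w, "->", transcribe(w))
```

Without the `:` convention, the connected form would also be written `mmusannadätun` and would receive alif-wasla, so the two readings could no longer be told apart and the transcription would stop being reversible.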
What I sense from our discussions is that you are including the
morpho-phonological level of analysis, whereas I try to stick to a
script-oriented graphemic level. Both are abstract and very distinct from
the tendency of the Unicode group to encode conventions that originated
within the graphic industry without any particular discipline in analysis.
Yet the Unicode standard explicitly aims to encode plain text, which I
interpret as encoding a script in graphemic units: the minimal distinctive
functional units of a given writing system, not linguistic units or the
elements of a given type case.
>> What's the objection? It would be just as transparent as you
> I have to think some more about the paired vowels idea.
>> Anyway, I like your approach. If it is to find any acceptance, there
>> needs to be canonical equivalence with legacy encoding according to
>> this formula:
>> TANWEEN = <vowel><small noon>
>> = conventional tanween
>> TAMWEEM = <vowel><small meem>
>> IDGHAM = <vowel><idgham>
> But I wouldn't call it <small noon>; we want to retain the semantics
> of tanween explicitly in the encoding element so that software
> doesn't have to infer tanween based on two codepoints. This is the
> kind of thing I mean when I say intelligence should be migrated from
> software to the encoding as much as possible.
I just used a distinctive name in plain language. Obviously my preferred
name for this code point would be ARABIC TANWEEN MARKER, along with ARABIC
TAMWEEM MARKER and ARABIC IDGHAM MARKER.
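As an illustration of the equivalence formula quoted above (TANWEEN = <vowel><marker>), here is a minimal Python sketch. The marker characters do not exist in Unicode; they are placed at hypothetical Private Use Area code points purely for illustration. The legacy characters are the real FATHATAN, DAMMATAN and KASRATAN (U+064B to U+064D), which have no canonical decomposition in the current standard, so this is an application-level mapping, not Unicode normalization. The `is_tanween_at` helper also shows the cost raised in the quoted reply: with the pair encoding, software must inspect two code points to recognize tanween.

```python
import unicodedata

# Hypothetical code points for the proposed markers (Private Use Area,
# illustration only; no such characters exist in Unicode).
TANWEEN_MARKER = "\uE000"  # hypothetical ARABIC TANWEEN MARKER
TAMWEEM_MARKER = "\uE001"  # hypothetical ARABIC TAMWEEM MARKER (unused below)
IDGHAM_MARKER = "\uE002"   # hypothetical ARABIC IDGHAM MARKER (unused below)

FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"

# Legacy tanween is one code point each; map it to <vowel><TANWEEN MARKER>.
LEGACY_TANWEEN = {
    "\u064B": FATHA + TANWEEN_MARKER,  # FATHATAN
    "\u064C": DAMMA + TANWEEN_MARKER,  # DAMMATAN
    "\u064D": KASRA + TANWEEN_MARKER,  # KASRATAN
}
TO_LEGACY = {pair: ch for ch, pair in LEGACY_TANWEEN.items()}


def has_no_canonical_decomposition(ch: str) -> bool:
    """True if the Unicode Character Database gives ch no decomposition."""
    return unicodedata.decomposition(ch) == ""


def decompose_tanween(text: str) -> str:
    """Rewrite legacy tanween as <vowel><TANWEEN MARKER>."""
    return "".join(LEGACY_TANWEEN.get(ch, ch) for ch in text)


def compose_tanween(text: str) -> str:
    """Inverse mapping: <vowel><TANWEEN MARKER> back to legacy tanween."""
    out, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in TO_LEGACY:
            out.append(TO_LEGACY[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def is_tanween_at(text: str, i: int) -> bool:
    """In the pair encoding, tanween detection needs two code points."""
    return text[i] in (FATHA, DAMMA, KASRA) and text[i + 1:i + 2] == TANWEEN_MARKER


word = "\u0643\u062A\u0627\u0628\u064C"  # kaf, teh, alef, beh, dammatan
pairs = decompose_tanween(word)
print(is_tanween_at(pairs, 4), compose_tanween(pairs) == word)
```

The round trip `compose_tanween(decompose_tanween(s)) == s` is what a canonical equivalence would guarantee; here it only holds for the three legacy tanween characters modelled in the table.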
I agree that calling it SMALL NOON could cause confusion with the existing
one-off small nuun code used for completing the word /nanjii/ (off the top
of my head).
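Whether a proposed name risks colliding with an existing one can be checked against the Unicode Character Database. The following is a minimal Python sketch using the standard `unicodedata` module. The helper `arabic_small_marks_named` is hypothetical, and the scan covers only the basic Arabic block (U+0600 to U+06FF). Its results include U+06E8 ARABIC SMALL HIGH NOON. This sketch does not establish which character is the one used for /nanjii/.

```python
import unicodedata


def arabic_small_marks_named(letter: str) -> list[tuple[int, str]]:
    """Return (code point, name) for Arabic-block characters whose name
    starts with 'ARABIC SMALL' and contains `letter` as a whole word."""
    found = []
    for cp in range(0x0600, 0x0700):  # basic Arabic block only
        name = unicodedata.name(chr(cp), "")  # "" for unassigned code points
        if name.startswith("ARABIC SMALL") and letter in name.split():
            found.append((cp, name))
    return found


for letter in ("NOON", "MEEM"):
    for cp, name in arabic_small_marks_named(letter):
        print(f"U+{cp:04X} {name}")
```

The exact list depends on the Unicode version bundled with the Python interpreter, so a name check like this should be repeated against the current database before a proposal is submitted.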