Re: Tanween variants and Unicode
- To: General Arabization Discussion <general at arabeyes dot org>
- Subject: Re: Tanween variants and Unicode
- From: Gregg Reynolds <gar at arabink dot com>
- Date: Thu, 25 Aug 2005 12:07:07 -0500
- User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)
Mete Kural wrote:
> New Protocols that are needed:
> ------------------------------
> - Tanween ending in meem: fathatan+superscript meem will trigger the
>   "tamweem" symbol, and so forth for kasratan+superscript meem and
>   dammatan+superscript meem. No new character code is needed, just a
>   protocol that explains that the combination will trigger the
>   corresponding glyph.
I must respectfully but vehemently object. You can't just merrily
redefine the semantics of codepoints that are already well-defined.
Fathatan means fathatan; any software that does not display it correctly
is broken, by definition. Ditto for superscript meem. If the one
follows the other, they must both be displayed.
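To make the point concrete, here is a minimal Python sketch using the standard unicodedata module. It assumes "superscript meem" means U+06E2 (ARABIC SMALL HIGH MEEM ISOLATED FORM); the base letter is an arbitrary choice for the demo, and survives_normalization is a helper name invented here. It only shows that both marks have their own fixed identities and that normalization leaves both in place; it says nothing about what any particular renderer does.

```python
import unicodedata

FATHATAN = "\u064B"         # ARABIC FATHATAN
SMALL_HIGH_MEEM = "\u06E2"  # assumed stand-in for "superscript meem"
BEH = "\u0628"              # arbitrary base letter for the demo

def survives_normalization(text):
    """Return True if NFC and NFD both keep every code point of `text`."""
    for form in ("NFC", "NFD"):
        normalized = unicodedata.normalize(form, text)
        # Compare as multisets so that canonical reordering is not counted as loss.
        if sorted(normalized) != sorted(text):
            return False
    return True

sequence = BEH + FATHATAN + SMALL_HIGH_MEEM

for ch in sequence:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
print("both marks kept:", survives_normalization(sequence))
```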
> Silent/sequential tanween: fathatan+sukuun code will trigger the
> silent tanween/sequential tanween glyph, and so forth for
> kasratan+sukuun and dammatan+sukuun. Sukuun is a good choice for a
> codepoint here since the noon sound of the tanween is in a way
> silenced. No new character code is needed, just a protocol that
> explains that the combination will trigger the corresponding glyph.
Same objection. What if the author *wants* a sukuun over an -atan? By
the way, what exactly is a "silent/sequential" tanween? All tanween
variants have names in Arabic that translate quite well into English;
why not use them? By my reading, there is no such thing as a "silent
tanween"; there is an assimilated tanween, but assimilation and silence
are not the same thing. "Sukun" is definitely the wrong term.
See section 1.10 of http://www.arabink.com/patacode/encoding.pdf; see
also the bottom of p. 31 / top of p. 32.
This sort of redefinition may be ok for private experimenting, but as a
proposition for standardization it's frankly a terrible idea. Even for
private experimenting it isn't a good idea; the PUA is set aside
specifically for stuff like this.
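As a rough illustration of the PUA route, the Python sketch below parks experimental tanween variants on Private Use Area codepoints instead of overloading existing sequences. The variant names and the specific codepoints (U+E000 and up) are hypothetical picks made for this example, not assignments from any standard or font.

```python
import unicodedata

# Hypothetical private assignments for experimentation only.
# Any codepoints in U+E000..U+F8FF would do; these are arbitrary.
EXPERIMENTAL_TANWEEN = {
    "fathatan_meem_variant": "\uE000",
    "kasratan_meem_variant": "\uE001",
    "dammatan_meem_variant": "\uE002",
}

def is_private_use(ch):
    """True if `ch` has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"

for name, ch in EXPERIMENTAL_TANWEEN.items():
    print(f"{name}: U+{ord(ch):04X} private use = {is_private_use(ch)}")

# The standard fathatan is not private use, so its meaning is not ours to change.
print("U+064B private use =", is_private_use("\u064B"))
```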
> New canonical equivalences (this one is not absolutely needed for the
> Madinah Mushaf):
> ----------------------
> - Basic tanween canonical equivalence: fatha+fatha needs to be made
>   canonically equivalent to fathatan, and so on for kasratan and
>   dammatan.
Here's the problem with this: why stop there? You can use precisely the
same argument to say that two consecutive vowels within a word should be
interpreted as one vowel + vowel lengthener, e.g. kitAb spelled kitaab.
Technically speaking, the alif in kitAb in fact denotes a lengthening
of the preceding fatha, just as the second vowel in -atan denotes /n/.
Now consider kitaabaa - should the final aa be an alif or a fathatan?
Plus, what does this do for searching and sorting? A search for e.g.
fathatan won't find two consecutive fathas. So if you do this sort of
thing you'll get surprised users. OTOH, nothing says an editor can't
map two consecutive punches of the fatha key to the fathatan codepoint.
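
Both halves of that can be sketched in Python with the standard unicodedata module. The first part checks that fatha+fatha and fathatan are not canonically equivalent today, so a plain search for one misses the other. The second part is a hypothetical editor-side key handler; type_key and the mapping table are names made up for this sketch, not taken from any real editor.

```python
import unicodedata

FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"
FATHATAN, DAMMATAN, KASRATAN = "\u064B", "\u064C", "\u064D"

def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

KITAB = "\u0643\u062A\u0627\u0628"   # kaf, teh, alef, beh
with_two_fathas = KITAB + FATHA + FATHA
with_fathatan = KITAB + FATHATAN

print(canonically_equivalent(with_two_fathas, with_fathatan))
print(FATHATAN in with_two_fathas)   # a search for fathatan misses fatha+fatha

# Hypothetical editor behaviour: a second press of the same vowel key
# replaces the stored vowel with the corresponding tanween codepoint.
DOUBLE_PRESS = {FATHA: FATHATAN, DAMMA: DAMMATAN, KASRA: KASRATAN}

def type_key(buffer, key):
    """Append `key` to `buffer`, folding a doubled vowel into its tanween."""
    if key in DOUBLE_PRESS and buffer.endswith(key):
        return buffer[:-1] + DOUBLE_PRESS[key]
    return buffer + key

typed = type_key(type_key(KITAB, FATHA), FATHA)
print(typed == with_fathatan)
```

The stored text then contains only the fathatan codepoint, so searching and sorting behave the same whichever way the user typed it.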