
Fwd: Re: teh marbuta



Thought this discussion on Unicode's "unicode" list would be of interest here.




	
		
--- Begin Message ---
Tom Emerson wrote:
Gregg Reynolds writes:

I see in the new version that teh marbuta (U+0629) is still listed as right-joining. This is arguably incorrect with respect to written Arabic (or at least not very good design); it would be more usefully construed as a dual joiner. I don't know how it plays in other languages that use the Arabic script.


I'm interested in your arguments that teh marbuta isn't right joining:
how is this not correct? It can only appear word-final, which
precludes its use as a dual joiner.


Not quite; it depends on what you mean by "word". I'll give you a simple example to illustrate. A more detailed account would have to cover how spoken Arabic works and how it is represented in written Arabic, which I'd be happy to provide if you're interested, but for now let's stick with an example.


The word "risala#" (رسالة) means roughly "letter, message". (I use # as teh marbuta.) Pronounced in isolation, the word ends in a soft 'h' sound, which is why the teh marbuta (in this form) looks like a 'heh' (ه). Suffix the word with a personal pronoun (indicating possession) and you get "risalat*kum" (رسالتكم) (I use * to mean any short vowel). The pronunciation is /t/, just like the teh (ت). But the identity of the "character" has not changed; it is still teh marbuta, which means "bound or joined teh". Note that the isolated/final form combines the shape of the heh and the two dots of the teh. Note also that teh marbuta is not traditionally considered a first-class letter in the abjadia; instead it is a clever solution to the problem that a single character (in the deep orthography, if that's the right term) takes two completely different pronunciations depending on context. I suppose the linguists have a word for this sort of thing; to me it looks like teh marbuta makes explicit a feature of deep orthography, or morphology, or in any case its semiotics (can you tell I'm grasping here?) differ from those of the "normal" letters. This is in Arabic; I dunno about Persian, etc.
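For concreteness, here is how the two spellings look at the code point level as they are typed today. This is only a small Python sketch using the standard unicodedata module; the constant and function names are made up for illustration.

```python
import unicodedata

# "risala#" and "risalat*kum" as normally typed, without short-vowel marks.
RISALA = "\u0631\u0633\u0627\u0644\u0629"
RISALATKUM = "\u0631\u0633\u0627\u0644\u062A\u0643\u0645"


def code_points(word):
    """Return (code point, Unicode character name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in word]


for word in (RISALA, RISALATKUM):
    for cp, name in code_points(word):
        print(cp, name)
    print()
```

The fifth character is U+0629 in the bare word and U+062A in the suffixed word, so at the encoding level the "same" character is represented by two different code points.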

In other words, it would be useful to encode the *character* teh marbuta, as understood in Arabic tradition. So, e.g., a search for risala# should match risalat*kum, and when the -kum is deleted in an editor, the software knows the shape of the # should revert to the heh-like shape.
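To show what software has to do today to approximate this, here is a rough Python sketch. It assumes the current encoding, where the suffixed form is typed with plain teh (U+062A). All function names (fold, matches_stem, attach_suffix, detach_suffix) are hypothetical, and the folding rule is a deliberately crude assumption, not an established algorithm.

```python
TEH = "\u062A"
TEH_MARBUTA = "\u0629"
# Short-vowel and related marks, fathatan through sukun (U+064B..U+0652).
HARAKAT = {chr(c) for c in range(0x064B, 0x0653)}


def fold(text):
    """Build a search key: drop short-vowel marks, treat teh marbuta as teh."""
    return "".join(
        TEH if ch == TEH_MARBUTA else ch
        for ch in text
        if ch not in HARAKAT
    )


def matches_stem(query, word):
    """True if the folded word begins with the folded query."""
    return fold(word).startswith(fold(query))


def attach_suffix(word, suffix):
    """Add a pronoun suffix; a final teh marbuta is rewritten as teh."""
    if word.endswith(TEH_MARBUTA):
        word = word[:-1] + TEH
    return word + suffix


def detach_suffix(word, suffix, had_teh_marbuta):
    """Remove the suffix; restoring teh marbuta needs outside knowledge."""
    stem = word[:-len(suffix)] if suffix and word.endswith(suffix) else word
    if had_teh_marbuta and stem.endswith(TEH):
        stem = stem[:-1] + TEH_MARBUTA
    return stem


risala = "\u0631\u0633\u0627\u0644\u0629"
kum = "\u0643\u0645"
risalatkum = attach_suffix(risala, kum)
print(matches_stem(risala, risalatkum))
print(detach_suffix(risalatkum, kum, had_teh_marbuta=True) == risala)
```

Two weaknesses are visible. The fold also makes any ordinary teh match a teh marbuta, so it over-matches. And detach_suffix cannot tell by itself whether a final teh was originally a teh marbuta; the caller has to supply that, which is exactly the information a distinct character identity would carry.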

I suspect this might entail disunifying teh marbuta as used in writing the Arabic language from the 'heh-with-two-dots' used in other languages that use the Arabic script.

Hope that helps.

gregg


--- End Message ---