Re: Questions about yeh, hamzah on yeh, alef maksura and dotless ba
- To: "General Arabization Discussion" <general at arabeyes dot org>
- Subject: Re: Questions about yeh, hamzah on yeh, alef maksura and dotless ba
- From: "Thomas Milo" <t dot milo at chello dot nl>
- Date: Fri, 23 Dec 2005 17:11:17 +0100
Gregg Reynolds wrote:
>> In 7th century and earlier Arabic the dotless yeh that represented
>> what is commonly referred to as alef maqsura indeed gives the
>> impression of being a full-fledged character. After the analysis
>> carried out in the age of the Arabic grammarians this yeh was
>> separately categorized, which gave the impression that it does not
>> have lexical status. The fact is that in 7th century and earlier
>> Arabic, when there were no superscript alefs or hamzas, these yehs
>> were just yehs. At that time, you could think of yeh as a
>> multi-purpose character: sometimes it takes the role of a consonant,
>> sometimes an 'i' vowel and sometimes an 'a' vowel.
This is still the case today in Arabic orthography that writes final yaa'
without dots.
> Yes; unfortunately we don't have good terminology in this area. In my
> view it makes sense to view written Arabic as operating on two levels;
> it's easy to name the levels (surface v. depth), but not the things
> that live at the levels. The graphic forms ("letters") at the surface
> usually are identifiable both as full-fledged graphemes
> *and* as signs of an underlying deep "character" (i.e. lexically
> significant) identity. E.g. ba' ب is a form in itself that also
> denotes a lexical category. In contrast, dotless yeh is a clearly
> identifiable surface category, but may denote multiple lexical
> categories.
Your deep level is, in linguistic terms, phonology. Its units are phonemes.
What you are saying is that there is no exact 1-1 relationship between
phonemes and graphemes.
> ("Lexical category", for lack of a better term. Meaning, a unit used
> to construct "words"
These units are called morphemes, part of the discipline of morphology (or
"grammar") which is a different category altogether.
> Maybe I should say "sublexical" or
> "orthographic" category instead. But the latter isn't suitable,
> since letters like alef maqsura and teh marbuta are clearly
> orthographic, but not first-class lexical units.
These two letters are examples of graphemes that connect to the morphological
level rather than the phonological level. I call them morpho-phonologic
graphemes.
> Do you see the
> terminology problem? The only things I've come up with are ugly
> neologisms, like "orthosemic" or "ortholexic" or the like.)
We don't need to reinvent the wheel: there is a robust corpus of knowledge
in the field of grammatical concepts and terminology.
> So I agree with you in the sense that things shaped like yeh are yeh,
> but only on the surface (graphemically). I disagree if you mean they
> all have the same denotation, since it is clear in the tradition that
> sometimes the yeh form means the lexical category "yeh" and sometimes
> it means the non-lexical category "alef". And what it means is what
> counts, not what it looks like.
This is not a writing or orthography issue, but a matter that takes place at
the crossroads of phonology and morphology. The term is morphophonological
alternation. The letter taa' marbuutah expresses this alternation, but yaa'
simply doesn't.
> One of the fundamental flaws in Unicode is that it concentrates on
> surface orthography.
That's not a flaw, it's a design specification. The principal criterion is
plain text encoding. The problem is: what is plain text? This
specification is not at all purely linguistic, but much more
pragmatic. For starters, the Unicode initiative had to take on large
vested interests in block-headed legacy encodings, the merger of which was
often sabotaged by short-sighted nationalism, script fetishism and personal
megalomania. Moreover, much of the essential research needed to design a
consistent, flawless worldwide script encoding model has never been done. It
is a miracle that Unicode got off the ground at all.
> That's great for languages that only need
> surface orthography (like English or Chinese), but in my view it is
> not a good model for languages like Arabic that operate on two
> levels. It's a design decision: pick a surface encoding design, and
> you exclude crucial semantic information from the text; pick an
> encoding that focusses on underlying lexical categories and you only
> have to worry about mapping to graphemes. For my money the latter is
> a much better approach.
English orthography is hardly phonological or uncomplicated (if that's
indeed what you mean by surface orthography). The writing system of
English is simultaneously linked to multiple levels (e.g., phonological,
morphological and lexical) of the language structure. After all, a
word like /read/ is read differently depending on morphological load (past
vs present tense), while the pronunciation of a word like /lead/ depends on
its lexical load (ahead of the troops or between typeset lines of a
layout). The underlying reason is that there never evolved an English
alphabet consisting of graphemes covering English phonemes. Instead a
complex hybrid system of making do with incompatible Latin graphemes grew
around the English language. Arabic, Turkish and Russian are in fact far
better off, each of them for a different historical reason.
>> <<So we have (at least) four encoding candidates:
>>
>> 1. this funny alif-in-dotless-yeh-clothing (Quranic and contemporary);
>> 2. a dotless-yeh *form* that has no meaning and is used solely as a seat
>>    of hamza/small alef/etc. (Quranic and contemporary);
>> 3. a true yeh that sometimes loses its spots (Quranic and occasionally
>>    contemporary);
>> 4. a true yeh that always keeps its dots (contemporary usage)>>
Nos. 1 and 2 are one and the same grapheme. (The Unicode character
Yeh_Hamza-Above is of course a non-existent ligature in this approach.)
Nos. 3 and 4 are flavours of the same grapheme.
>> This is why I think the best approach would be to encode all four of
>> these cases with the same yeh codepoint.
I fully agree. 1/2 and 3/4 are all a single grapheme. They look like ducks,
they quack like ducks, they are b-y ducks. And like the hamza, the twin dots
are also to be encoded as a separate character.
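
For comparison with what Unicode actually encodes, here is a minimal Python
sketch using only the standard unicodedata module. The constant names and the
describe() helper are illustrative labels, not anything from this thread. It
lists the relevant code points and shows that the precomposed yeh-with-hamza
canonically decomposes into a base letter plus a combining hamza. Note that
the standard's decomposition uses the dotted YEH as its base, so it matches
the model argued for above only in part: the hamza is separable, the dots are
not.

```python
import unicodedata

# Code points as assigned in the Unicode Standard.
YEH = "\u064A"              # ARABIC LETTER YEH
ALEF_MAKSURA = "\u0649"     # ARABIC LETTER ALEF MAKSURA (dotless yeh shape)
YEH_HAMZA_ABOVE = "\u0626"  # ARABIC LETTER YEH WITH HAMZA ABOVE (precomposed)
HAMZA_ABOVE = "\u0654"      # ARABIC HAMZA ABOVE (combining mark)


def describe(text):
    """Return (code point, character name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Canonical decomposition (NFD) splits the precomposed letter into
# a base letter followed by a combining mark.
decomposed = unicodedata.normalize("NFD", YEH_HAMZA_ABOVE)

for code_point, name in describe(YEH + ALEF_MAKSURA + decomposed):
    print(code_point, name)
```

The point of the sketch is only that "hamza as a separate character" already
has a counterpart in the standard, whereas "twin dots as a separate character"
does not follow from the decomposition of these letters.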
> Thus eliminating information from encoded text. Why would this be a
> good thing? Lexical yeh (for lack of a better term) and dotless yeh
> look alike but denote completely different things; why conflate them?
This is analogous to the /lead/ vs /lead/ problem in English. No one would
ever consider encoding /ea/ in /lead/ differently from /ea/ in /lead/.
Encoding is about graphemes, not phonemes or morphemes.
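
The same point can be put as a small Python sketch. The READINGS table is a
hypothetical toy lexicon with rough transcriptions, made up purely for
illustration; the only thing the encoded text carries is the sequence of
code points, which is identical whichever reading is intended.

```python
# Hypothetical toy lexicon: one spelling, several readings.
# The transcriptions are approximate and for illustration only.
READINGS = {
    "lead": ["/li:d/ (to go ahead)", "/led/ (the metal; line spacing)"],
    "read": ["/ri:d/ (present tense)", "/red/ (past tense)"],
}


def code_points(word):
    """Return the code points of a word as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in word]


for word, readings in READINGS.items():
    # One encoded form per spelling, regardless of how many readings it has.
    print(word, code_points(word), readings)
```

Choosing between the readings is left to the reader (or to a lexicon layered
on top of the text), not to the character encoding.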
>> Somehow I have thought that they came up with the name alef maqsura
>> as originally intended for the superscript alef that goes on top of
>> the yeh seat and not the seat itself; similar to alef qasiir. But
>> anyway, that was just a subconscious guess.
>
> It had never occurred to me before, probably because when I learned
> Arabic all the examples were contemporary; I don't recall the small
> alif ever being written. We just learned "alef maqsura" by rote.
In the Persian tradition, the Arabic term "alef maqsura" is still used in its
original function to express the opposition with "alef Tawiila".
> . I wonder why al-Nahw
>> al-Wafy and Wright say different things here.. Maybe checking the
>> history of the naming of alef maqsura would solve the puzzle.
>
> Well, al-Nahw al-Wafy talks about maqsur *nouns*, and Wright doesn't
> (or if he does I haven't found it). But the latter mentions alef
> maqsura in the context of talking about the letters, whereas I
> haven't found any mention of it in the former (4 volumes, about 800
> pages each - I can't say it isn't in there somewhere.) So the notion
> of a maqsur with respect to nouns is clear (and can be found in
> classical works of Quranic i'rab), but the naming of the letter
> remains a minor mystery to me. Maybe it's simply graphical, and
> refers to the length of the graphically shortened alif above the yeh
> form.
This seems about right.
>> <<The question remains as to why they chose dotless yeh to carry the
>> small alif, instead of some other graphical convention.>>
>>
>> Well, actually there was no small alef to begin with as you may know.
>> There was the dotless yeh and the small alef was later put on it.
>> Maybe a more interesting question is how Arabic orthography evolved
>> in the 5th and 6th centuries such that a yeh was used for the 'a'
>> vowel sound. I thought the choice of yeh was a consequence of Arabic
>> grammar rules but my Arabic grammar is not strong enough to point to
>> what rule it would be..
>
> My latest speculation is that they just needed something graphically
> distinct to put at the end of the word. But also, it may be that they
> classified the sounds of yeh and alif together - there's a little
> essay about the letters at the beginning of my copy of Lisaan al-Arab
> that makes that sound maybe kinda possible.
I recently met Yasin Dutton (I am sure most of you know who he is). We had a
long discussion about this phenomenon of yaa' for alif. He reminded me of
the imaalä in Syria. All of a sudden the penny dropped: in South Lebanon, an
extreme imaalä of the final syllable occurs: [ftaH l-baab] is pronounced
/ftaH l-biib/. However, in non-final position it returns to a lighter imaalä:
[ftaH baab l-bayt] is pronounced /ftaH bääb l-beet/. This is completely
analogous with the written image of [ALQY] القى in final position and
[ALQAHu] القاه in non-final position. It's just an idea. Anyway, the gist of
the conversation was that the phenomenon of imaalä is understudied and
should be brought into the equation.
t