Re: Volunteers for verifying the quran data

>From: Gregg Reynolds <gar at arabink dot com>
>Now, IMO a difficult design question is whether some true morphemes 
>should in fact be encoded.  Obvious examples: definite article, other 
>particles like laa, sawfa, sa-, direct object suffixes -hu, -ha, etc. 
>Unicode will never countenance something like that, but that doesn't 
>mean we shouldn't.  Such design decisions should be made strictly on a 
>costs/benefits basis, IMO.

I'd like to restate my opinion that such morphemic encoding is better done at the markup level. That is, encode the characters graphemically in Unicode, and then encode the morphemes on top of that text in markup, using an appropriate XML schema. Please take a look at what OSIS (www.bibletechnologies.com) has done; they have already done a lot of this kind of morpheme-based encoding at the markup level.
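
To make the split between the two levels concrete, here is a minimal sketch in Python. The element names `w` and `m`, the `type` attribute, and its values are hypothetical, chosen only for illustration; they are not taken from OSIS or any other schema. The example word is sa-yaktubu-hu ("he will write it"), segmented by hand into the future particle sa-, the verb, and the object suffix -hu.

```python
import xml.etree.ElementTree as ET


def mark_up_word(morphemes):
    """Wrap each (text, type) pair in an <m> element inside a single <w> element.

    The element and attribute names are hypothetical, not from a real schema.
    """
    word = ET.Element("w")
    for text, kind in morphemes:
        m = ET.SubElement(word, "m", {"type": kind})
        m.text = text
    return word


def plain_text(word):
    """Recover the unannotated Unicode string by discarding the markup."""
    return "".join(word.itertext())


# Hand-made segmentation; the characters are ordinary Unicode Arabic letters.
segments = [
    ("س", "future-particle"),   # sa-
    ("يكتب", "verb"),           # yaktubu (written unvocalized)
    ("ه", "object-suffix"),     # -hu
]

word = mark_up_word(segments)
xml_string = ET.tostring(word, encoding="unicode")
print(xml_string)
print(plain_text(word))
```

The character data stays plain Unicode throughout, so stripping the tags gives back the ordinary graphemic text, and the morpheme boundaries live only in the markup.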


Mete Kural
Touchtone Corporation