[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Volunteers for verifying the quran data
- To: General Arabization Discussion <general at arabeyes dot org>
- Subject: Re: Volunteers for verifying the quran data
- From: "Mete Kural" <metek at touchtonecorp dot com>
- Date: Wed, 29 Jun 2005 08:50:23 -0700
- Cc: "Bernard S. Greenberg at Basis" <bsg2004 at basistech dot com>, Tom Patterson <pattersont at summa dot com>, Zina Saadi <ZinaS at basistech dot com>
>From: Gregg Reynolds <gar at arabink dot com>
>Now, IMO a difficult design question is whether some true morphemes
>should in fact be encoded. Obvious examples: definite article, other
>particles like laa, sawfa, sa-, direct object suffixes -hu, -ha, etc.
>Unicode will never countenance something like that, but that doesn't
>mean we shouldn't. Such design decisions should be made strictly on a
>costs/benefits basis, IMO.
I'd like to restate my opinion here that such morphemic encoding is better done at the markup level. In other words: encode the characters graphemically using Unicode, then encode the morphemes on top of that at the markup level using an appropriate XML schema. Please take a look at what OSIS (www.bibletechnologies.com) has done. They have already done a lot of this kind of morpheme-based encoding at the markup level.
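To make the layered approach concrete, here is a rough sketch of what it might look like. The element and attribute names are only illustrative (loosely modeled on OSIS's <w> element and its lemma/morph attributes; consult the actual OSIS schema before relying on them), and the segmentation values shown are my own made-up notation:

```xml
<!-- Illustrative sketch only: the characters are stored as a plain
     graphemic Unicode string, while the morpheme segmentation lives
     entirely in the markup attributes. -->
<verse osisID="Quran.1.1">
  <!-- bi- (particle) + ism (noun): prefix marked in markup, not in the text -->
  <w lemma="اسم" morph="prefix:bi-,noun">بسم</w>
  <!-- kitaab (noun) + -hu (direct object / possessive suffix) -->
  <w lemma="كتاب" morph="noun,suffix:-hu">كتابه</w>
</verse>
```

The point is that the text node stays a vanilla Unicode string that any renderer can display, while the morphemic analysis (definite article, sa-/sawfa particles, -hu/-ha suffixes, etc.) is recoverable from the attributes without inventing new character codes.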