
Re: Quran in Unicode format



On Friday 10 February 2006 22:45, lionelf wrote:
> Bismillah
> 
> Salam

Wa alaikum asalam wa rahmatullah.
 
> It turns out that vruwmiy is capable of encoding more information than 
> traditional Arabic script. For example, the group 'an naas' has the 
> second nuwn prolonged: we can show this in vruwmiy but there is no 
> mechanism in traditional Arabic.

see:

http://www.pheye.net/abdalla/articles/trans

With the help of tashkeel, a transliterator can double the letter if
the letter is associated with a shadda.
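
To make the shadda point concrete, here is a minimal sketch of the doubling
step. It is not the class from the article; the function name, the constants
and the four-letter table are hypothetical, and it assumes the text has already
been decoded to Unicode code points (std::u32string).

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Code points from the Unicode Arabic block.
constexpr char32_t kBaa    = 0x0628;
constexpr char32_t kRaa    = 0x0631;
constexpr char32_t kLaam   = 0x0644;
constexpr char32_t kNoon   = 0x0646;
constexpr char32_t kShadda = 0x0651;

// Hypothetical, deliberately tiny letter table (code point -> Latin).
// A real table would cover the whole alphabet and the vowel marks.
static const std::unordered_map<char32_t, std::string> kLetters = {
    {kBaa, "b"}, {kRaa, "r"}, {kLaam, "l"}, {kNoon, "n"},
};

// Transliterate the letters found in kLetters and double a letter
// when a shadda is attached to it.
std::string transliterate_with_shadda(const std::u32string& text) {
    std::string out;
    std::string last;  // Latin form of the most recent base letter
    for (char32_t cp : text) {
        if (cp == kShadda) {
            out += last;  // shadda: emit the previous letter a second time
            continue;
        }
        auto it = kLetters.find(cp);
        if (it != kLetters.end()) {
            last = it->second;
            out += last;
        }
        // Anything else (a vowel mark, for instance) is skipped in this
        // sketch, so a shadda that comes after a vowel mark still doubles
        // the right letter.
    }
    return out;
}
```

With this table, raa + baa + shadda comes out as "rbb"; the short vowels are
missing only because the sketch doesn't map them.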
  
> Let's assume that we have codepoints in Unicode, in a single font, for 
> every combination of superscripts and subscripts occurring in the Quran. 
> For example, we have a code for the nuwn in 'anbiyaa', with the small 
> superscripted mim and the sukuwn on the nuwn. If we don't have such a 
> font, we can prepare a metafont: that is, a font with all these 
> codepoints, assembled from whatever fonts they occur in; the only 
> restriction is that the metafont doesn't actually occur in Unicode.

Also see the paper above. The problem that you raise regarding
anbiyaa follows a rule that you can formulate in your code.
In the article above, the class handles the cases where the
letter following the laam in al or lil either silences the laam
or leaves it pronounced. For example:

alfalaq: the f does not silence the laam.
alwaSiyya: the waw does not silence the laam.
alnaas: the noon does silence the laam. So a proper transliteration should be
annaas.

lillaahi fi khalqihi shu-oon. Laam doesn't need to be silenced.
akaana lilnaasi 'ajab. Problem: Laam needs to be silenced. Should be:
akaana linnaasi 'ajab.
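
The laam rule above can be sketched as a small function over one romanized
word. This is an assumption-laden illustration, not the class from the
article: the name assimilate_laam is hypothetical, the list of silencing
letters is limited to single-character romanizations (digraphs such as th, dh
and sh are left out), and every word that starts with al or lil is treated as
carrying the article.

```cpp
#include <cassert>
#include <string>
#include <vector>

// If the word starts with "lil" or "al" and the next letter silences the
// laam, replace the laam with a copy of that letter.
std::string assimilate_laam(const std::string& word) {
    // Simplified list of silencing letters; capitals stand for the
    // emphatic letters, as in alwaSiyya.
    static const std::string silencing = "tdrzsnTDSZ";
    static const std::vector<std::string> prefixes = {"lil", "al"};

    for (const std::string& prefix : prefixes) {
        if (word.size() > prefix.size() &&
            word.compare(0, prefix.size(), prefix) == 0) {
            std::string result = word;
            const char next = word[prefix.size()];
            if (silencing.find(next) != std::string::npos) {
                result[prefix.size() - 1] = next;  // overwrite the laam
            }
            return result;
        }
    }
    return word;  // no al/lil prefix: leave the word alone
}
```

Under these assumptions alnaas becomes annaas and lilnaasi becomes linnaasi,
while alfalaq, alwaSiyya and lillaahi come back unchanged.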

In the case of anbiyaa, we can see that the noon is affected by what follows
it as well.

Here are some examples:

anbiyaa (n should be m)
anbaa-a (n should be m) (e.g., wa laqad jaa-ahum min aln/mbaa-i ma feehi mustaqar)
inbithaaq (n should be m)
anbat (n should be m) (e.g., wallahu an/mbatakum min al-ardhi nabaata)

I can think of no letter other than the "b" that changes a preceding noon
to a meem. The fa in, say, anfaal doesn't change the state of the noon
(not in any major way). So if the rule revolves around the baa, you can
write a handler that changes the noon to a meem whenever the noon is
followed by a baa. Note, however, that this rule should be applied only if
the noon is preceded by a certain letter; see the word anbat in the example
above. When Allah (tt) says anbatakum, the noon's state changes. When He (tt)
says nabaata, the noon doesn't change.
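
A handler for the noon/baa rule might look like the sketch below. The function
name is hypothetical, and the condition is my simplification: in romanized
text, a noon that is written immediately before a b carries no vowel of its
own, so the sketch rewrites every "nb" pair as "mb" and leaves a noon followed
by a vowel (as in nabaata) untouched. It does not check which letter precedes
the noon.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replace a noon that is immediately followed by a baa with a meem.
std::string noon_to_meem_before_baa(const std::string& word) {
    std::string result = word;
    for (std::size_t i = 0; i + 1 < result.size(); ++i) {
        if (result[i] == 'n' && result[i + 1] == 'b') {
            result[i] = 'm';
        }
    }
    return result;
}
```

With this sketch anbiyaa, inbithaaq and anbatakum become ambiyaa, imbithaaq
and ambatakum, while nabaata and anfaal are returned as they are.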

> This text can then be converted back to conventional Arabic by the 
> reverse process. To me, this means that by working in vruwmiy instead of 
> traditional Arabic, we can prepare a text without any internal font 
> changes, using the metafont.

HINT: I would focus on "characters," not fonts or metafonts. If you
get the proper character values, the font should behave properly.
If the character isn't mapped by the font, that is a separate
problem. It's good to stay light and portable.

As for "the reverse process," that depends on how you wrote your
code. It could be an exhausting process, depending on how you
manage the letters and their equivalents. If you use a container,
a hashtable or a map for instance, the process could be as easy as
getting the value associated with the key (see the file buildtransmap.cc
at the URL above).
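
As a rough illustration of the container idea (this is not a reproduction of
buildtransmap.cc; the type aliases and the function name are hypothetical), a
forward map from Arabic code points to Latin strings can be inverted once, and
the reverse process then becomes a plain lookup. The sketch assumes the
forward map is one-to-one; if two letters share a Latin form, only one of them
survives in the reverse map.

```cpp
#include <cassert>
#include <map>
#include <string>

using ForwardMap = std::map<char32_t, std::string>;  // Arabic -> Latin
using ReverseMap = std::map<std::string, char32_t>;  // Latin -> Arabic

// Build the Latin -> Arabic map by swapping each key/value pair.
ReverseMap build_reverse_map(const ForwardMap& forward) {
    ReverseMap reverse;
    for (const auto& entry : forward) {
        // emplace keeps the first entry if a Latin form repeats.
        reverse.emplace(entry.second, entry.first);
    }
    return reverse;
}
```

Once the reverse map exists, converting a Latin token back is a single call
such as reverse.at("b").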

Wishing you and your family peace and good health.

Salam,
Abdalla Alothman