
Re: RFC libquran: Data packaging

We have an automatically generated simple Quran text [1], used for Zekr
0.6.0beta1+. (It is UTF-8, but the only character it uses beyond the
Cp1256 repertoire is the small superscript alef.) It is generated from
Meor's detailed Quran text [2] and automatically collated against
another Cp1256 Quran text. The differences highlighted by the collation
script [3] were verified twice by a group of three people. The
resulting text is available at [1].
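
To make the "simplify, then collate" idea above concrete, here is a
minimal sketch in Python. It is NOT the script at [3]: the function
names and the one-verse-per-line layout are assumptions made for
illustration, and simply dropping characters outside the repertoire is
a crude stand-in for whatever rules the real script applies.

```python
SUPERSCRIPT_ALEF = "\u0670"  # the one character kept beyond Cp1256


def in_simple_repertoire(ch):
    """True if ch is encodable in Cp1256 or is the small superscript alef."""
    if ch == SUPERSCRIPT_ALEF:
        return True
    try:
        ch.encode("cp1256")
        return True
    except UnicodeEncodeError:
        return False


def simplify(text):
    """Drop every character outside the simple repertoire (assumed rule)."""
    return "".join(ch for ch in text if in_simple_repertoire(ch))


def collate(simple_lines, reference_lines):
    """Yield (line_number, simple, reference) for lines that differ.

    The superscript alef is ignored during comparison because the
    Cp1256 reference text cannot contain it.
    """
    pairs = zip(simple_lines, reference_lines)
    for number, (simple, reference) in enumerate(pairs, start=1):
        if simple.replace(SUPERSCRIPT_ALEF, "") != reference:
            yield number, simple, reference
```

Each tuple yielded by collate() is one difference that would then go to
the human reviewers.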

Here are also some Hamza shaping rules that were considered in order to
simplify the Uthman Taha text: [4]

[1] http://siahe.com/zekr/download/quran-text.txt
[2] http://arabicfonts.wikispaces.com/space/showimage/release+0.19beta.zip
[3] http://csgradpc05.cs.uwaterloo.ca/quran/code/
[4] http://csgradpc05.cs.uwaterloo.ca/quran/code/rules/index.htm

Please note that although the differences highlighted by this
script [3] have been verified, there might still be some typos in the
final Quran text.


On 7/27/07, Mohsen Saboorian <mohsens at gmail dot com> wrote:
> > Thanks and JAK, I didn't know that.
> > I'm guessing the differences might be due to CP1256 not containing some
> > Unicode chars, e.g. small alef.
> > And yes please, I could use a sample of the differences.
> Actually, not all differences are due to characters missing from Cp1256.
> For example, ALEF_MAKSURA in Cp1256 is usually written as only ي or ى,
> without SMALL_SUPERSCRIPT_ALEF at the end of the word; however, you can
> see that it is written as ALEF in 5:31 in KFC. Here are some
> examples:
> Al-Ma'idah: 31 - King Fahd: يَا وَيلَتَا, Uthman Taha: يَـٰوَيْلَتَىٰ
> Al-Kahf: 77 - King Fahd: لاتَّخَذْتَ, Uthman Taha: لَتَّخَذْتَ
> Maryam: 74 - King Fahd: وَرِئْيًا, Uthman Taha: وَرِءْيًا
> Best,
> Mohsen.
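
Differences like the ones quoted above are hard to spot by eye, so it
can help to print the Unicode names of the code points where two
spellings diverge. The helper below is only an illustrative sketch
(first_difference is a made-up name, not part of any of the tools
mentioned in this thread); it uses Python's standard unicodedata
module.

```python
import unicodedata


def first_difference(a, b):
    """Return (index, name_in_a, name_in_b) for the first differing
    code point of two strings, or None if no position differs within
    their common length."""
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return (
                index,
                unicodedata.name(x, "UNNAMED"),
                unicodedata.name(y, "UNNAMED"),
            )
    return None
```

Passing the two spellings from the Maryam 74 example should point at
the hamza carrier, assuming the texts are encoded the way they appear
in this mail.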