Re: RFC libquran: Data packaging
- To: "Development Discussions" <developer at arabeyes dot org>
- Subject: Re: RFC libquran: Data packaging
- From: "Mohsen Saboorian" <mohsens at gmail dot com>
- Date: Fri, 27 Jul 2007 09:59:01 +0330
We have an automatically generated simple Quran text [1], used in Zekr
0.6.0beta1 and later. It is UTF-8, but the only character it uses beyond
Cp1256 is the small superscript alef. It was generated from Meor's
detailed Quran text [2] and automatically collated against another
Cp1256 Quran text. The differences highlighted by this script [3] were
verified twice by a group of three people. The resulting text is
available at [1].
Here are also some Hamza shaping rules that were considered in order to
simplify the Uthman Taha text: [4]
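
To illustrate the collation step mentioned above, here is a minimal
sketch in Python. It is not the script at [3]; the function names
(describe, collate) and the verse-per-list-item layout are assumptions
made only for this example. It walks two texts verse by verse and
reports the character runs that differ, so that a person can review
them.

```python
import difflib
import unicodedata


def describe(text):
    """Return the Unicode name of each character, for readable reports."""
    return [unicodedata.name(ch, f"U+{ord(ch):04X}") for ch in text]


def collate(verses_a, verses_b):
    """Yield (verse_index, ops) for every verse pair that differs.

    Both arguments are assumed to be lists holding one verse per item,
    in the same order. ops is a list of (tag, chars_a, chars_b) tuples
    built from difflib opcodes, with the 'equal' runs left out.
    """
    for index, (a, b) in enumerate(zip(verses_a, verses_b), start=1):
        if a == b:
            continue
        matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
        ops = [
            (tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"
        ]
        yield index, ops
```

Each reported difference can then be printed with describe() so that
look-alike characters (for example the different Hamza forms) are easy
to tell apart during manual verification.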
[1] http://siahe.com/zekr/download/quran-text.txt
[2] http://arabicfonts.wikispaces.com/space/showimage/release+0.19beta.zip
[3] http://csgradpc05.cs.uwaterloo.ca/quran/code/
[4] http://csgradpc05.cs.uwaterloo.ca/quran/code/rules/index.htm
Please note that although the differences highlighted by this script [3]
have been verified, there might still be some typos in the final Quran
text.
Regards,
Mohsen.
On 7/27/07, Mohsen Saboorian <mohsens at gmail dot com> wrote:
> > Thanks and JAK, I didn't know that.
> > I'm guessing the differences might be due to CP1256 not containing some
> > Unicode chars, e.g. small alef.
> > And yes please, I could use a sample of the differences.
>
> Actually, not all differences are due to characters missing from Cp1256.
> For example, ALEF_MAKSURA is usually written in Cp1256 as just ي or ى,
> without SMALL_SUPERSCRIPT_ALEF at the end of the word; however, you can
> see that it is written as ALEF in 5:31 in KFC. Here are some
> examples:
>
> Al-Ma'idah: 31 - King Fahd: يَا وَيلَتَا, Uthman Taha: يَـٰوَيْلَتَىٰ
> Al-Kahf: 77 - King Fahd: لاتَّخَذْتَ, Uthman Taha: لَتَّخَذْتَ
> Maryam: 74 - King Fahd: وَرِئْيًا, Uthman Taha: وَرِءْيًا
>
> Best,
> Mohsen.
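
The point made in the quoted message, that only some of the differences
come from characters Cp1256 lacks, can be checked mechanically. The
following is a small sketch in Python, assuming the standard "cp1256"
codec is an acceptable stand-in for the encoding of the source text; the
function name missing_in_cp1256 is made up for this example.

```python
def missing_in_cp1256(text):
    """Return the distinct characters of text that Cp1256 cannot encode.

    Characters are listed in order of first appearance.
    """
    missing = []
    for ch in text:
        try:
            ch.encode("cp1256")
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# U+0649 (alef maksura) exists in Cp1256, U+0670 (superscript alef) does not.
print(missing_in_cp1256("\u0649\u0670") == ["\u0670"])  # prints True
```

Run on the Uthman Taha forms above, a check like this would flag the
superscript alef in the 5:31 example, while a Hamza difference such as
the one in Maryam: 74 involves only characters that Cp1256 can already
represent.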