Re: Volunteers for verifying the quran data

To: General Arabization Discussion <general at arabeyes dot org>
Subject: Re: Volunteers for verifying the quran data
From: Meor Ridzuan Meor Yahaya <meor dot ridzuan at gmail dot com>
Date: Mon, 11 Jul 2005 09:15:38 +0800

Let me respond to some of the questions and queries.
First, Nadim's questions. So far no one has given me any feedback
whatsoever, except for Mete, who said that he had done some
character counting for some suras. This could mean one of two things:
either someone is working on it and has not found any problems, and so
has nothing to report, or no one is doing any proofreading. It seems
to me that it is the latter. As for the PDF, the problem is the
communication between me and the site provider, Mr Thariq. He's in
Pakistan and I'm in Malaysia. As we know, the internet link to
Pakistan still has problems, so I can't update the site (I don't have
admin access yet). I will upload the PDF as soon as possible. If you
want me to mail it to you, let me know, and then you can upload it to
Arabeyes CVS.

Second, Gregg's concern about the encoding approach that I use. I
agree, you do have valid points, but you probably missed some of my
earlier mails explaining why I chose that option. FYI, with the
current font and data, I can create a web application that can play
the recitation of each aya and display its translation. I think that
is already a very useful state. The real reasons I chose that option
are as follows:

1. Font availability. As far as I know, there is not a single font
that can really display the Quran as in the Madinah mushaf from
Unicode text. The best that we can get, I think, is MS's Arabic
Typesetting and 2 fonts from SIL (Mr Thomas mentioned to me that he is
working on one font, but has not released it yet). I only discovered
the SIL fonts recently; maybe they were just released. I've seen the
MS font: it is complete and well behaved according to Unicode, but it
still lacks some glyphs needed to render the mushaf completely. One
thing I can tell you: I'm not impressed with the font, other than it
being a complete Unicode font. Furthermore, it is very difficult to
get. It is only distributed with the MS Proofing Tools, which are sold
separately from Office 2003. The SIL font looks acceptable, but it is
a very basic font. It does not have any alternative glyphs, for
example for the character yehnoonfinal. Also, the font is not under
the GPL or another open source license, so I'm not sure whether we can
distribute it. On the other hand, I think the font that I'm working on
has better-looking glyphs, which are taken directly from the Madinah
mushaf, as you can see from the screenshot on the website. The font
distributed on the site is a very old one, based on the arabeyes-ar
font, which is in turn based on the KACST-QURN font. I'm trying to
update the site with the new font, but have the link problem I
mentioned above. The latest font that I'm working on will have the
best features. You can color each glyph individually, even within a
complex ligature such as lamalef. I've sent a screenshot showing the
font's features to Mete; maybe he can give an opinion on it. So, in
short, even if I make the text compatible with other fonts, there will
still be problems displaying it properly. My font probably won't be
GPLed, but I will allow non-commercial distribution (this is a totally
separate topic/issue).

2. The encoding approach. I chose it because it is the easiest way to
implement a font that will work on most platforms, that is, MS Windows
and Linux. Take, for example, fathatan + small low meem to indicate
the sequential fathatan. Why small low meem? Well, I've tried many
combinations, and that turns out to work best. Under Windows
specifically, if I choose another code point, it does the following:
it changes the direction of the text, which I can't fix even by
inserting an RTL or LTR mark, and it changes the shape of the
characters before and after the code point. These 2 problems are not
easy to fix or hack around. With my approach, neither the text
direction nor the shape of the surrounding characters changes, so the
hack is easier to implement in the font domain. Plus, it does not have
any other meaning or use in the Quran.
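
To make the point about text direction and shaping more concrete, here
is a minimal sketch in Python. It assumes that the two characters
meant above are U+064B ARABIC FATHATAN and U+06ED ARABIC SMALL LOW
MEEM (the mail names the characters but gives no code point numbers),
and the helper names describe() and is_nonspacing_mark() are only
illustrative. It looks up the Unicode general category and
bidirectional class of each character. Both are nonspacing marks,
which is consistent with them not disturbing direction or joining,
whereas an ordinary letter such as U+0645 ARABIC LETTER MEEM is a
strong right-to-left letter.

```python
import unicodedata

# Assumed code points; the mail names the characters but not the numbers.
FATHATAN = "\u064B"        # ARABIC FATHATAN
SMALL_LOW_MEEM = "\u06ED"  # ARABIC SMALL LOW MEEM
LETTER_MEEM = "\u0645"     # ARABIC LETTER MEEM, for contrast


def describe(ch):
    """Return (name, general category, bidi class) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )


def is_nonspacing_mark(ch):
    """True if ch is a nonspacing mark (Mn) with bidi class NSM."""
    return (
        unicodedata.category(ch) == "Mn"
        and unicodedata.bidirectional(ch) == "NSM"
    )


if __name__ == "__main__":
    for ch in (FATHATAN, SMALL_LOW_MEEM, LETTER_MEEM):
        name, category, bidi = describe(ch)
        print(f"U+{ord(ch):04X} {name}: category={category}, bidi={bidi}")
```

This only reports what the Unicode database says about the characters;
how a particular rendering engine on Windows or Linux treats them
still has to be checked with the real font.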

Having said that, you suggest that I use private use code points and
create a font that displays each character and mark individually.
That is fine, and actually not very hard to implement, but the
question is: what benefit will I get from doing this, other than the
extra work? To let everyone know, my main source for the text is NOT
the XML file. The XML file is translated from other sources. Thus, I
can just change the map for each character and have it translated
again in a few minutes. So implementing it the way you describe would
probably take no more than an hour to create the new file, but again,
what is the benefit? We can't render it properly. Will we get more
people involved in proofreading? I don't see that happening either,
because the process will be more difficult. Will you proofread it if
that is the case? If so, let me know and I'll send you the file
personally so you can start.
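
As a rough illustration of what changing the map and translating the
text again could look like, here is a small Python sketch. It is not
the actual conversion tool; the function name remap(), the two map
names and the choice of U+E000 as the target code point are all
hypothetical, and U+064B / U+06ED are assumed to be the fathatan and
small low meem discussed above.

```python
import re


def remap(text, mapping):
    """Replace every key of mapping found in text with its value.

    Longer keys are tried first, so a multi-character sequence wins
    over any of its single characters.
    """
    if not mapping:
        return text
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


# Hypothetical map: the two-mark sequence becomes one private use code point.
TO_PROOFREADING = {"\u064B\u06ED": "\uE000"}
# Inverse map, to go back to the original encoding.
FROM_PROOFREADING = {v: k for k, v in TO_PROOFREADING.items()}

if __name__ == "__main__":
    sample = "\u0628\u064B\u06ED"  # BEH + FATHATAN + SMALL LOW MEEM
    converted = remap(sample, TO_PROOFREADING)
    print([f"U+{ord(c):04X}" for c in converted])
```

Because the map is plain data, switching between the two encodings is
just a matter of running the text through remap() with one map or its
inverse.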

Conclusion: proofreading has not made much progress. Actually, I
don't mind if other people take the lead, as long as they let me know
what's going on, where the mistakes are, etc., so I can make the
necessary changes.

FYI, I'm now working on merging my work with openburhan. Openburhan is
available on the net; basically, it gives you the root word of each
word in the Quran. It is a very good effort. However, I found some
mismatches between the word count (not the character count, which will
never match) of my text and that of the openburhan text. I've not
investigated further why, but I'll let everyone know the results.
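
One way to narrow down where the word counts diverge is to compare
them aya by aya. The Python sketch below assumes both texts have
already been loaded into dictionaries keyed by (sura, aya); that
layout, the function name word_count_mismatches() and the placeholder
tokens are assumptions for illustration, not the real format of either
text.

```python
def word_count_mismatches(mine, other):
    """Compare two {(sura, aya): text} dicts by whitespace-separated words.

    Returns a sorted list of (key, count_in_mine, count_in_other) for
    every aya whose counts differ; an aya missing from one side counts
    as 0 words.
    """
    mismatches = []
    for key in sorted(set(mine) | set(other)):
        a = len(mine.get(key, "").split())
        b = len(other.get(key, "").split())
        if a != b:
            mismatches.append((key, a, b))
    return mismatches


if __name__ == "__main__":
    # Placeholder tokens, not real text.
    mine = {(1, 1): "w1 w2 w3 w4", (1, 2): "w1 w2 w3"}
    other = {(1, 1): "w1 w2 w3 w4", (1, 2): "w1 w2 w3 w4"}
    for (sura, aya), a, b in word_count_mismatches(mine, other):
        print(f"{sura}:{aya} mine={a} other={b}")
```

A list like this would at least show whether the mismatches are
spread across the whole text or concentrated in a few ayas.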

Regards.

On 7/10/05, Gregg Reynolds <gar at arabink dot com> wrote:
> Mete Kural wrote:
> 
> >
> > Another nice program to use that shows Unicode codepoints
> > automatically as you edit is UniPad: http://www.unipad.org/main/
> >
> And I also realized that there is another, graphical method to
> edit/verify the underlying text data.  That is for Meor to create a
> "proofreading" font by copying his Quranic font and changing the mapping
> tables to make the mapping transparent.  I.e., so that each visible mark
> is produced by exactly one codepoint in the underlying textual data.
> Then, for example, if you see a low small meem in the rendered display,
> you know that it corresponds to exactly one <low small meem> in the text
> stream.  Conversely, every <low small meem> in the text stream produces
> exactly one low small meem glyph and nothing else.
> 
> A proofreading version of the font might make for a somewhat ugly
> display, but for proofreading purposes that is ok.  Once the text is
> certified, one would use the fancy font to get proper rendering.
> 
> -gregg
