
Re: Volunteers for verifying the quran data

To: General Arabization Discussion <general at arabeyes dot org>
Subject: Re: Volunteers for verifying the quran data
From: Gregg Reynolds <gar at arabink dot com>
Date: Sun, 10 Jul 2005 21:40:57 -0500
User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)

Meor Ridzuan Meor Yahaya wrote:


Second, Gregg's concern about the encoding approach that I use. I agree,
you do have valid points, but perhaps you missed some of my earlier
mail mentioning the reason I chose that option. FYI, with the current
font and data, I can create a web application that will be able to
play the recitation of each aya and display the translation of the
aya. I think it is already in a very useful state. The real reason I

I agree it is very useful, and thank you for doing so much hard work. I don't want you to think I'm complaining about it. My main concern is twofold: Unicode compatibility and font independence. That is the best way (IMO) to make your work useful to the largest number of people.

1. Font availability. As far as I know, there is not a single font

...etc...

I agree, fonts are an issue. But here's a proposition for debate: fonts don't matter. What counts is the text encoding.

Reasoning: new fonts will always come along. There will always be a "better" one, or at least one that is more to your taste (or mine, or whomsoever's). Given correct encoded text, you can use whatever font you like. But if your encoded text is designed for use with a specific font (i.e. for a private font mapping convention), then you won't be able to use new fonts with it.

In other words, it's a Good Thing to divide work between font design and development, and text encoding verification.

Proposition 2 (and this is the more interesting and philosophical one): the real "matn" of the text is the encoded text, not the (graphical or audio) rendition. Get the encoding right, and you have inscribed the Quran in cyberspace, independent of visual (or any other) representation. From that, concrete representations (visual, aural, other?) can be generated. (This raises the very interesting question of what the core of the Quran really "is": ink on paper? sound waves? a sequence of abstract "letters"? etc.)


2. The encoding approach. I chose it because that is the easiest way
to implement a font that will work on most platforms,

...(low small meem)...

hack to implement it in the font domain. Plus, it does not have any
other meaning or use in the Quran.

Huh? It looked to me like you had used <low small meem> to mean different things in the verse I mentioned. So I infer that you mean the *combination* of low small meem with another codepoint is unique. E.g. <dammatan><low small meem> and <kasratan><low small meem> are two different things. Have I understood your method correctly?
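To make the question concrete: if the file really does use the *pair* as the unit of meaning, a checker can classify each ARABIC SMALL LOW MEEM (U+06ED) by the mark just before it. The sketch below assumes that reading of the file (the labels and the "unpaired" category are illustrative, not the file's documented convention) and uses only standard Unicode Arabic-block code points.

```python
from typing import List, Tuple

# Code points from the Unicode Arabic block.
FATHATAN = "\u064B"
DAMMATAN = "\u064C"
KASRATAN = "\u064D"
SMALL_LOW_MEEM = "\u06ED"  # ARABIC SMALL LOW MEEM

# Assumed convention (hypothetical reading of the file): the tanween mark
# immediately preceding the low meem determines what the pair means.
TANWEEN_NAMES = {FATHATAN: "fathatan", DAMMATAN: "dammatan", KASRATAN: "kasratan"}


def classify_low_meem(text: str) -> List[Tuple[int, str]]:
    """Return (index, label) for every SMALL LOW MEEM in text.

    The label names the tanween right before the meem, or "unpaired"
    if the meem follows anything else (likely a data error to review).
    """
    results = []
    for i, ch in enumerate(text):
        if ch != SMALL_LOW_MEEM:
            continue
        prev = text[i - 1] if i > 0 else ""
        results.append((i, TANWEEN_NAMES.get(prev, "unpaired")))
    return results


sample = "\u0628" + DAMMATAN + SMALL_LOW_MEEM + " " + "\u0628" + KASRATAN + SMALL_LOW_MEEM
print(classify_low_meem(sample))  # [(2, 'dammatan'), (6, 'kasratan')]
```

If every low meem comes back paired, the "combination is unique" reading holds for that stretch of text; any "unpaired" hit is exactly the kind of thing a proofreader should look at.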

Having said that, you suggest that I use personal code points and

(FYI: PUA = Private Use Area; here "private use" simply means not standardized by Unicode; any community of users can agree that e.g. codepoint XYZ of the PUA means some specific thing.)

create a font that will display each character and mark individually.
That is fine, and not very hard to implement actually, but the
question is: what benefit will I get by doing this, other than the
extra work?

You will know that people are looking at (verifying) a direct representation of the underlying text. Sorry to be a nag, but I can't stress this enough: it is the underlying text data that counts, not the graphical rendition generated by your (or any other) font. In my opinion, at least. ;)
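One font-independent way to look at "the underlying text" is simply to list every code point with its standard Unicode name, which is roughly what a proofreading font does graphically. A minimal sketch using Python's standard `unicodedata` module; the output format and the wording of the private-use flag are illustrative choices. Private Use Area characters (general category "Co") have no standard name, which makes visible why a PUA-based file needs a separately agreed convention.

```python
import unicodedata
from typing import List


def proof_listing(text: str) -> List[str]:
    """One line per code point: hex value plus Unicode character name.

    Private Use Area characters (general category "Co") have no standard
    name, so they are flagged instead of silently rendered.
    """
    lines = []
    for ch in text:
        if unicodedata.category(ch) == "Co":
            name = "PRIVATE USE (meaning defined only by local convention)"
        else:
            name = unicodedata.name(ch, "<no name>")
        lines.append(f"U+{ord(ch):04X}  {name}")
    return lines


for line in proof_listing("\u0628\u064C\u06ED\uE000"):
    print(line)
```

A proofreader working from such a listing checks the data itself, so any font bug or private font-mapping trick cannot hide an encoding error.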

To let everyone know, my main source of the text
is NOT the XML file. The XML file is translated from other sources.

Ok; but then why are we verifying the XML file and not the original source???

Thus, I can just change the map for each character and get it
translated again in a few minutes. So, to implement it the way you
mention, it probably won't take more than an hour to create the new
file, but again, what is the benefit? We can't render it properly.
Will we get more people involved in proofreading it?

Not necessarily. I guess it's a question of quality assurance; I think you have a better likelihood of high quality proofreading with a proofreading font. Don't forget, proofreading is a profession. Simply reading a text is not proofreading. So I hope you are not too disappointed if you haven't received much response to your request for help. It's a big job, and it isn't easy to find people like you.

It would be nice to find a bunch of volunteers on the web to do this job; but even if you did, how could you be sure of the quality of the proofreading? This is a sticky problem.

I would suggest trying to find a Haafiz or somebody who knows the Quran well, and pay him to proofread. If necessary, pay him to learn how to use the computer first. Personally I would be willing to pony up a few hundred bucks. Maybe even more, if I am convinced the money is being put to good use. (I live and work in the US; purely because of the difference in local economies, amounts that are not especially large here are pretty substantial in lots of places, as we all know. Far better to directly support the work of somebody in the so-called "third world" than to donate to a big charity or go to lots of movies. ;)

I don't see that happening either, because the process will be more
difficult. Will you proofread it if that is the case? If so, let me
know, and I'll send you the file personally so you can start.

I can do some, but it's a pretty big book. ;) For now, don't send me the file; my time is pretty tight. Later I might be interested in working with pieces of it.


FYI, I'm now working on merging my work with openburhan. Openburhan is
available on the net and basically gives you the root of each word in
the Quran. It is a very good effort.

Very cool. I wonder what their technology is? Do you suppose they do all that stuff by hand?

FYI, I'm on the verge of posting a document with a number of ideas about encoding Arabic that might be of interest to you. It includes a kind of encoding/transliteration scheme that piggy-backs on ISO 8859-1 (Latin-1). It also includes the capability of encoding roots. I plan to take a few of the small Suras from your file and (re-)encode them with my little scheme so you and others can have a look and see if you think any of the ideas are useful.
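For readers unfamiliar with the idea of transliterating Arabic into an 8-bit Latin character set: the sketch below is NOT the scheme mentioned above (which is not shown here). It uses a small subset of the published Buckwalter transliteration, which is pure ASCII and therefore also valid ISO 8859-1, purely to illustrate a reversible one-to-one mapping. The subset and the space-only passthrough rule are simplifying assumptions.

```python
# Subset of the Buckwalter transliteration (ASCII, hence also valid Latin-1).
BUCKWALTER = {
    "\u0627": "A",  # ALEF
    "\u0628": "b",  # BEH
    "\u062A": "t",  # TEH
    "\u062D": "H",  # HAH
    "\u062F": "d",  # DAL
    "\u0631": "r",  # REH
    "\u0633": "s",  # SEEN
    "\u0639": "E",  # AIN
    "\u0644": "l",  # LAM
    "\u0645": "m",  # MEEM
    "\u0646": "n",  # NOON
    "\u0647": "h",  # HEH
    "\u064B": "F",  # FATHATAN
    "\u064C": "N",  # DAMMATAN
    "\u064D": "K",  # KASRATAN
    "\u064E": "a",  # FATHA
    "\u064F": "u",  # DAMMA
    "\u0650": "i",  # KASRA
    "\u0651": "~",  # SHADDA
    "\u0652": "o",  # SUKUN
}
# The mapping is one-to-one, so it can be inverted for decoding.
REVERSE = {v: k for k, v in BUCKWALTER.items()}


def to_latin1(text: str) -> bytes:
    """Transliterate Arabic text to Latin-1 bytes; spaces pass through."""
    chars = []
    for ch in text:
        if ch == " ":
            chars.append(ch)
        elif ch in BUCKWALTER:
            chars.append(BUCKWALTER[ch])
        else:
            raise ValueError(f"no mapping for U+{ord(ch):04X}")
    return "".join(chars).encode("latin-1")


def from_latin1(data: bytes) -> str:
    """Invert to_latin1 (raises KeyError on unknown characters)."""
    return "".join(ch if ch == " " else REVERSE[ch] for ch in data.decode("latin-1"))


bism = "\u0628\u0650\u0633\u0652\u0645\u0650"  # bismi, fully vocalized
print(to_latin1(bism))  # b'bisomi'
```

Because every mark gets its own Latin character, the round trip is lossless, and the transliterated form can be proofread with any ordinary Latin font.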

Sincerely,

gregg

Follow-Ups:
- Re: Volunteers for verifying the quran data
  - From: Meor Ridzuan Meor Yahaya
- Re: Volunteers for verifying the quran data
  - From: Meor Ridzuan Meor Yahaya

References:
- Re: Volunteers for verifying the quran data
  - From: Mete Kural
- Re: Volunteers for verifying the quran data
  - From: Gregg Reynolds
- Re: Volunteers for verifying the quran data
  - From: Meor Ridzuan Meor Yahaya
