Re: Volunteers for verifying the quran data
- To: General Arabization Discussion <general at arabeyes dot org>
- Subject: Re: Volunteers for verifying the quran data
- From: Gregg Reynolds <gar at arabink dot com>
- Date: Sun, 10 Jul 2005 21:40:57 -0500
- User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)
Meor Ridzuan Meor Yahaya wrote:
Second, Gregg's concern about the encoding approach that I use. I agree,
you do have valid points, but you probably missed some of my earlier
mail mentioning the reason I chose that option. FYI, with the current
font and data, I can create a web application that will be able to
play the recitation of each aya, and display the translation of the
aya. I think it is already in a very useful state. The real reason I
I agree it is very useful, and thank you for doing so much hard work. I
don't want you to think I'm complaining about it. I guess my main
concern is twofold: unicode compatibility, and font-independence. That
is the best way (IMO) to make your work useful for the largest number of
people.
1. Fonts availability. As far as I know, there is not a single font
...etc...
I agree, fonts are an issue. But here's a proposition for debate:
fonts don't matter. What counts is the text encoding.
Reasoning: new fonts will always come along. There will always be a
"better" one, or at least one that is more to your taste (or mine, or
whomsoever's). Given correctly encoded text, you can use whatever font
you like. But if your encoded text is designed for use with a specific
font (i.e. for a private font mapping convention), then you won't be
able to use new fonts with it.
In other words, it's a Good Thing to divide work between font design and
development, and text encoding verification.
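To make the point concrete, here is a minimal sketch (Python, standard library only) that lists what an encoded string contains without consulting any font. The helper name `describe` and the sample string are my own illustration, not taken from your data.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for ch in text]

# Illustrative sample only: beh, kasra, seen, sukun, meem
sample = "\u0628\u0650\u0633\u0652\u0645"

for code_point, name in describe(sample):
    print(code_point, name)
```

Whatever font is later used to draw the string, this listing stays the same, and that listing is the thing I would want verified.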
Proposition 2 (and this is the more interesting and philosophical one):
the real "matn" of the text is the encoded text, not the (graphical or
audio) rendition. Get the encoding right, and you have inscribed the
Quran in cyberspace, independent of visual (or any other)
representation. From that, concrete representations (visual, aural,
other?) can be generated. (I.e., there is the very interesting question
of what the core of the Quran really "is" - ink on paper? sound waves?
a sequence of abstract "letters"? etc.)
2. The encoding approach. I chose it because that is the easiest way
so that I can implement a font which will work on most platforms,
...(low small meem)...
hack to implement it in the font domain. Plus, it does not have any
other meaning or use in the quran.
Huh? It looked to me like you had used <low small meem> to mean
different things in the verse I mentioned. So I infer that you mean the
*combination* of low small meem with another codepoint is unique. E.g.
<dammatan><low small meem> and <kasratan><low small meem> are two
different things. Have I understood your method correctly?
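If I have understood it, the idea could be sketched roughly as below. This is only my guess at the scheme, not your actual encoding: I am assuming <low small meem> corresponds to U+06ED (ARABIC SMALL LOW MEEM), and `mark_pairs` is a hypothetical helper that reports which codepoint precedes each one.

```python
import unicodedata

DAMMATAN = "\u064C"        # ARABIC DAMMATAN
KASRATAN = "\u064D"        # ARABIC KASRATAN
SMALL_LOW_MEEM = "\u06ED"  # ARABIC SMALL LOW MEEM (assumed equivalent)

def mark_pairs(text):
    """Return (preceding name, meem name) for each SMALL LOW MEEM in text."""
    pairs = []
    for prev, cur in zip(text, text[1:]):
        if cur == SMALL_LOW_MEEM:
            pairs.append((unicodedata.name(prev, "<unnamed>"),
                          unicodedata.name(cur, "<unnamed>")))
    return pairs

# Hypothetical sequence: the same low meem after two different marks
sample = "\u0628" + DAMMATAN + SMALL_LOW_MEEM + "\u0628" + KASRATAN + SMALL_LOW_MEEM
for pair in mark_pairs(sample):
    print(pair)
```

If the meaning depends on the pair rather than on the single codepoint, then a reader of the raw data has to know that convention to interpret it.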
Having said that, you suggest that I use personal code points and
(FYI: PUA = Private Use Area; here "private use" simply means not
standardized by Unicode; any community of users can agree that e.g.
codepoint XYZ of the PUA means some specific thing.)
create a font that will display each character and mark individually.
That is fine, and not very hard to implement actually, but the
question is what is the benefit that I will get by doing this other
than the extra work?
You will know that people are looking at (verifying) a direct
representation of the underlying text. Sorry to be a nag, but I can't
stress this enough: it is the underlying text data that counts, not the
graphical rendition generated by your (or any other) font. In my
opinion, at least. ;)
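One practical side effect, as a sketch: private-use codepoints are easy to find mechanically, because Unicode gives them the general category "Co". The function name `private_use_chars` and the sample string are hypothetical; which PUA codepoints a file would actually use is whatever the community agrees on.

```python
import unicodedata

def private_use_chars(text):
    """Return (index, code point) for each Private Use character in text."""
    return [(i, f"U+{ord(ch):04X}")
            for i, ch in enumerate(text)
            if unicodedata.category(ch) == "Co"]

# Hypothetical sample: beh, one PUA codepoint, meem
sample = "\u0628\uE000\u0645"
print(private_use_chars(sample))
```

A proofreading tool could use a check like this to flag every spot where the text relies on a private agreement rather than on standard Unicode.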
To let everyone know, my main source of the text
is NOT the xml file. The XML file is translated from other sources.
Ok; but then why are we verifying the XML file and not the original
source???
Thus, I can just change the map for each character, and get it
translated again in a few minutes. So, to implement it the way you
mention, it probably won't take more than an hour to create the new
file, but again, what is the benefit? We can't render it properly.
Will we get more people involved in proofreading it?
Not necessarily. I guess it's a question of quality assurance; I think
you have a better likelihood of high quality proofreading with a
proofreading font. Don't forget, proofreading is a profession. Simply
reading a text is not proofreading. So I hope you are not too
disappointed if you haven't received much response to your request for
help. It's a big job, and it isn't easy to find people like you.
It would be nice to find a bunch of volunteers on the web to do this
job; but even if you did, how could you be sure of the quality of the
proofreading? This is a sticky problem.
I would suggest trying to find a Haafiz or somebody who knows the Quran
well, and paying him to proofread. If necessary, pay him to learn how to
use the computer first. Personally I would be willing to pony up a few
hundred bucks. Maybe even more, if I am convinced the money is being
put to good use. (I live and work in the US; purely because of the
difference in local economies, amounts that are not especially large
here are pretty substantial in lots of places, as we all know. Far
better to directly support the work of somebody in the so-called "third
world" than to donate to a big charity or go to lots of movies. ;)
I don't see that
will happen either, because the process will be more difficult. Will
you proofread it if that is the case? If so, let me know, I'll send
you the file personally so you can start.
I can do some, but it's a pretty big book. ;) For now, don't send me
the file; my time is pretty tight. Later I might be interested in
working with pieces of it.
FYI, I'm now working on merging my work with openburhan. Openburhan is
available on the net, and basically gives you the root word of each
word in the Quran. It is a very good effort.
Very cool. I wonder what their technology is? Do you suppose they do
all that stuff by hand?
FYI, I'm on the verge of posting a document with a number of ideas about
encoding Arabic that might be of interest to you. It includes a kind of
encoding/transliteration scheme that piggy-backs on 8859-1 (latin-1).
It also includes the capability of encoding roots. I plan to take a few
of the small Suras from your file, and (re-)encode them with my little
scheme so you and others can have a look and see if you think any of the
ideas are useful.
Sincerely,
gregg