
Re: Volunteers for verifying the quran data



Several more things are worth mentioning. First, I just discovered that
the 2 font from SIL does have some minor problems displaying some of
the characters properly. The characters are small waw (U+06E5) and small
yeh (U+06E6). Both should be spacing characters, but they are not spacing
in the font. Also, the sukun variants, that is U+06DF and U+06E0, are
displayed very similarly to the Unicode document, which someone (I think
Thomas) already noted is not really correct.
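
For anyone who wants to check what these four code points are supposed to
be, independently of any font, here is a minimal Python sketch. It only
reads the Unicode Character Database that ships with Python; the names
`CODE_POINTS` and `describe` are made up for this example, and it says
nothing about how a given font actually draws the glyphs.

```python
import unicodedata

# The four code points discussed above. Their properties are looked up in
# the Unicode Character Database bundled with Python, not in any font.
CODE_POINTS = [0x06E5, 0x06E6, 0x06DF, 0x06E0]


def describe(cp):
    """Return the Unicode name and basic properties of one code point."""
    ch = chr(cp)
    category = unicodedata.category(ch)
    return {
        "code": f"U+{cp:04X}",
        "name": unicodedata.name(ch),
        "category": category,
        "combining_class": unicodedata.combining(ch),
        # Mn (nonspacing mark) and Me (enclosing mark) take no advance
        # width of their own; every other category is treated as spacing.
        "nonspacing": category in ("Mn", "Me"),
    }


if __name__ == "__main__":
    for cp in CODE_POINTS:
        info = describe(cp)
        kind = "nonspacing mark" if info["nonspacing"] else "spacing"
        print(f'{info["code"]}  {info["name"]}  '
              f'[{info["category"]}, ccc={info["combining_class"]}]  {kind}')
```

If the database reports a character as spacing but the font draws it as a
combining mark, that points to the font rather than the data.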

Second, as far as I know, there are several bodies that do some
verification process (there is a department in one of the ministries in
Malaysia from which a publisher of the Quran needs to get approval).
However, I don't think they will certify electronic data, like the one
mentioned by Gregg. They will probably certify the visual
representation of it. So, that is why I was focusing on getting the
visual aspect correct, without concerning myself too much with the
consistency of the actual data. I hope that someone will take this up
and get it certified somehow. If we were to certify the electronic data,
most likely we would have to do it ourselves, but the drawback is that we
would not get certification from anyone. I think it is very important
to get the Quran standardized in electronic form, since there are
several versions of the electronic Quran around the net, but none really
comply with Rasm Uthmani. I hope this will be part of the solution.
To get the electronic data certified, we have a long way to go.
First, a standard needs to be developed, which is far from complete. Gregg
just posted a mail on this, so maybe this will get things going in the
right direction. Then maybe we can get the Quran encoded according to
the standard. I think we can probably get it done in 3-5 years' time.

Regards.

On 7/11/05, Gregg Reynolds <gar at arabink dot com> wrote:
> Meor Ridzuan Meor Yahaya wrote:
> >
> > Second, Gregg's concern about the encoding approach that I use. I agree,
> > you do have valid points, but you probably missed some of my earlier
> > mails mentioning the reason I chose that option. FYI, with the current
> > font and data, I can create a web application that will be able to
> > play the recitation of each aya, and display the translation of the
> > aya. I think it is in a very useful state already. The real reason I
> 
> I agree it is very useful, and thank you for doing so much hard work.  I
> don't want you to think I'm complaining about it.  I guess my main
> concern is twofold:  unicode compatibility, and font-independence.  That
> is the best way (IMO) to make your work useful for the largest number of
> people.
> >
> > 1. Font availability. As far as I know, there is not a single font
> ...etc...
> 
> I agree, fonts are an issue.  But here's a proposition for debate:
> fonts don't matter.  What counts is the text encoding.
> 
> Reasoning:  new fonts will always come along.  There will always be a
> "better" one, or at least one that is more to your taste (or mine, or
> whomsoever's).  Given correctly encoded text, you can use whatever font
> you like.  But if your encoded text is designed for use with a specific
> font (i.e. for a private font mapping convention), then you won't be
> able to use new fonts with it.
> 
> In other words, it's a Good Thing to divide work between font design and
> development, and text encoding verification.
> 
> Proposition 2  (and this is the more interesting and philosophical one):
> the real "matn" of the text is the encoded text, not the (graphical or
> audio) rendition.  Get the encoding right, and you have inscribed the
> Quran in cyberspace, independent of visual (or any other)
> representation.  From that, concrete representations (visual, aural,
> other?) can be generated.  (I.e., there is the very interesting question
> of what the core of the Quran really "is" - ink on paper?  sound waves?
>   a sequence of abstract "letters"? etc.)
> 
> >
> > 2. The encoding approach. I chose it because that is the easiest way
> > to implement a font which will work on most platforms,
> ...(low small meem)...
> > hack to implement it in the font domain. Plus, it does not have any
> > other meaning or use in the quran.
> 
> Huh?  It looked to me like you had used <low small meem> to mean
> different things in the verse I mentioned.  So I infer that you mean the
> *combination* of low small meem with another codepoint is unique.  E.g.
> <dammatan><low small meem> and <kasratan><low small meem> are two
> different things.  Have I understood your method correctly?
> >
> > Having said that, you suggest me to use personal code points and
> 
> (FYI:  PUA = Private Use Area; here "private use" simply means not
> standardized by Unicode; any community of users can agree that e.g.
> codepoint XYZ of the PUA means some specific thing.)
> 
> > create a font that will display each character and mark individually.
> > That is fine, and not very hard to implement actually, but the
> > question is what is the benefit that I will get by doing this other
> > than the extra work?
> 
> You will know that people are looking at (verifying) a direct
> representation of the underlying text.  Sorry to be a nag, but I can't
> stress this enough:  it is the underlying text data that counts, not the
> graphical rendition generated by your (or any other) font.  In my
> opinion, at least.  ;)
> 
> > To let everyone know, my main source of the text
> > is NOT the xml file.  The XML file is translated from other sources.
> 
> Ok; but then why are we verifying the XML file and not the original
> source???
> 
> > Thus, I can just change the map for each character, and get it
> > translated again in a few minutes. So, to implement it the way you
> > mention, it probably won't take more than an hour to create the new
> > file, but again, what is the benefit? We can't render it properly.
> > Will we get more people involved in proofreading it?
> 
> Not necessarily.  I guess it's a question of quality assurance; I think
> you have a better likelihood of high quality proofreading with a
> proofreading font.  Don't forget, proofreading is a profession.  Simply
> reading a text is not proofreading.  So I hope you are not too
> disappointed if you haven't received much response to your request for
> help.  It's a big job, and it isn't easy to find people like you.
> 
> It would be nice to find a bunch of volunteers on the web to do this
> job; but even if you did, how could you be sure of the quality of the
> proofreading?  This is a sticky problem.
> 
> I would suggest trying to find a Haafiz or somebody who knows the Quran
> well, and pay him to proofread.  If necessary, pay him to learn how to
> use the computer first.  Personally I would be willing to pony up a few
> hundred bucks.  Maybe even more, if I am convinced the money is being
> put to good use.  (I live and work in the US; purely because of the
> difference in local economies, amounts that are not especially large
> here are pretty substantial in lots of places, as we all know.  Far
> better to directly support the work of somebody in the so-called "third
> world" than to donate to a big charity or go to lots of movies. ;)
> 
> > I don't see that
> > will happen either, because the process will be more difficult. Will
> > you proofread it if that is the case? If so, let me know, and I'll send
> > you the file personally so you can start.
> 
> I can do some, but it's a pretty big book. ;)  For now, don't send me
> the file; my time is pretty tight.  Later I might be interested in
> working with pieces of it.
> 
> >
> > FYI, I'm now working on merging my work with openburhan. Openburhan is
> > available on the net, and basically gives you the root word of each
> > word in the Quran. It is a very good effort.
> 
> Very cool.  I wonder what their technology is?  Do you suppose they do
> all that stuff by hand?
> 
> FYI, I'm on the verge of posting a document with a number of ideas about
> encoding Arabic that might be of interest to you.  It includes a kind of
> encoding/transliteration scheme that piggy-backs on 8859-1 (latin-1).
> It also includes the capability of encoding roots.  I plan to take a few
> of the small Suras from your file, and (re-)encode them with my little
> scheme so you and others can have a look and see if you think any of the
> ideas are useful.
> 
> Sincerely,
> 
> gregg