Re: Volunteers for verifying the quran data
- To: General Arabization Discussion <general at arabeyes dot org>
- Subject: Re: Volunteers for verifying the quran data
- From: Meor Ridzuan Meor Yahaya <meor dot ridzuan at gmail dot com>
- Date: Mon, 11 Jul 2005 15:35:50 +0800
On 7/11/05, Gregg Reynolds <gar at arabink dot com> wrote:
> Meor Ridzuan Meor Yahaya wrote:
> >
> > Second, Gregg's concern on the encoding approach that I use. I agree,
> > you do have valid points, but probably you missed some of my earlier
> > mail mentioning the reason I chose that option. FYI, with current
> > font and data, I can create a web application that will be able to
> > play the recitation of each aya, and display the translation of the
> > aya. I think it is very useful state already. The real reason I
>
> I agree it is very useful, and thank you for doing so much hard work. I
> don't want you to think I'm complaining about it. I guess my main
> concern is twofold: unicode compatibility, and font-independence. That
> is the best way (IMO) to make your work useful for the largest number of
> people.
True, but in my opinion that is the best data I can provide that is
visually correct while sticking to Unicode as much as I can. Where I
think something is not covered by Unicode, I use my own scheme.
> >
> > 1. Fonts availability. As far as I know, there is not a single font
> ...etc...
>
> I agree, fonts are an issue. But here's a proposition for debate:
> fonts don't matter. What counts is the text encoding.
>
> Reasoning: new fonts will always come along. There will always be a
> "better" one, or at least one that is more to your taste (or mine, or
> whomsoever's). Given correct encoded text, you can use whatever font
> you like. But if your encoded text is designed for use with a specific
> font (i.e. for a private font mapping convention), then you won't be
> able to use new fonts with it.
>
> In other words, it's a Good Thing to divide work between font design and
> development, and text encoding verification.
Again, very true, but I don't see this happening. Actually, I have been
waiting for this time to come, and I am still waiting ...
>
> Proposition 2 (and this is the more interesting and philosophical one):
> the real "matn" of the text is the encoded text, not the (graphical or
> audio) rendition. Get the encoding right, and you have inscribed the
> Quran in cyberspace, independent of visual (or any other)
> representation. From that concrete representations (visual, aural,
> other?) can be generated. (I.e., there is the very interesting question
> of what the core of the Quran really "is" - ink on paper? sound waves?
> a sequence of abstract "letters"? etc.)
>
This is very true, and I agree with you. But on the other hand, how
long has the Unicode standard been published? How many fonts are
available out there that properly support the Unicode Arabic block?
That is exactly my point. I would not have become involved in creating
fonts if good fonts were available. Historically, I discovered the
Arabeyes Quran project quite a long time ago (probably more than 2
years ago, maybe 3). Since then, I was hoping the project would be
completed one day, complete with fonts available under the GPL. I did
not have much understanding of font creation, Unicode Arabic
implementation, etc. back then. So, late last year, I decided to
investigate further what the real issues preventing the project from
being completed are. After some time, I decided to create a new font
and the Unicode document to fill the gap. For me, fonts are the most
frustrating part. There are quite a number of Arabic fonts available,
but none is really Unicode compatible. This has forced me to do some
of the most difficult parts of font development. Even now I am at a
very early stage in learning the font hinting process. Thus, my font
will be free for non-commercial use, but not GPLed. The reason is as
follows: a font does not have source code. So, if I release the font
under the GPL, what will prevent, say, Microsoft from including my
font in their Windows platform? As far as I can tell, nothing. They
can modify it and still be compliant with the GPL. So, what I would
like to achieve is that more good Arabic fonts become available,
developed by many more parties.
> >
> > 2. The encoding approach. I choose it because that is the easiest way
> > so that I can implement the font which will work on most platform,
> ...(low small meem)...
> > hack to implement it in the font domain. Plus, it does not have any
> > other meaning or use in the quran.
>
> Huh? It looked to me like you had used <low small meem> to mean
> different things in the verse I mentioned. So I infer that you mean the
> *combination* of low small meem with another codepoint is unique. E.g.
> <dammatan><low small meem> and <kasratan><low small meem> are two
> different things. Have I understood your method correctly?
OK, I did not make myself clear on this, but I think I mentioned it
somewhere on one of the mailing lists. You probably got it, but let me
explain, just in case. Basically, there are 2 types of small meem used
in the document. The first is the correctly assigned code, which
should be displayed. An example is fatha/fathatan (I cannot remember
which one I use, but I'm sure the font can handle both correctly) +
small high meem, which essentially represents iqlab. The second is the
meem that should not be displayed, for example fatha/fathatan + small
low meem. As you can see, this combination should not exist in the
Quran. I use it to represent the sequential tanween (tanween other
than izhar). As explained earlier, this combination was chosen because
it makes the font development easy, nothing else. So, basically, the
hidden-meem combinations are: fathatan/dammatan + low meem, and
kasratan + high meem.
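
To make the pairing rule concrete, here is a minimal sketch in Python.
It is only an illustration, not the code used to build the file. It
assumes the marker meem immediately follows the tanween in the code
sequence, and the function names (classify_small_meems,
strip_hidden_meems) are made up for this example. The code points are
the standard Unicode ones.

```python
FATHATAN = "\u064B"         # ARABIC FATHATAN
DAMMATAN = "\u064C"         # ARABIC DAMMATAN
KASRATAN = "\u064D"         # ARABIC KASRATAN
SMALL_HIGH_MEEM = "\u06E2"  # ARABIC SMALL HIGH MEEM ISOLATED FORM
SMALL_LOW_MEEM = "\u06ED"   # ARABIC SMALL LOW MEEM

TANWEEN = {FATHATAN, DAMMATAN, KASRATAN}
MEEMS = {SMALL_HIGH_MEEM, SMALL_LOW_MEEM}

# Combinations used as a hidden marker for sequential tanween,
# following the summary: fathatan/dammatan + low meem, kasratan + high meem.
HIDDEN_PAIRS = {
    (FATHATAN, SMALL_LOW_MEEM),
    (DAMMATAN, SMALL_LOW_MEEM),
    (KASRATAN, SMALL_HIGH_MEEM),
}


def classify_small_meems(text):
    """Return (index, kind) for every small meem directly after a tanween.

    kind is "hidden" for a marker meem and "displayed" otherwise (iqlab).
    Assumption: the meem immediately follows the tanween mark.
    """
    results = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if cur in MEEMS and prev in TANWEEN:
            kind = "hidden" if (prev, cur) in HIDDEN_PAIRS else "displayed"
            results.append((i, kind))
    return results


def strip_hidden_meems(text):
    """Drop the marker meems, keeping the tanween and displayed meems."""
    hidden = {i for i, kind in classify_small_meems(text) if kind == "hidden"}
    return "".join(ch for i, ch in enumerate(text) if i not in hidden)
```

A tool that does not use the font could call strip_hidden_meems to get
rid of the markers, or use classify_small_meems to replace them with
whatever code is agreed on later.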
> >
> > Having said that, you suggest me to use personal code points and
>
> (FYI: PUA = Private Use Area; here "private use" simply means not
> standardized by Unicode; any community of users can agree that e.g.
> codepoint XYZ of the PUA means some specific thing.)
>
> > create a font that will display each character and mark individually.
> > That is fine, and not very hard to implement actually, but the
> > question is what is the benefit that I will get by doing this other
> > than the extra work?
>
> You will know that people are looking at (verifying) a direct
> representation of the underlying text. Sorry to be a nag, but I can't
> stress this enough: it is the underlying text data that counts, not the
> graphical rendition generated by your (or any other) font. In my
> opinion, at least. ;)
>
Again, true. But if the underlying data cannot be comprehended by
anyone, what good does it bring? In my opinion, it will be easiest and
most beneficial to have the data visually correct first, since this is
achievable, and to proofread that. Then, making this data technically
and linguistically correct is not very difficult at all. As mentioned,
I can change the map anytime...
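
As a rough illustration of what "changing the map" means, here is a
small Python sketch. The source characters and the mapping below are
hypothetical and are not the real conversion table; the point is only
that the output encoding is decided by a table, so regenerating the
file with different code points (for example a PUA code point) is a
matter of editing the table and running the conversion again.

```python
# Hypothetical map from source-encoding characters to output code points.
CHAR_MAP = {
    "b": "\u0628",  # ARABIC LETTER BEH
    "a": "\u064E",  # ARABIC FATHA
    "N": "\u064B",  # ARABIC FATHATAN
}


def convert(source_text, char_map):
    """Translate source_text using char_map.

    Characters that are not in the map are passed through unchanged.
    """
    table = str.maketrans(char_map)
    return source_text.translate(table)


# Same source, different target: send "N" to a Private Use Area code point.
PUA_MAP = {**CHAR_MAP, "N": "\uE000"}

unicode_text = convert("baN", CHAR_MAP)
pua_text = convert("baN", PUA_MAP)
```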
> > To let everyone know, my main source of the text
> > is NOT the xml file. The XML file is translated from other sources.
>
> Ok; but then why are we verifying the XML file and not the original
> source???
>
> > Thus, I can just change the map for each character, and get it
> > translated again in a few minutes. So, to implement it the way you
> > mention it, it probably won't take more than an hour to create the new
> > file, but again, what is the benefit? We can't render it properly.
> > Will we get more people involved in proofreading it?
>
> Not necessarily. I guess it's a question of quality assurance; I think
> you have a better likelihood of high quality proofreading with a
> proofreading font. Don't forget, proofreading is a profession. Simply
> reading a text is not proofreading. So I hope you are not too
> disappointed if you haven't received much response to your request for
> help. It's a big job, and it isn't easy to find people like you.
>
> It would be nice to find a bunch of volunteers on the web to do this
> job; but even if you did, how could you be sure of the quality of the
> proofreading? This is a sticky problem.
>
> I would suggest trying to find a Haafiz or somebody who knows the Quran
> well, and pay him to proofread. If necessary, pay him to learn how to
> use the computer first. Personally I would be willing to pony up a few
> hundred bucks. Maybe even more, if I am convinced the money is being
> put to good use. (I live and work in the US; purely because of the
> difference in local economies, amounts that are not especially large
> here are pretty substantial in lots of places, as we all know. Far
> better to directly support the work of somebody in the so-called "third
> world" than to donate to a big charity or go to lots of movies. ;)
>
I was thinking exactly the same thing, but I don't have enough
resources. If you are willing to spend some money on this, maybe we can
work something out. I'm planning to visit a hafiz school this weekend;
I'll ask them for their opinion if I get the opportunity.
> > I don't see that
> > will happen either, because the process will be more difficult. Will
> > you proofread it if that is the case? If so, let me know, I'll send
> > you the file personally so you can start.
>
> I can do some, but it's a pretty big book. ;) For now, don't send me
> the file; my time is pretty tight. Later I might be interested in
> working with pieces of it.
>
> >
> > FYI, now I'm working on merging my work with openburhan. Openburhan is
> > available on the net , basically will give you the root word of the
> > quran's word. It is a very good effort.
>
> Very cool. I wonder what their technology is? Do you suppose they do
> all that stuff by hand?
I downloaded a copy of their database, and I think they did it by
hand. The website (www.openburhan.com) runs on a MySQL back end, but I
managed to transfer it to an SQLite database without any problem at
all. After some investigation, the difference in word count between my
text and theirs is mainly due to the word ya (as in yaabanii isarael).
In my text, ya is not treated as a separate word, whereas in theirs it
is. I think the difficulty in treating ya as a word is that it is not
spelled out as yeh + alef everywhere in the Quran. In some places, the
alef is omitted (according to rasm Uthmani). So, I think it is better
not to treat it as a word. I'm not good at Arabic, so what do others
think about this?
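
To show why the spelling matters for splitting, here is a small Python
sketch. It is only an illustration under assumptions: the two sample
strings are built by hand from code points and are not taken from the
file, and it assumes the omitted alef is written with the superscript
alef mark. A naive test for "starts with yeh + alef" finds the
spelled-out form but misses the one where the alef is omitted.

```python
import unicodedata

YEH = "\u064A"               # ARABIC LETTER YEH
ALEF = "\u0627"              # ARABIC LETTER ALEF
BEH = "\u0628"               # ARABIC LETTER BEH
FATHA = "\u064E"             # ARABIC FATHA
SUPERSCRIPT_ALEF = "\u0670"  # ARABIC LETTER SUPERSCRIPT ALEF


def strip_marks(word):
    """Remove combining marks (harakat and similar), keep base letters."""
    return "".join(ch for ch in word if unicodedata.combining(ch) == 0)


def starts_with_spelled_ya(word):
    """Naive check: do the base letters begin with yeh + alef?"""
    return strip_marks(word).startswith(YEH + ALEF)


# Hand-built samples: ya written with a full alef, and with the alef
# omitted from the rasm and marked only by a superscript alef.
spelled_out = YEH + FATHA + ALEF + BEH
alef_omitted = YEH + FATHA + SUPERSCRIPT_ALEF + BEH

found_spelled = starts_with_spelled_ya(spelled_out)
found_omitted = starts_with_spelled_ya(alef_omitted)
```

With these samples found_spelled is True and found_omitted is False, so
a splitter based on this test would count the two spellings
differently.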
>
> FYI, I'm on the verge of posting a document with a number of ideas about
> encoding Arabic that might be of interest to you. It includes a kind of
> encoding/transliteration scheme that piggy-backs on 8859-1 (latin-1).
> It also includes the capability of encoding roots. I plan to take a few
> of the small Suras from your file, and (re-)encode them with my little
> scheme so you and others can have a look and see if you think any of the
> ideas are useful.
>
It will be very useful indeed, and I look forward to your samples.
Regards.
> Sincerely,
>
> gregg
>
>