
Re: Arabic Quran XML data



--- Mete Kural wrote:
> Hello Ossama,
> 
> Thank you for your response. When you said "separate
> the files into separate ayas", did you mean "separate
> suras"? That is already done. The XML file is
> separated into suras here:
>
> http://cvs.arabeyes.org/viewcvs/projects/quran/data/ar/text/
>
> So I'm assuming that the XML data in the above folder
> is the most recently updated data.

Yes, sorry. I meant to say suras, not ayas.

>I have two
> questions/suggestions about this data:
> 
> 1) Can we get rid of the <searchtext> element in the
> Quran XML data and instead use a smarter search
> algorithm that removes special characters and
> diacritics before searching the Quran text? I think
> that maintaining Arabic text data with the special
> characters and diacritics manually removed is more
> error-prone than using a smarter search algorithm.

I can't comment/decide on this. Some other more
knowledgeable person (other maintainer/contributor)
of the Qur'an project may know better.

> By the way, this algorithm doesn't really have to be
> that smart: it could simply remove the characters
> that should not be searched from both the Quran text
> and the search keyword, and then match them against
> each other. Of course, even smarter algorithms that
> provide grammatically context-aware searching would
> be nicer.

Really don't know! :|
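For illustration, here is a minimal Python sketch of the simple approach described above: strip the unsearchable characters from both the verse text and the keyword, then do a plain substring match. The set of stripped code points (Quranic honorific marks, harakat and tanween, superscript alef, tatweel, small Quranic annotation signs) is an assumption, not something taken from the Arabeyes XML data, and the names `strip_marks` and `search` are hypothetical.

```python
import re

# ASSUMPTION: the set of code points treated as "not searchable" is illustrative,
# not taken from the Arabeyes data:
#   U+0610-U+061A  Quranic honorific/annotation marks
#   U+064B-U+065F  harakat, tanween, shadda, sukun, etc.
#   U+0670         superscript (dagger) alef
#   U+0640         tatweel (kashida)
#   U+06D6-U+06ED  small Quranic annotation signs
_IGNORED = re.compile("[\u0610-\u061A\u064B-\u065F\u0670\u0640\u06D6-\u06ED]")


def strip_marks(text: str) -> str:
    """Remove diacritics and annotation marks, leaving only the base letters."""
    return _IGNORED.sub("", text)


def search(verses: list[str], keyword: str) -> list[int]:
    """Return the indices of verses whose stripped text contains the stripped keyword."""
    needle = strip_marks(keyword)
    if not needle:
        # A keyword made only of marks would match everything; treat it as no match.
        return []
    return [i for i, verse in enumerate(verses) if needle in strip_marks(verse)]


# Usage: a fully vowelled word is found with a bare keyword, and vice versa.
verses = ["\u0628\u0650\u0633\u0652\u0645\u0650", "\u0642\u064f\u0644\u0652"]
print(search(verses, "\u0628\u0633\u0645"))
```

Note that this only removes marks; it does not fold letter variants such as alef wasla (U+0671) into plain alef, so a keyword typed with a plain alef would still miss words written with alef wasla. That kind of letter normalization is where a "smarter" algorithm would come in.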

> 2) What are the copyright conditions on this Arabic
> Quran XML data? The reason I am asking is that I
> personally would like to contribute to the fixing of
> this Arabic Quran XML data,

Fixing? Like what? Mistakes, glyphs, or something else?

> and I also know a few
> other friends who would like to contribute as well,
> but we want to make sure that the copyright will not
> restrict anyone from downloading this data for free
> and making additional changes to it without
> restrictions, similar to a typical open-source license.

As far as I remember, various previous threads noted
that you will not be able to modify the text, and that
the content is meant to stay correct and solid.
As discussed before, the GPL wouldn't be suitable at
all, because it allows _anyone_ to make _any changes_,
which is very dangerous here.
All I can say, to the best of my knowledge, is that
we're contacting the QuranComplex (www.qurancomplex)
for support on this issue.

> Why is this
> important? Because not all Quran manuscripts are
> exactly the same and they have slight differences in
> the orthography (spelling) of certain words and the
> option should be given to whomever downloads these
> files to change the spelling of such words to
> whichever style they prefer and use it as such.

I don't think this is 100% correct. Having different
readings doesn't mean we also have different spellings.
Also, not 'whomever' and not even 'some' people can
change 'the spelling of such words to whichever style
they prefer', otherwise it wouldn't be the Qur'an!

This is just my own understanding, and I hope others will
add to it or correct any mistakes I may have made.

regards,
Ossama Mohammad Khayat
