AASKWIX: A Statistical Koran Word Index



Asalamu alaikum

On Friday 30 December 2005 10:50, Meor Ridzuan Meor Yahaya wrote:
> 
> Indexing the Quran is exactly what I wanted to do. However, I cannot
> do this because my Arabic is very weak; I'm not a speaker!
> Anyway, I came across a site somewhere mentioning that the Quran
> contains only about 2000 distinct words. So I think it would not be
> that bad to index 2000 words. In fact, www.openburhan.com has gone
> even further than that; it even has an index based on root words!

The total number of words (not just distinct ones) in the Quran is 77,801. :)

I don't know if it will be helpful for you, but I made a C++ class
that generates a statistical word index for research purposes. The
class is part of a larger project that anyone is free to test. However,
there is an intellectual property claim pending on the index structure
(the idea only), which I called AASKWIX.

It's quite large, so I'll just point you to two samples:

http://www.pheye.net/abdalla/examples/kindex/koranhtmlindex-11.html
http://www.pheye.net/abdalla/examples/kindex/koranhtmlindex-32.html

(best viewed with Konqueror)

The samples above are generated by the application for on-screen viewing
and publication. The application also generates the same data as SQL for
relational databases and as XML for native XML databases (NXDs).

* The number in each file name identifies the letter that file covers.

* Every word (the word itself, not its root) is listed with its frequency
(for example, 'Eesa and WA'Eesa have different frequencies and separate
entries).

* SN - the sura number of the word (nothing new). Followed by...

* AN - the aaya number (nothing new). Followed by...

* WA - the word's position (order number) in the aaya. Followed by...

* WS - the word's position in the sura. Followed by...

* WK - the word's position in the entire Koran (Quran).
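
To make the relationship between these columns concrete, here is a minimal C++ sketch of how such positions could be computed. It is not the AASKWIX class itself, only an illustration: the names Occurrence, Aaya, WordIndex, buildIndex and frequency are hypothetical, and it assumes the text arrives as aayas in reading order with words separated by whitespace.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One occurrence of a word, using the column names from the list above.
struct Occurrence {
    int sn;  // sura number
    int an;  // aaya number
    int wa;  // word's position within its aaya (1-based)
    int ws;  // word's position within its sura (1-based)
    int wk;  // word's position within the whole text (1-based)
};

// Hypothetical input record: one aaya with its sura and aaya numbers.
struct Aaya {
    int sn;
    int an;
    std::string text;
};

// Each distinct word form maps to all of its occurrences, in reading order.
using WordIndex = std::map<std::string, std::vector<Occurrence>>;

// Assumption: aayas are given in reading order, grouped by sura.
WordIndex buildIndex(const std::vector<Aaya>& aayas) {
    WordIndex index;
    int wk = 0;            // running counter over the whole text
    int ws = 0;            // running counter, reset at each new sura
    int currentSura = -1;  // sentinel: no sura seen yet

    for (const Aaya& aaya : aayas) {
        if (aaya.sn != currentSura) {
            currentSura = aaya.sn;
            ws = 0;
        }
        std::istringstream words(aaya.text);
        std::string word;
        int wa = 0;        // running counter, reset at each aaya
        while (words >> word) {
            ++wa;
            ++ws;
            ++wk;
            index[word].push_back({aaya.sn, aaya.an, wa, ws, wk});
        }
    }
    return index;
}

// A word's frequency is the number of occurrences recorded for that exact form.
std::size_t frequency(const WordIndex& index, const std::string& word) {
    auto it = index.find(word);
    return it == index.end() ? 0 : it->second.size();
}
```

Because the map is keyed on the exact word form, a word and its prefixed variant (as with 'Eesa and WA'Eesa) end up as separate entries with separate frequencies, and index.size() gives the number of distinct word forms.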

If you want it, I can give it to you in parts.

Once you are done viewing the samples, please let me know (they must be
deleted because I don't have much space).

Wishing you and your family peace and good health.


Salam,
Abdalla Alothman