
Re: License Islamic Holy material




As someone who has the complete text of the Qur'an in rasm imlaa'i on his web site, I'd certainly welcome a program that could check the accuracy of that text. It originally came from the Islamic Computing Center in London in ISO-8859-6 encoding. I recently converted it to UTF-8 and it is now very popular with people who are searching for Qur'anic verses on Google. In fact it gets more hits than any other text on my web site. I would hate to think that the text has any mistakes in it.


Nicholas Heer


On Thu, 12 Oct 2006, Abdalla Alothman wrote:

Salam,

Problem:

You have the text of the Quran and you wish to protect it from
malicious people who insert text into it, delete text from it, or
substitute parts of its contents.

Solution:

1. You create one central facility, or several, whose purpose is to
validate the text of the Quran.

2. The facility's service is very simple. A user submits the text of
the Quran, and the facility informs the user whether the text is
valid or invalid.

3. The facility can be a piece of software (although "software" is too
big a word for what we need to accomplish) that the user downloads,
or a CGI application on a web server.
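At its core, the facility's service is a single comparison between a stored digest and a freshly computed one. A minimal sketch, assuming the master hash values are Base64 SHA-1 digests of the exact UTF-8 bytes (as in the scripts below) and using Perl's core Digest::SHA module; the `validate` sub and the `example` entry are hypothetical placeholders, not real sura data:

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_base64);

# Hypothetical master table: sura name => Base64 SHA-1 of the exact bytes.
# The single entry is computed from a placeholder string, not a real sura.
my %master = (example => sha1_base64("placeholder sura text\n"));

# The facility's whole service: given a sura name and the submitted
# bytes, answer "valid" or "invalid".
sub validate {
    my ($name, $bytes) = @_;
    return 'invalid' unless exists $master{$name};    # unknown sura name
    return sha1_base64($bytes) eq $master{$name} ? 'valid' : 'invalid';
}

print validate('example', "placeholder sura text\n"), "\n";    # valid
print validate('example', "placeholder sura text \n"), "\n";   # invalid (extra space)
```

Whether this runs as a downloaded script or behind a CGI front end, only the master table has to be trusted; the comparison itself is trivial.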

First Steps:

The first steps consist of the following:

1. You need a hash value for every sura, IF you want the user to verify the
contents of the sura.

2. You need a hash value for every aaya in the sura, IF you want the user
to verify the validity of a particular aaya in a particular sura. (We will
not implement that for now, although it is not difficult.)

3. The hash values should be considered "master copies" and must be
secured. These are the values used to validate any given portion of the
contents of the Quran.
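Step 2 (a hash value per aaya) is left unimplemented above. One possible sketch, assuming a sura file stores exactly one aaya per line in UTF-8; that layout and the sub name `aaya_digests` are assumptions, not part of the original design:

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_base64);

# Hypothetical helper. Assumes one aaya per line; returns
# aaya number => Base64 SHA-1 digest of that aaya's UTF-8 bytes.
# Line terminators are not hashed, so LF and CRLF files give the same values.
sub aaya_digests {
    my ($sura_bytes) = @_;
    my %digests;
    my $number = 0;
    for my $line (split /\n/, $sura_bytes) {
        $line =~ s/\r\z//;            # drop the CR of a CRLF line ending
        next if $line eq '';          # ignore blank lines
        $digests{++$number} = sha1_base64($line);
    }
    return %digests;
}

# Usage: print the per-aaya values in aaya order.
my %digests = aaya_digests("first aaya\nsecond aaya\n");
print "$_ => $digests{$_}\n" for sort { $a <=> $b } keys %digests;
```

The master copy would then store these per-aaya values alongside the per-sura ones, and a check would compare one aaya at a time.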

Implementation:

We will provide 4 sample suras (files) from the Quran:

* 054-alqamar-utf.txt
* 105-alfeel-utf.txt
* 110-alnasr-utf.txt
* 112-alikhlas-utf.txt

(Files included in attachment).

These files are UTF-8 encoded Arabic text, typed according to the
imlaa-ee rasm of the Quran with diacritical marks.

* From those files, we will generate a Perl data structure. A Perl hash
(an associative array) is currently the best choice for us: it maps each
sura name to the hash value (SHA-1 digest) of the sura's contents. The
data structure will be saved in a file called "hashvalues.txt".

NOTE: The hash values you get may differ from the ones shown here (for
example, if your files were saved with different line endings), so do not
copy and run the scripts blindly; generate the values from your own files.
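Line endings are a common source of such differences: the same sura saved with Windows (CRLF) and Unix (LF) line endings has different bytes, and therefore a different digest. If that should not count as a change, one option (a variation, not what the scripts below do) is to normalize line endings before hashing. A sketch with the core Digest::SHA module; `normalized_digest` is a hypothetical name:

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_base64);

# Hypothetical variation, not what generatehash.pl does: convert CRLF
# and lone CR line endings to LF before hashing.
sub normalized_digest {
    my ($bytes) = @_;
    (my $copy = $bytes) =~ s/\r\n?/\n/g;
    return sha1_base64($copy);
}

# The raw digests differ; the normalized ones agree.
print sha1_base64("aaya\r\n") eq sha1_base64("aaya\n") ? "same\n" : "different\n";          # different
print normalized_digest("aaya\r\n") eq normalized_digest("aaya\n") ? "same\n" : "different\n";  # same
```

If you adopt this, both the master values and every later check must use the same normalization, otherwise nothing will match.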

# =========================
# =BEGIN SCRIPT: generatehash.pl=
# =========================
#!/usr/bin/perl
use strict;
use warnings;
# The Digest::SHA1 Perl module comes with many
# distributions, or it can be downloaded from CPAN.
use Digest::SHA1;

# counter to determine when to add "," and "\n"
my $cnt = 0;

# Open the output file, hashvalues.txt. ">" overwrites it, so running the
# script a second time does not append a second, broken copy of the map.
open(my $output, ">", "hashvalues.txt")
    or die "Can't open hashvalues.txt: $!\n";

# Print the first line in the output file.
print $output "my %suras = (";

# Iterate over every sura file in the current directory (glob sorts by name).
foreach my $file (<*-utf.txt>)
{
    # Open the current sura file in binary mode, so that the bytes hashed
    # are exactly the bytes on disk.
    open(my $input, "<:raw", $file) or die "Can't open $file: $!\n";
    print $output ",\n" if $cnt > 0;

    # Create a new SHA-1 object.
    my $sha = Digest::SHA1->new;

    # Add the file's contents.
    $sha->addfile($input);

    # I want Base64 output.
    my $b64out = $sha->b64digest();
    close $input;

    # It helps to see what's going on. The next line prints the generated
    # checksum to STDOUT.
    print "$b64out\n";

    # Now create the key of the hash data structure. The key is the name
    # of the sura, stripped from the name of the file, e.g.
    # "110-alnasr-utf.txt" becomes "alnasr".
    (my $name = $file) =~ s/^[0-9]{1,3}-(.*)-utf\.txt$/$1/;

    # Output the name of the sura, which is the key, and map it to the hash value.
    print $output "$name => \"$b64out\"";
    $cnt++;
}
print $output ");\n";
close $output or die "Can't close hashvalues.txt: $!\n";

# =========================
# =END SCRIPT: generatehash.pl=
# =========================

If you run that script you will end up with something like:

Prompt #-> perl generatehash.pl
PgGcAw5JjCRNKiegEpcBSrtLN08
5UcP2xp9LTVMI4hfJ3tkU87/aDw
iNDko0O9xs1oMUZ+RZPV7fDq8d8
40Y84zKe8/pmBilii+QML52430s
Prompt #-> cat hashvalues.txt
my %suras = (alqamar => "PgGcAw5JjCRNKiegEpcBSrtLN08",
alfeel => "5UcP2xp9LTVMI4hfJ3tkU87/aDw",
alnasr => "iNDko0O9xs1oMUZ+RZPV7fDq8d8",
alikhlas => "40Y84zKe8/pmBilii+QML52430s");
Prompt #->

Now you have a small map with 4 suras (with all the files you will end up
with 114 suras). Take that map and use it in the second script, where the
real work is done.
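Instead of copying and pasting the map by hand, the second script could read hashvalues.txt directly. A minimal sketch, assuming the file has exactly the format generatehash.pl writes (bare sura names, double-quoted Base64 values); `parse_hashvalues` and `load_hashvalues` are hypothetical helpers:

```perl
use strict;
use warnings;

# Hypothetical helper: extract name => digest pairs from text such as
#   my %suras = (alqamar => "PgGcAw5JjCRNKiegEpcBSrtLN08",
#   alfeel => "5UcP2xp9LTVMI4hfJ3tkU87/aDw");
# Sura names are assumed to be word characters; SHA-1 Base64 digests
# use A-Z, a-z, 0-9, "+" and "/".
sub parse_hashvalues {
    my ($text) = @_;
    my %suras;
    while ($text =~ /(\w+)\s*=>\s*"([A-Za-z0-9+\/]+)"/g) {
        $suras{$1} = $2;
    }
    return %suras;
}

# Hypothetical helper: read the whole file and parse it.
sub load_hashvalues {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    local $/;                     # slurp mode: read the whole file at once
    my $text = <$fh>;
    close $fh;
    return parse_hashvalues($text);
}
```

In checksura.pl the hand-copied map would then become `my %suras = load_hashvalues('hashvalues.txt');`. The trade-off is that hashvalues.txt itself must now be kept as secure as the script.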

The other script will check if a certain sura file is authentic or not.

# =========================
# =BEGIN SCRIPT: checksura.pl=
# =========================
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA1;
use File::Basename qw(basename);

# We copy and paste the map from hashvalues.txt, the file generated by
# the previous script. Four suras are easy to handle by hand, but it gets
# tedious with 114 sura files, which is why the first script exists.
# With all Quran files present, the first script generates a map with
# 114 entries, not just four.
#
# NOTE: Values should be copied from "hashvalues.txt", which
# was generated by the previous script.
my %suras = (alqamar  => "PgGcAw5JjCRNKiegEpcBSrtLN08",
             alfeel   => "5UcP2xp9LTVMI4hfJ3tkU87/aDw",
             alnasr   => "iNDko0O9xs1oMUZ+RZPV7fDq8d8",
             alikhlas => "40Y84zKe8/pmBilii+QML52430s");

my $surafile = shift or die "Usage: $0 NNN-name-utf.txt\n";
open(my $fh, "<:raw", $surafile) or die "Can't open $surafile: $!\n";

my $sha = Digest::SHA1->new;
$sha->addfile($fh);
my $b64out = $sha->b64digest();
close $fh;

# Derive the sura name from the file name, e.g. "110-alnasr-utf.txt" -> "alnasr".
(my $currentSura = basename($surafile)) =~ s/^[0-9]{1,3}-(.*)-utf\.txt$/$1/;
print "Checking surat $currentSura\n";

my $suraHash = $suras{$currentSura};
die "No master hash value for sura '$currentSura'.\n" unless defined $suraHash;
print "$suraHash\n";

# Compare the stored (master) hash value with the one computed from the file.
if ($suraHash eq $b64out) {
    print "OK\n";
} else {
    die "Sura is invalid.\n";
}

# =========================
# =END SCRIPT: checksura.pl=
# =========================
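To check every sura file in one run rather than one file at a time, the same comparison can be wrapped in a loop. A sketch, assuming the `%suras` map and the file-naming scheme above; it uses the core Digest::SHA module, whose SHA-1 `b64digest` yields the same unpadded Base64 strings as Digest::SHA1, and `check_files` is a hypothetical name:

```perl
use strict;
use warnings;
use Digest::SHA;
use File::Basename qw(basename);

# Hypothetical helper: check many sura files against a map of
# sura name => Base64 SHA-1 digest. Returns file => "OK", "INVALID",
# or "UNKNOWN" (name not in NNN-name-utf.txt form, or not in the map).
sub check_files {
    my ($suras, @files) = @_;
    my %result;
    for my $file (@files) {
        my ($name) = basename($file) =~ /^\d{1,3}-(.+)-utf\.txt\z/;
        if (!defined $name || !exists $suras->{$name}) {
            $result{$file} = 'UNKNOWN';
            next;
        }
        open(my $fh, '<:raw', $file) or die "Can't open $file: $!\n";
        my $sha = Digest::SHA->new(1);      # 1 selects SHA-1
        $sha->addfile($fh);
        close $fh;
        $result{$file} = $sha->b64digest eq $suras->{$name} ? 'OK' : 'INVALID';
    }
    return %result;
}

# Example use from the directory holding the sura files:
#   my %result = check_files(\%suras, glob('*-utf.txt'));
#   print "$_: $result{$_}\n" for sort keys %result;
```

Reporting "UNKNOWN" separately keeps a missing map entry from being mistaken for a tampered file.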

Now...

Run checksura.pl on any of the attached Quran files:

Prompt #-> perl checksura.pl 110-alnasr-utf.txt
Checking surat alnasr
iNDko0O9xs1oMUZ+RZPV7fDq8d8
OK
Prompt #->

And now...

* Open an editor and edit 110-alnasr-utf.txt.
* Add a space somewhere, delete a space or a diacritical mark, or replace
one mark with another, and run the script on that file again...

Prompt #-> perl checksura.pl 110-alnasr-utf.txt
Checking surat alnasr
iNDko0O9xs1oMUZ+RZPV7fDq8d8
Sura is invalid.
Prompt #->
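A single added space or deleted diacritical mark is enough because the digest of the modified bytes is, for all practical purposes, unrelated to the original one. A small demonstration with the core Digest::SHA module, using an alef with and without a fatha (U+064E) as stand-in text rather than an actual aaya:

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_base64);

# UTF-8 bytes: ARABIC LETTER ALEF (D8 A7) followed by ARABIC FATHA (D9 8E).
my $with_fatha    = "\xD8\xA7\xD9\x8E";
# The same text with the diacritical mark deleted.
my $without_fatha = "\xD8\xA7";

my $d1 = sha1_base64($with_fatha);
my $d2 = sha1_base64($without_fatha);
print "$d1\n$d2\n";
print $d1 eq $d2 ? "same\n" : "different\n";    # prints "different"
```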

What we have provided is a rough prototype (I don't have time to revise it,
so my sincere apologies for any errors). Don't you think it works as needed,
and is much better than a license with all its hassles: courts, tracking
the bad people down, and so on?

Salam,
Abdalla Alothman