
Technical Computing Dictionary



Dear all,

I have been playing around with the sources of the dictionary (thank 
you, Djihed Afifi, for granting me access), and I have also developed some 
software to maintain and reformat it. I have no write access to the 
Arabeyes servers (and do not want it), so a member of the project
will have to collect the material if there is interest. I have put the
material on my own server for free download, and I waive all ownership
rights to it.

At the URL <129.69.218.213/ifi/bs/research/arabdict/> you will find:

- a file 'techdict.pdf' containing a new, two-column version of the 
dictionary, based on the data of April 29, 2007.

- a file 'arabdict.pdf' based on the same data, but formatted right-
to-left and sorted according to the Arabic collating sequence, for 
Arabic/English lookup. Multiple Arabic translations for the same
English term produce multiple entries; terms without any translation
are omitted (what could we do with them otherwise?).

- a file 'makedict.tgz' containing the software to recreate the above
files whenever new input data become available. There is a Perl script
to perform and control the processing task, plus a rather large number
of utility files. 'pdflatex' and a reasonably current version of ArabTeX
are assumed to be installed. For more info see 'readme', 'manifest',
and the source files themselves; there are comments to explain what is
expected to happen, as the code is probably unreadable except for some
TeX experts :-)

Note: The input data are expected to be in UTF-8 encoding in the format
	English text====Arabic text
The English text must contain only ASCII characters (U+0000 - U+007F).
The Arabic text may contain only characters from the Arabic block 
(U+0600 - U+06FF), the Lam-Alif ligature glyphs (U+FEF5 - U+FEFC), and 
punctuation marks from the ASCII range (U+0000 - U+007F). Any other ASCII 
text in the Arabic field, and any symbols outside the indicated ranges, 
may produce garbage or might even blow up the processing. These 
restrictions arise because the Arabic field is parsed in order to 
compute a sorting key, and the parser expects Arabic only.

There are, however, two exceptions:

- In the Arabic field, references to other English words are possible
in the format 'see ASCII text' (without the quotes!).

- Within the Arabic field, English words may appear if and only if they
are enclosed in <angle brackets>.

(This info ought to reside in the 'readme' file, and will migrate there;
I leave it here to give readers a hint of what is happening below
the surface.)
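
For anyone who wants to check an input file before feeding it to the
processing script, the rules above can be sketched roughly as follows.
This is not part of 'makedict.tgz'; it is a minimal Python illustration,
and the names check_entry, is_arabic_char, BRACKETED and ASCII_ALLOWED
are made up for this sketch. It assumes that "ASCII punctuation" means
the characters in Python's string.punctuation plus the space character
(so ASCII digits and letters are reported), and that a 'see' reference
occupies the whole Arabic field.

```python
import re
import string

SEPARATOR = "===="
# English words inside <angle brackets> are allowed in the Arabic field.
BRACKETED = re.compile(r"<[\x00-\x7F]*?>")
# Assumption: "ASCII punctuation" = string.punctuation plus the space.
ASCII_ALLOWED = set(string.punctuation) | {" "}


def is_arabic_char(ch):
    """True for the Arabic block and the Lam-Alif ligature glyphs."""
    cp = ord(ch)
    return 0x0600 <= cp <= 0x06FF or 0xFEF5 <= cp <= 0xFEFC


def check_entry(line):
    """Return a list of problems found in one input line (empty if none)."""
    english, sep, arabic = line.rstrip("\n").partition(SEPARATOR)
    if not sep:
        return ["missing '====' separator"]
    problems = []
    if not english.isascii():
        problems.append("non-ASCII character in English field")
    # Exception 1: a cross-reference of the form 'see ASCII text'.
    if arabic.startswith("see ") and arabic.isascii():
        return problems
    # Exception 2: drop bracketed English words before checking the rest.
    remainder = BRACKETED.sub("", arabic)
    for ch in remainder:
        if not (is_arabic_char(ch) or ch in ASCII_ALLOWED):
            problems.append(
                f"unexpected character U+{ord(ch):04X} in Arabic field"
            )
    return problems
```

Calling check_entry on each line of the input and printing the non-empty
results should point out most entries that would confuse the parser,
e.g. check_entry("kernel====\u0646\u0648\u0627\u0629 kernel") reports
the six unbracketed ASCII letters.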

In case of problems, bugs, or just strange effects please complain by
private E-Mail.

Peace

Klaus
-- 
Prof. Dr. Klaus Lagally  | mailto:lagally at informatik dot uni-stuttgart dot de
Institut fuer Formale    | http://www.informatik.uni-stuttgart.de/ ...
  Methoden der Informatik|    ... fmi/bs/people/lagally.htm
Abteilung Betriebsoftware| Tel.  +49-711-7816392 |Zeige mir deine Uhr,
Universitaetsstrasse 38  | FAX   +49-711-7816370 |  und ich sage dir,
70569 Stuttgart, GERMANY |                       |  wie spaet es ist.