Intersection/common_words script
- To: doc at arabeyes dot org
- Subject: Intersection/common_words script
- From: Nadim Shaikli <shaikli at yahoo dot com>
- Date: Tue, 23 Dec 2003 11:20:24 -0800 (PST)
Salam all - I think I have finished the first cut of the
intersection script previously called 'tech_wordlist'. I am now
calling the script simply 'common words generator' or
'intersection file script' (i.e. removing the "Tech" aspect
from all of this).
http://cvs.arabeyes.org/viewcvs/scripts/translate/term_intersect.pl
I've tested the script a bit and it seems to do all that
I wanted (so far). I ran it on our current KDE and Gnome
files and it produced the following results,
< + > Total num of matched terms = 3893
< + > Total num of skipped terms = 27709
<< * >> Run-time = 4.27 minutes
That run covered all terms that don't exceed 3 words. These
are entries that have the following format,
msgid "string string string"
i.e. if the msgid string is on the subsequent line, I skip it. In
other words,
msgid ""
"string string string"
is skipped.
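
To make the rule above concrete, here is a minimal sketch in Python. It is
not the code in term_intersect.pl (which is Perl and is not shown in this
message); the function name short_msgid and the max_words parameter are
hypothetical, and the regular expression is only one plausible reading of
"msgid on a single line, no more than 3 words".

```python
import re

# Matches a msgid whose whole string sits on the same line as the keyword.
MSGID_ONE_LINE = re.compile(r'^msgid\s+"(.*)"\s*$')


def short_msgid(line, max_words=3):
    """Return the msgid text if it is on one line and has 1..max_words
    words; return None for everything else (hypothetical helper)."""
    match = MSGID_ONE_LINE.match(line)
    if match is None:
        return None  # not a msgid line (e.g. a continuation or msgstr line)
    text = match.group(1)
    words = text.split()
    if not words:
        return None  # msgid "" : the string continues on the next line
    if len(words) > max_words:
        return None  # longer than the word limit
    return text
```

With this reading, 'msgid "string string string"' is kept, while both
'msgid ""' and the continuation line '"string string string"' are dropped.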
I ran the script as such (assuming you have KDE and Gnome locally),
$ ./term_intersect.pl -dir1 ./gnome -dir2 ./kde
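
For readers who want a feel for what such a run might compute, the following
Python sketch intersects the short single-line msgid strings found under two
directory trees. It is an illustration under assumptions, not the script's
actual logic: the names collect_terms and common_terms are hypothetical, the
files are assumed to end in ".po", and matching is assumed to be an exact
string comparison of msgid text.

```python
import os
import re

# A msgid with a non-empty string on the same line as the keyword.
MSGID_ONE_LINE = re.compile(r'^msgid\s+"(.+)"\s*$')


def collect_terms(directory, max_words=3):
    """Gather single-line msgid strings of 1..max_words words from every
    .po file under directory (assumed file extension)."""
    terms = set()
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if not name.endswith(".po"):
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for line in handle:
                    match = MSGID_ONE_LINE.match(line)
                    if match is None:
                        continue  # multi-line msgid or unrelated line
                    text = match.group(1)
                    if 1 <= len(text.split()) <= max_words:
                        terms.add(text)
    return terms


def common_terms(dir1, dir2, max_words=3):
    """Return the terms present in both trees (set intersection)."""
    return collect_terms(dir1, max_words) & collect_terms(dir2, max_words)
```

A call such as common_terms("./gnome", "./kde") would then return the set of
terms shared by the two trees, which could be sorted and written to a file.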
You can always do the following for help (as with most scripts),
$ term_intersect.pl -help
Please do look into the generated file (and/or run your own test cases)
and let me know if there are issues. The results are very much
eye-opening (and yes, there is some noise at the beginning from numbers,
punctuation terms, etc., which I'll try to clean up later).
Hope this helps - let's get consistency going !!
Salam.
- Nadim