Re: Preview of QAC's Technical Dictionary Plan

On Tue, 8 Feb 2005 00:37:18 -0800 (PST), Nadim Shaikli
<shaikli at yahoo dot com> wrote:
> --- Abdulaziz Al-Arfaj <aalarfaj at gmail dot com> wrote:
> > As you guys know QAC has on several occasions tried to create a
> > dictionary of technical terms, or a common terms list, or common terms
> > PO or call it what you like, for use by translators. We didn't get
> > anywhere as of yet.
> Stay clear from calling them what they are not - stick to 'technical
> terms' or 'technical dictionary' :-)

You're right. And for one thing, we should steer clear of putting
"common terms" in there. The uncommon terms are just as important.

> I commend you on your intensity and drive - I can't wait to see this
> really happen and take off and I really hope all those hot-n-heavy into
> translation get a chance to read and comment on this ASAP (you should
> consider posting all of this to the 'doc' list -- I was surprised to
> see it on 'core' instead of 'doc' -- sooner rather than later).

This was just a preview. I need to polish that paper first and add a
few things, to make sure it stands the test of the good ole' Doc list,
or as Arafat aptly called it on IRC, "the idea killing machine".

> Out of curiosity, why isn't a collected "automatic" list of words from
> previous files so frowned upon (search for 'DITCHES' in your plan).  It
> seems to me like such a list would be a good starting point to creating
> a "sanitized" list to proceed from, no ?

One would think, but that is not the case. Here is the most important
thing I learned since I started translating for Arabeyes: The hardest
terms to translate are not the ones that occur frequently, but the
obscure ones. I want to put every obscure term imaginable into this
new list. I don't want it to be just for translators, but for every
confused speaker of Arabic on the web...

As it stands, the list we generated automatically contained hundreds
of terms for geographical locations. While these are important, they
belong on Wordlist, not here. There are also many many meaningless
terms like "1st" through "30th", redundant terms such as "file",
"file:", and "files", unnecessary combinations such as "resize window"
as well as lots of plain numbers and punctuation. This greatly reduces
the number of useful words in the list, and we would have to go
through it and clean it up before we could start translating it.
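
To give an idea of the cleanup involved, here is a rough sketch in
Python. It is not the script we used; the function names and the
filtering rules are made up for illustration, and it only handles the
mechanical cases (numbers, ordinals, punctuation, and near-duplicates
like "file", "file:" and "files"). Combinations such as "resize window"
would still need a human to decide.

```python
import re

# Hypothetical patterns; the real script's rules are not specified here.
ORDINAL = re.compile(r"^\d+(st|nd|rd|th)$", re.IGNORECASE)
HAS_LETTER = re.compile(r"[^\W\d_]")  # at least one letter, in any script


def normalize(term):
    """Lowercase, collapse whitespace, and strip surrounding punctuation."""
    term = " ".join(term.split()).lower()
    return term.strip(" :;,.!?\"'()[]")


def clean_terms(raw_terms):
    """Return de-duplicated terms, dropping numbers, ordinals and punctuation."""
    seen = set()
    cleaned = []
    for raw in raw_terms:
        term = normalize(raw)
        if not term or not HAS_LETTER.search(term):
            continue  # empty, plain number, or punctuation only
        if ORDINAL.match(term):
            continue  # "1st" through "30th" and the like
        # Crude plural folding (an assumption): "files" is dropped only
        # when the singular "file" has already been seen.
        if term.endswith("s") and term[:-1] in seen:
            continue
        if term in seen:
            continue  # exact duplicate after normalization
        seen.add(term)
        cleaned.append(term)
    return cleaned


if __name__ == "__main__":
    sample = ["file", "file:", "files", "1st", "30th", "42", "...",
              "memory leak", "Resize Window"]
    print(clean_terms(sample))
```

Note that "resize window" survives this pass, which is the point: the
mechanical filtering is the easy part, and the judgment calls are not.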

As I mentioned, the scripting method probably will not catch obscure
phrases such as "object-oriented", "grassroots", "object request
broker", "Just-In-Time", "memory leak", "nice process",
"drag-and-drop", "expunge", "connection-oriented", "sha-bang" and many
many others, all of which are valid and should be in the dictionary.

The list I have been composing is already around 1500 terms in size,
and most of those terms are of high value to the user, but I will have
to greatly reduce it. All words which are just plain "English" words
which don't have too many meanings such as "page", "size", and "public"
will have to go, despite being technical terms. Let's keep those on
Wordlist. This is not to say the two lists will not intersect though,
but more on that at a later time...

I tried as much as possible to make sure the various phases of this
process can be worked on in parallel, i.e. we don't need a complete
list of words before we start translating, and we can also allow
translators to start using this list before all translations are done.
So hopefully I will have specifications for the script soon and we can