
Re: questions on the wordlist



--- sven vahar <aabram at gmail dot com> wrote:
> I've been considering taking the Arabeyes wordlist as a basis for an
> Estonian-Arabic wordlist. I'm currently maintaining  an online
> English-Estonian dictionary (http://www.tps.edu.ee/nastik/) and since
> I've started studying Arabic (I'm just a beginner) I've been looking
> for free Arabic wordlist for (semi-)automatic conversion. My goal is
> to build at least rudimentary Arabic-Estonian dictionary. Arabeyes
> wordlist seemed almost perfect at first but I've stumbled upon few
> issues that I'd like to ask clarification about.

Wow - this is pretty cool.  I'm just waiting and hoping for Google to
pick up this wordlist at some point after we get our act together in
forming a project to do auto sentence/paragraph translations.

> All English words have initial capital letter. Why this decision? The
> problem with this is that one cannot distinguish between proper names
> and similarly written regular words. For example - in English "Jenny"
> is a name while "jenny" is a female donkey. "John" is a name but
> "john" is a loo. While I can see that in Arabeyes wordlist both Jenny
> and John give just Arabic transliteration of those names I wouldn't
> know this for other words. There are many other such words, for
> example Maxim: it could be the name, it could be an aphorism or it
> could be a machine gun. That's the reason why dictionaries use
> lowercase for regular words and initial capital letter for proper
> names. Is there an Arabeyes wordlist available which follows this
> fashion?

The capitalization was done out of sheer ad-hoc'ishness (if that is
a word :-).  We couldn't find a proper wordlist to start with and had
to generate our own (all within the scope of an open license).  So
although what you note above is most likely true, I personally don't
view it (just yet) as critical.  If/when we come across a full-fledged
"proper" wordlist we ought to compare our English terms with theirs
for completeness and this time around make a distinction on caps vs.
non-caps.  As it stands now what is critical is to complete the
wordlist so that each term has multiple and "all" translations
(irrespective of capitalization).
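
As a rough sketch of what that comparison could look like (assuming a
reference list that keeps proper names capitalized and regular words in
lowercase; the function name and the sample data are made up for
illustration and are not taken from the actual wordlist):

```python
def classify_terms(our_terms, reference_terms):
    """Label each capitalized term as 'common', 'proper', 'both' or 'unknown'.

    Assumption: reference_terms is case-sensitive, i.e. proper names are
    capitalized and regular words are lowercase.
    """
    reference = set(reference_terms)
    result = {}
    for term in our_terms:
        lower = term[:1].lower() + term[1:]
        upper = term[:1].upper() + term[1:]
        has_common = lower in reference
        has_proper = upper in reference and upper != lower
        if has_common and has_proper:
            result[term] = "both"      # e.g. a name that is also a regular word
        elif has_common:
            result[term] = "common"    # safe to lowercase
        elif has_proper:
            result[term] = "proper"    # keep the capital letter
        else:
            result[term] = "unknown"   # not in the reference list at all
    return result


# Hypothetical sample data.
sample = classify_terms(
    ["Jenny", "Maxim", "House", "Cairo", "Zzyzx"],
    ["Jenny", "jenny", "Maxim", "maxim", "house", "Cairo"],
)
```

Terms that come back as "both" or "unknown" would still need a human to
look at them.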

> Second problem is that as a learner I find it confusing that most
> (all?) nouns are given with the definite article instead of just the
> base form. البحر (al bahr) instead of just بحر (bahr), ����� instead
> of just ��� (I hope I get right characters here). It is confusing for
> the beginner and a foreigner. For non-native speaker it is easier to
> look up the base word and add the article if needed, rather than look
> up the word with an article and then deduct to get the base form. I
> find using my small Al-Mawrid (2004, 6th ed) dictionary easier than
> the wordlist. What was the reasoning to include the articles with the
> words in Arabeyes wordlist?

I'll leave this to some of the more linguistically capable people on
this list (esp. those in the QAC team).  I'm unaware of what they use
in the Al-Mawrid so if there is a URL that can shed some light that would
be wonderful.
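
In the meantime, for a (semi-)automatic conversion, a naive way to get
from the listed form back to a base form might look like the sketch
below. It assumes undiacritized entries with the article written attached
to the noun, and the function name is hypothetical. Words whose own
spelling happens to begin with alif-lam would be wrongly shortened, so the
output needs review by someone who knows the language.

```python
AL = "\u0627\u0644"  # alif + lam, the Arabic definite article "al-"


def strip_article(word, min_base_len=2):
    """Naively drop a leading "al-" from an Arabic word.

    Assumption: any leading alif-lam is the article. This is not always
    true, so treat the result as a candidate, not a verified base form.
    """
    if word.startswith(AL) and len(word) - len(AL) >= min_base_len:
        return word[len(AL):]
    return word


# "al bahr" -> "bahr", the example from the question above
base = strip_article("\u0627\u0644\u0628\u062d\u0631")
```

The `min_base_len` guard only keeps very short strings from being
emptied; it does nothing about real false positives.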

Welcome aboard & Salam.

 - Nadim


		