
Re: Arabic wordlist for aspell is available.



On Wed, Apr 19, 2006 at 12:50:03PM +0200, Michele Barontini wrote:
> At 01:30, Wednesday 19 April 2006, Mohammed Sameer wrote:
> > On Thu, Apr 06, 2006 at 05:58:31PM +0200, Michele Barontini wrote:
> > > At 01:34, Wednesday 5 April 2006, Mohammed Sameer wrote:
> > > > Hi all,
> > > >
> > > > Let me start first with a screenshot:
> > > > http://www.eglug.org/arabic_spell_for_openoffice
> > > >
> > > > I can say that we currently have a tiny wordlist with about 71,000
> > > > words. It has been generated from various sources.
> > > >
> > > > No affix/infix support or anything yet; it can be done later.
> > > >
> > > > For aspell:
> > > > ftp://foolab.org/pub/software/arspell/20060329/aspell-ar-20060329.tar.bz2
> > > > For OpenOffice: ftp://foolab.org/pub/software/arspell/20060329/ar.zip
> > > >
> > > > aspell-ar debs for etch:
> > > > http://home.foolab.org/debs/aspell-ar/20060329/1/
> > > >
> > > > I've also submitted an ITP for Debian, but I'm looking for a sponsor.
> > > >
> > > > It probably contains some spelling mistakes, and we need to extend it.
> > > >
> > > > Any ideas? Thoughts? (Other than converting it NOW to affix/infix.)
> > >
> > > Afarim Mohammed
> > >
> > > Many thanks for your work. I would like to contribute to the extension of
> > > the aspell dict, but what sort of discipline do you propose? Concentrate
> > > on literary texts? On the current Arabic of the press? The technical
> > > jargon(s)? Distribute the tasks between different people? Produce a
> > > document of advice (linguistic and technical: what to record, what files
> > > to send back, etc.) for contributors?
> >
> > Michele,
> > I'm really sorry for the delay.
> > I have no connectivity at home these days and no time for personal things
> > during office hours.
> >
> > Well, what I really need at the moment is:
> > * People to review
> > * Ideas to extend the list as we are still missing a lot of common words!
> >
> > I try to add any reasonably accurate source I find, but it looks like this
> > is not enough: I've already added what I have, and I have no idea where
> > else we can get Arabic text.
> >
> > I understand I could use a spider to harvest some Arabic websites and
> > include the words, but I can't guarantee their spelling is correct, which
> > would make the volunteers' lives harder.
> >
> > What do you think?
> 
> Mohammad,
> I'll be away until the end of the week, but before I go I have a couple of
> ideas to submit to you:

Well, I'll be out of town until the end of next week!

> Since a major problem is caused by the different styles of writing, I would
> work on TWO different dictionaries: the first built from classical texts
> and the second from modern standard texts (with as few diacritics as
> possible).

Well, I'm not really going to work on the traditional one. If you use diacritics,
then you probably don't need a spell checker.
Of course you still do, but honestly I'm not targeting those people.

The problem arises if you are learning Arabic, but honestly, I don't feel like
doing it. If I can't find people to help review the modern dictionary, I'll
suffer even more trying to find people to review words with diacritics.

Of course, if we had a way to automate adding the diacritics, it would be fine.
I can implement such an application if we have _concrete rules_, but I doubt we do.

The word list I'm working on doesn't contain diacritics at all.

> As for the first, the mohaddith.org (or mohaddath) site has worked for years
> on a project of turath (heritage) texts, collected and searchable with their
> software "had" for Windows, and should be kept in mind.

Perhaps it's muhaddith.org?

It looks like they encode it in a strange way: the first few lines are proper
cp1256 text, but the rest is some sort of binary data.
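A rough way to locate where the readable cp1256 text stops in such a file is sketched below in Python. It is only a heuristic under assumed thresholds: each line is decoded as cp1256 and treated as text if nearly all of its characters are letters, marks, digits, punctuation, symbols or whitespace rather than control characters. The function names and the 0.95 threshold are illustrative choices, not part of any existing tool, and nothing here assumes anything about the actual "had" file format.

```python
import unicodedata
from typing import Optional


def looks_like_text(line: str, threshold: float = 0.95) -> bool:
    """Heuristic: True if at least `threshold` of the characters are
    printable (letters, marks, digits, punctuation, symbols, spaces)."""
    if not line:
        return True  # empty lines are harmless
    ok = 0
    for ch in line:
        if ch == "\ufffd":
            continue  # replacement char: byte could not be decoded
        # First letter of the Unicode category: L=letter, M=mark (e.g. Arabic
        # diacritics), N=number, P=punctuation, S=symbol, Z=separator.
        if unicodedata.category(ch)[0] in "LMNPSZ" or ch in "\t\r":
            ok += 1
    return ok / len(line) >= threshold


def first_non_text_line(data: bytes, threshold: float = 0.95) -> Optional[int]:
    """Return the 1-based number of the first line that does not look like
    cp1256 text, or None if every line does (hypothetical helper)."""
    for lineno, raw in enumerate(data.split(b"\n"), start=1):
        line = raw.decode("cp1256", errors="replace")
        if not looks_like_text(line, threshold):
            return lineno
    return None
```

For example, `first_non_text_line(open(path, "rb").read())`, with `path` being whatever file is being inspected, would return the line number where the binary part appears to begin. Random binary bytes decode to many control characters in cp1256, which is what the threshold relies on.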


> As for the second, www.avu_dam.org has an impressive collection of critical
> (see 'dirasat') and literary texts;

I can't resolve that. Are you sure about the URL?

> as for the technical fields (to be attached to
> the second list), the problem becomes really complicated, but at least the
> site of the Morocco-based institution that publishes the review al Lissan
> al 'Arabi should be checked: a searchable dictionary has been set up there,
> and it was mentioned in some messages posted to the developers' list. Many
> have noticed that this site is buggy. I would insist that such an
> institution should be interested in opening public access to its own
> contributions as much as possible.

With all due respect to them, I'm not going to contact them and beg for something
that I know there's only a 1% chance they'd give away.

As it's not a translation dictionary, I don't really think it's a big problem.

> Then the lists could be used separately or merged into one.
>
> Excuse me for the limited accuracy of my information for the moment.

At least you did something. Thanks a million!

-- 
GNU/Linux registered user #224950
Proud Egyptian GNU/Linux User Group <www.eglug.org> Admin.
Life powered by Debian, Homepage: www.foolab.org
--
Don't send me any attachment in Micro$oft (.DOC, .PPT) format please
Read http://www.gnu.org/philosophy/no-word-attachments.html
Preferable attachments: .PDF, .HTML, .TXT
Thanx for adding this text to Your signature
