On Wed, Apr 19, 2006 at 12:50:03PM +0200, Michele Barontini wrote:
> At 01:30 on Wednesday, 19 April 2006, Mohammed Sameer wrote:
> > On Thu, Apr 06, 2006 at 05:58:31PM +0200, Michele Barontini wrote:
> > > At 01:34 on Wednesday, 5 April 2006, Mohammed Sameer wrote:
> > > > Hi all,
> > > >
> > > > Let me start first with a screenshot:
> > > > http://www.eglug.org/arabic_spell_for_openoffice
> > > >
> > > > I can say that we currently have a tiny wordlist with about 71,000
> > > > words. It has been generated from various sources,
> > > >
> > > > No affix/infix or anything yet, It can be done later.
> > > >
> > > > For aspell:
> > > > ftp://foolab.org/pub/software/arspell/20060329/aspell-ar-20060329.tar.bz2
> > > > for OpenOffice: ftp://foolab.org/pub/software/arspell/20060329/ar.zip
> > > >
> > > > aspell-ar debs for etch:
> > > > http://home.foolab.org/debs/aspell-ar/20060329/1/
> > > >
> > > > I've also submitted an ITP for Debian but I'm looking for a sponsor.
> > > >
> > > > probably it contains some spelling mistakes and we need to extend it.
> > > >
> > > > Any ideas ? thoughts ? "other than converting it NOW to affix/infix"
> > >
> > > Afarim Mohammed
> > >
> > > Many thanks for your job. I would like to contribute to the extension of
> > > the aspell dict, but, what sort of discipline do you propose? Concentrate
> > > on literary texts? On the current Arabic of the press? The technical
> > > jargon(s)? Distribute the tasks between different people? Produce a
> > > document of advice (linguistics and tech: what to record, what files to
> > > send back, etc.) for contributors?
> >
> > Michele,
> > I'm really sorry for the delay.
> > I have no connectivity at home these days and no time for personal things
> > during the office hours.
> >
> > Well, What I really need ATM is:
> > * People to review
> > * Ideas to extend the list as we are still missing a lot of common words!
> >
> > I try to put any accurately reasonable source I find in but looks like this
> > is not enough as I did add what I have and no idea from where can we get
> > Arabic text.
> >
> > I understand I can use a spider to harvest some Arabic websites and include
> > the words But I don't guarantee the spelling correctness which will make
> > the life of volunteers harder
> >
> > What do you think ?
>
> Mohammad,
> I'll be absent up to the end of the week, but I have a couple of ideas
> to submit to you, before:

Well, I'll be out of town until the end of next week!

> since a major problem is caused from the different styles of writing I would
> work to TWO different dictionaries the first originated from classical texts
> and the second by modern standard texts (with the less diacritics as
> possible)

Well, I'm not really going to work on the traditional one. If you use
dictionaries then you probably don't need a spell checker. Of course you do,
but honestly I'm not targeting those people.
The problem will arise if you are learning Arabic, but honestly I don't feel
like doing it. If I can't find people to help review the modern dictionary,
then I'll suffer even more finding people to review words with dictionaries.

Of course, if we have a way to automate adding the dictionaries, that will be
fine. I can implement such an application if we have _concrete rules_, but I
doubt we will.

The word list I'm working on doesn't contain dictionaries at all.

> as to the first the mohaddith.org (or mohaddath) site has worked for years to
> a project of turath texts collected and searchable by their software "had"
> for windows and should be kept in mind

Perhaps it's muhaddith.org?
It looks like they encode it in a strange way. The first few lines are proper
cp1256 encoding, but the rest is some sort of binary data.

> as to the second www.avu_dam.org has an impressive collection of critical (see
> at 'dirasat') and literary texts;

I can't resolve that. Are you sure about the URL?

> as for the technical fields (to attach to
> the second list) the problem becomes really complicated but it should be
> controlled at least the site of the Moroccan-based institution who edits the
> review al Lissan al 'Arabi in which a consultable dictionary has been set,
> and quoted in some letters posted in the developers' list. Many have noticed
> that this site is buggy. I would insist that such an institution should be
> interested to open as much as possible the public access their own
> contributions.

With all my respect to them, I'm not going to contact them and beg for
something when I know there's a 1% possibility they might give it away.

As it's not a translation dictionary, I don't really think it's a big problem.

> then the lists could be used separately or merged into one.
>
> Excuse me for the little accuracy of my information for the moment.

At least you did something. Thanks a million.

--
GNU/Linux registered user #224950
Proud Egyptian GNU/Linux User Group <www.eglug.org> Admin.
Life powered by Debian, Homepage: www.foolab.org
--
Don't send me any attachment in Micro$oft (.DOC, .PPT) format please
Read http://www.gnu.org/philosophy/no-word-attachments.html
Preferable attachments: .PDF, .HTML, .TXT
Thanx for adding this text to Your signature