Re: Fwd: Arabic Spellchecker for OpenOffice
- To: "Mohammed Sameer" <msameer at foolab dot org>
- Subject: Re: Fwd: Arabic Spellchecker for OpenOffice
- From: Jabs <jabrafghneim at gmail dot com>
- Date: Fri, 24 Mar 2006 09:42:59 -0700
- Cc: ahmad at gharbeia dot org, amr at gharbeia dot net, Development Discussions <developer at arabeyes dot org>, alaa at manalaa dot net
OK, here is a second email. On a first reading I usually only skim through messages, which is not a deep enough reading, so here is my follow-up.
I agree with Mohammed that word lists have no copyright issues. Words are words, and nobody holds a copyright on a language's words. I have several lists that I collected over the years, and I recently had the time to write a Perl script that breaks a text into individual words and then dedupes them so that every word appears only once. Since then I have run this program on millions of words and it works about 99% perfectly. There is some rubbish, but it is easy to spot and clean up manually. I will send you all a preliminary list this coming Monday. If that is OK, I would like to be the filter for updates to the list; I believe it will be easier this way. Let me know what your thoughts are.
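The Perl script itself is not included in this message. As a rough illustration of the split-then-dedupe idea only, here is a minimal Python sketch. The function names (`is_word_char`, `split_words`, `unique_words`) and the rule for what counts as a word character are assumptions made for this example, not a description of the actual script.

```python
import unicodedata


def is_word_char(ch):
    # Assumption: a word is a run of letters (L*) and combining marks (M*),
    # so Arabic diacritics stay attached to the word they belong to.
    return unicodedata.category(ch)[0] in ("L", "M")


def split_words(text):
    # Break the text into words; anything that is not a letter or a mark
    # (spaces, digits, punctuation such as the Arabic comma) ends a word.
    words, current = [], []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def unique_words(text):
    # Normalise to NFC so identical words typed with different code point
    # sequences compare equal, then keep the first occurrence of each word.
    seen = set()
    result = []
    for word in split_words(unicodedata.normalize("NFC", text)):
        if word not in seen:
            seen.add(word)
            result.append(word)
    return result


if __name__ == "__main__":
    for word in unique_words("كتاب قلم كتاب، قلم."):
        print(word)
```

A sketch like this would still let through the kind of rubbish mentioned above (stray fragments, foreign words, typos in the source text), which is why a manual cleaning pass remains necessary.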
Jabra
On 3/24/06, Mohammed Sameer <msameer at foolab dot org> wrote:
Hi,
For the first time in a long time, I'll pop in.
Please guys, the plain word list approach will work. I was objecting to the affix
approach not because it's bad, but because it would not be completed, and so it would
delay and stall the whole process. And it happened as I said.
Let's go with the word list approach; the affix work can be done later.
Check this: ftp://foolab.org/pub/software/aspell/aspell-quran.tgz
That's a plain word list generated from the words of the holy Quran without the
affix approach, and it works fine.
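To make the plain word list approach concrete, here is a minimal Python sketch of checking words against such a list. It is not how aspell works internally; the one-word-per-line input and the names `load_word_list` and `misspelled` are assumptions for illustration. The point it shows is that, without affix rules, every inflected form has to appear in the list explicitly.

```python
def load_word_list(lines):
    # Assumption: one word per line; surrounding whitespace and blank
    # lines are ignored.
    return {line.strip() for line in lines if line.strip()}


def misspelled(words, word_list):
    # A word is accepted only if it appears in the list exactly as written;
    # there are no affix rules to derive other forms from a stem.
    return [word for word in words if word not in word_list]


if __name__ == "__main__":
    word_list = load_word_list(["كتاب\n", "\n", "قلم\n"])
    print(misspelled(["كتاب", "كتب"], word_list))
```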
Please, Jabs, if you have such a list and you know that the spelling of its words is
correct, please release it. I'm willing to maintain it and add missing words for as long
as I'm able to.
Flame me for the above words, do anything, but please release it if you can.
Regarding the copyright: you can't copyright a word list; you can only copyright the
representation of the words in the list. Alaa, please correct me if I'm wrong.
PS. You might like to read these to understand the different stages I went through:
http://www.foolab.org/node/1439
http://www.foolab.org/node/1482
Best regards,
--
GNU/Linux registered user #224950
Proud Egyptian GNU/Linux User Group <www.eglug.org> Admin.
Life powered by Debian, Homepage:
www.foolab.org
--
Don't send me any attachment in Micro$oft (.DOC, .PPT) format please
Read http://www.gnu.org/philosophy/no-word-attachments.html
Preferable attachments: .PDF, .HTML, .TXT
Thanx for adding this text to Your signature
--
Fortuna ventus mens ut est paratus.
************************************************************************************
Language Hacker. Creator of Localized or translated culturally aware services, content, technology, training and experiences targeted to individuals and professional services firms with projects related to the Middle East region.