
Re: Arabic OCR--Please



>> Funny, I think me, linuxz and oomlx had a talk about it at #arabeyes.
> Any fruits? :)
Yes, but they made my stomach hurt.

>> but the math of it is just beyond me.
>Heh, really? you should consider expanding your horizons (or brains) :P :P
Or take a course on Fourier transforms.
---------- Forwarded message ----------
From: Mohamed Magdy <mohamed dot m dot k at gmail dot com>
Date: Sep 26, 2007 12:27 AM
Subject: Re: Arabic OCR--Please
To: Development Discussions <developer at arabeyes dot org>

Afief Halumi wrote:
> Funny, I think me, linuxz and oomlx had a talk about it at #arabeyes.
Any fruits? :)
> There has been a paper published outlining some nice methods for
> Arabic OCR,
That should help I assume.
> but the math of it is just beyond me.
>
Heh, really? you should consider expanding your horizons (or brains) :P :P
> here is the paper:
> http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-495.pdf
>
/me saves
> On 9/25/07, Mohamed Magdy <mohamed dot m dot k at gmail dot com> wrote:
>
>     Salam
>
>     As mentioned earlier
>     http://lists.arabeyes.org/archives/developer/2006/September/msg00013.html
>
>     It may be worthwhile, and faster, to implement Arabic support in
>     Tesseract-ocr.
>
>     The important thing is Unicode support. Tesseract 2.0
>     http://code.google.com/p/tesseract-ocr/ can use and understand Unicode
>     and could be trained for any language that doesn't have its characters
>     joined.
>
>     What it lacks is described on the training page:
>
>     > Tesseract can only handle left-to-right languages. While you can get
>     > something out with a right-to-left language, the output file will be
>     > ordered as if the text were left-to-right. Top-to-bottom languages
>     > will currently be hopeless.
>     >
>     > Tesseract is unlikely to be able to handle connected scripts like
>     > Arabic. It will take some specialized algorithms to handle this
>     > case, and right now it doesn't have them.
>     >
>     http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
>
>     I did a very, very simple test:
>     http://groups.google.com/group/tesseract-ocr/browse_thread/thread/b1b27838c68681ab
>
>     If you can help, please do.
>
>     Note: As far as I know, there is currently NO working Arabic-capable
>     OCR engine, free or otherwise. I doubt that Sakhr's software can detect
>     anything.
>
>     --alnokta
>
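
On the left-to-right limitation quoted above: if an engine emits the glyphs of a right-to-left line in left-to-right order, the simplest post-processing step is to reverse each line. Below is a minimal sketch of that idea, not anything Tesseract itself provides. The function name visual_to_logical is hypothetical, and the sketch assumes one code point per recognized glyph, with no digits, Latin runs, or combining marks in the line (those would need real bidi handling).

```python
def visual_to_logical(text: str) -> str:
    """Reverse each line so left-to-right glyph order becomes reading order.

    Assumption: every line is purely right-to-left text, one code point
    per glyph. Mixed-direction lines are NOT handled correctly.
    """
    return "\n".join(line[::-1] for line in text.split("\n"))


# The word "salam" as a left-to-right engine would emit it:
# meem, alef, lam, seen (leftmost glyph first).
visual = "\u0645\u0627\u0644\u0633"

# After reversal: seen, lam, alef, meem (logical reading order).
logical = visual_to_logical(visual)
```

This only addresses output ordering. It does nothing for the harder problem raised in the same passage, which is segmenting connected letters.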

_______________________________________________
Developer mailing list
Developer at arabeyes dot org
http://lists.arabeyes.org/mailman/listinfo/developer