
Re: Arabic OCR--Please



Afief Halumi wrote:
Funny, I think linuxz, oomlx and I had a talk about it on #arabeyes.
Did anything come of it? :)
A paper has been published outlining some nice methods for Arabic OCR;
that should help, I assume,
but the math of it is just beyond me.

Heh, really? You should consider expanding your horizons (or brains) :P :P
Here is the paper:
http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-495.pdf

/me saves
On 9/25/07, *Mohamed Magdy* <mohamed dot m dot k at gmail dot com> wrote:

    Salam

    As mentioned earlier
    http://lists.arabeyes.org/archives/developer/2006/September/msg00013.html

    It may be more worthwhile, and faster, to implement Arabic support in
    Tesseract-ocr.

    The important thing is Unicode support: Tesseract 2.0
    (http://code.google.com/p/tesseract-ocr/) can use and understand Unicode,
    and it can be trained for any language whose characters are not
    joined.
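The "joined characters" caveat is the crux. In Arabic, one letter is drawn differently depending on whether it stands alone or sits at the start, middle or end of a word, and most letters in a word are connected. Unicode keeps these positional shapes as legacy "presentation forms" that fold back to the base letter under compatibility (NFKC) normalization. The Python sketch below uses only the standard unicodedata module; the letter BEH and the function name are arbitrary choices for illustration. It lists the four shapes of a single letter. A recognizer that classifies one separated glyph at a time would have to learn every such shape, and would first have to cut the connected word into glyphs.

```python
import unicodedata

# Example letter, chosen arbitrarily: ARABIC LETTER BEH (U+0628).
BEH = "\u0628"


def presentation_forms(letter):
    """Return (position_tag, name) for every code point in the Arabic
    Presentation Forms-B block (U+FE70..U+FEFF) whose compatibility
    decomposition is exactly the given letter."""
    target = "%04X" % ord(letter)  # "0628" for BEH
    forms = []
    for cp in range(0xFE70, 0xFF00):
        ch = chr(cp)
        # e.g. "<initial> 0628" for U+FE91; "" for unassigned code points
        decomp = unicodedata.decomposition(ch)
        if not decomp.startswith("<"):
            continue
        tag, _, rest = decomp.partition(" ")
        if rest == target:
            forms.append((tag, unicodedata.name(ch, "?")))
    return forms


for tag, name in presentation_forms(BEH):
    print(tag, name)
# <isolated> ARABIC LETTER BEH ISOLATED FORM
# <final> ARABIC LETTER BEH FINAL FORM
# <initial> ARABIC LETTER BEH INITIAL FORM
# <medial> ARABIC LETTER BEH MEDIAL FORM
```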

    What it lacks is described on the training page:

    > Tesseract can only handle left-to-right languages. While you can get
    > something out with a right-to-left language, the output file will be
    > ordered as if the text were left-to-right. Top-to-bottom languages
    > will currently be hopeless.
    >
    > Tesseract is unlikely to be able to handle connected scripts like
    > Arabic. It will take some specialized algorithms to handle this
    > case, and right now it doesn't have them.
    >
    http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
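The left-to-right ordering limitation, at least for plain Arabic lines, might be worked around in post-processing. As a rough sketch, assuming the engine emits each line's characters in left-to-right visual order as the quote describes, one could reverse the line and then restore the original order of embedded Latin letters and digits, which stay left-to-right even inside Arabic text. The helper below is hypothetical and a deliberate simplification of the Unicode Bidirectional Algorithm; it does nothing for the harder problem of segmenting a connected script.

```python
import unicodedata


def keeps_ltr_order(ch):
    # Simplification: treat only Latin-type letters ("L") and European
    # digits ("EN") as runs that must stay left-to-right inside an RTL line.
    return unicodedata.bidirectional(ch) in ("L", "EN")


def visual_ltr_to_logical_rtl(line):
    """Hypothetical post-processing step: turn a line that was emitted in
    left-to-right visual order back into logical order for a right-to-left
    paragraph. Not the full Unicode Bidirectional Algorithm."""
    out = []
    run = []  # pending left-to-right run, collected in reversed order
    for ch in reversed(line):
        if keeps_ltr_order(ch):
            run.append(ch)
            continue
        if run:
            out.extend(reversed(run))  # restore the run's original order
            run = []
        out.append(ch)
    out.extend(reversed(run))
    return "".join(out)


# "salaam" (U+0633 U+0644 U+0627 U+0645) followed by the number 2007,
# as a left-to-right visual scan of an RTL line would emit it.
visual = "2007 \u0645\u0627\u0644\u0633"
print(visual_ltr_to_logical_rtl(visual) == "\u0633\u0644\u0627\u0645 2007")  # True
```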

    I did a very, very simple test:
    http://groups.google.com/group/tesseract-ocr/browse_thread/thread/b1b27838c68681ab

    If you can help, please, please do so.

    Note: As far as I know, right now there is NO working Arabic-capable
    OCR engine, free or otherwise. I doubt that Sakhr's software can detect
    anything.

    --alnokta