
Arabic OCR--Please



Salam

As mentioned earlier (http://lists.arabeyes.org/archives/developer/2006/September/msg00013.html):

It may be worthwhile, and faster, to add Arabic support to Tesseract-ocr.

The important thing is Unicode support. Tesseract 2.0 (http://code.google.com/p/tesseract-ocr/) can handle Unicode, and it can be trained for any language whose characters are not joined.

What it lacks is described on the training page:

Tesseract can only handle left-to-right languages. While you can get something out with a right-to-left language, the output file will be ordered as if the text were left-to-right. Top-to-bottom languages will currently be hopeless.

Tesseract is unlikely to be able to handle connected scripts like Arabic. It will take some specialized algorithms to handle this case, and right now it doesn't have them.

http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract

I ran a very simple test:
http://groups.google.com/group/tesseract-ocr/browse_thread/thread/b1b27838c68681ab

If you can help, please do.

Note: As far as I know, there is currently NO working Arabic-capable OCR engine, free or otherwise. I doubt that Sakhr's software can detect anything.

--alnokta