Arabic OCR--Please
- To: developer at arabeyes dot org
- Subject: Arabic OCR--Please
- From: Mohamed Magdy <mohamed dot m dot k at gmail dot com>
- Date: Tue, 25 Sep 2007 18:52:58 +0300
- User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.4) Gecko/20070509 SeaMonkey/1.1.2
Salam
As mentioned earlier in
http://lists.arabeyes.org/archives/developer/2006/September/msg00013.html
it may be worthwhile, and faster, to implement Arabic support in
Tesseract-ocr.
The important thing is Unicode support. Tesseract 2.0
http://code.google.com/p/tesseract-ocr/ can use and understand Unicode
and can be trained for any language whose characters are not
joined.
What it lacks is described on the training page:
Tesseract can only handle left-to-right languages. While you can get
something out with a right-to-left language, the output file will be
ordered as if the text were left-to-right. Top-to-bottom languages
will currently be hopeless.
Tesseract is unlikely to be able to handle connected scripts like
Arabic. It will take some specialized algorithms to handle this case,
and right now it doesn't have them.
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
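To illustrate the ordering problem only (not the connected-script problem), here is a minimal Python sketch of a post-processing step, assuming the engine writes each line's characters in left-to-right visual order and that every line is purely right-to-left. The function names are hypothetical and are not part of Tesseract. Lines that mix in digits or Latin text, or that carry combining marks, would need a real bidirectional algorithm instead of a plain reversal.

```python
def visual_to_logical(line: str) -> str:
    """Reverse one line that was emitted in left-to-right visual order.

    Assumption: the whole line is right-to-left text with no embedded
    left-to-right runs (digits, Latin words) and no combining marks.
    """
    return line[::-1]


def fix_rtl_output(text: str) -> str:
    """Apply the per-line reversal to a whole OCR output, keeping line order."""
    return "\n".join(visual_to_logical(line) for line in text.splitlines())


if __name__ == "__main__":
    # "salam" as it would come out in visual order: meem, alef, lam, seen
    visual = "\u0645\u0627\u0644\u0633"
    logical = fix_rtl_output(visual)
    # Logical (reading) order: seen, lam, alef, meem
    print(logical == "\u0633\u0644\u0627\u0645")
```

This only restores the reading order; it does nothing for recognition of joined letters, which is the harder part quoted above.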
I did a very simple test:
http://groups.google.com/group/tesseract-ocr/browse_thread/thread/b1b27838c68681ab
If you could help, please please do so.
Note: As far as I know, there is currently NO working Arabic-capable
OCR engine, free or otherwise. I doubt that Sakhr's software can detect
anything.
--alnokta