
Arabic OCR--Please

To: developer at arabeyes dot org
Subject: Arabic OCR--Please
From: Mohamed Magdy <mohamed dot m dot k at gmail dot com>
Date: Tue, 25 Sep 2007 18:52:58 +0300
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.4) Gecko/20070509 SeaMonkey/1.1.2

Salam

As mentioned earlier: http://lists.arabeyes.org/archives/developer/2006/September/msg00013.html

It may be worthwhile, and faster, to implement Arabic support in Tesseract-ocr.

The important thing is Unicode support: Tesseract 2.0 (http://code.google.com/p/tesseract-ocr/) can use and understand Unicode, and it can be trained for any language whose characters are not joined.


What it lacks is described on the training page:

Tesseract can only handle left-to-right languages. While you can get something out with a right-to-left language, the output file will be ordered as if the text were left-to-right. Top-to-bottom languages will currently be hopeless.
Tesseract is unlikely to be able to handle connected scripts like Arabic. It will take some specialized algorithms to handle this case, and right now it doesn't have them.

http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract

I did a very simple test:
http://groups.google.com/group/tesseract-ocr/browse_thread/thread/b1b27838c68681ab

If you can help, please do so.

Note: As far as I know, there is currently NO working Arabic-capable OCR engine, free or otherwise. I doubt that Sakhr's software can detect anything.


--alnokta

Follow-Ups:
- Re: Arabic OCR--Please
  - From: Afief Halumi
- Re: Arabic OCR--Please
  - From: Ahmed El-Mahmoudy
