
Re: How to encode image produced by a recognition system?



Dear Nia Azniah,

On Thu, 21 Mar 2002, Chahine M. Hamila wrote:

> complex even. I am not aware of any Arabic-related OCR technology, so a

Sakhr Software has an Arabic-script OCR program which should work for
Malay if your text is in a printed Arabic font.  If your text is still in
manuscript form you will probably just have to type it into a text file.
You might be able to use ArabTeX for Malay.  ArabTeX includes a lot of
characters for Arabic-script languages, but I'm not sure whether it has
all the characters needed for Malay.  Also you might try taking a look at
a Malaysian web site for Arabic: http://www.arlaco.cjb.net/ (Arabic
Language & Consultancy).  Malay hasn't been written in the Arabic script
in Malaysia since about 1957, so perhaps you could consider
transliterating your text.  Modern Malay can be written in ASCII, so that
would solve a lot of problems. :)

					Nicholas Heer

> Salaam,
>
> Nia Azniah wrote:
>
> >    Salaam. I'm new to this subject. I need to solve the
> > post-processing phase of a recognition system for Jawi ('Old Malay',
> > derived from Arabic script) manuscripts,
> >    and I hope to get help from this list.
> >
> Cool, looks like serious stuff;)
>
> >
> >    1. This particular recognition system produces binary images of
> > Jawi/Arabic script.
> >
> More details please. What do you provide as input, for example?
>
> >  How do we transform a particular image to match the codes defined
> >    by Unicode? (In other words, how do we get the result of a
> > recognition system accepted and used as Unicode/UTF-8
> > encoded text?)
> >
> I think I might need more details here too. But assuming you have an
> image file of an Arabic/Jawi letter and you want it transformed into a
> Unicode-encoded character, your best bet is to write:
> 1) a filter that converts your binary image into a pixel array
> 2) a neural network that you train by feeding it arrays
> representing different letters
> If you are not dealing with images of separate letters, it's a lot more
> complex even. I am not aware of any Arabic-related OCR technology, so a
> 2-cents-worth idea would be a similar training of neural networks for
> separating letters, with a "parallel" dichotomic search that tries to
> estimate letter widths, for example. Say you take an arbitrary width to
> start, you pass it through your NN, and if your program thinks the result
> is not OK, it changes the width and tries again, and so on.
> I suppose there should be better approaches for separating letters,
> though.
>
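The width-search idea quoted above can be sketched roughly as follows. This is only an illustration under several assumptions: the image is already a pixel array (a list of rows of 0/1 values), `segment` and `toy_classify` are hypothetical names, and plain template matching stands in for the trained neural network. Real Arabic-script text is cursive, so a fixed set of candidate widths is a simplification.

```python
# Toy templates standing in for a trained network (hypothetical data).
TEMPLATES = {
    "A": [[1, 1], [1, 1]],
    "B": [[1, 0, 1], [1, 1, 1]],
}

def toy_classify(patch):
    """Return (label, confidence) for a patch; confidence is the
    fraction of pixels that agree with the best same-sized template."""
    best_label, best_conf = None, 0.0
    for label, tpl in TEMPLATES.items():
        if len(tpl) != len(patch) or len(tpl[0]) != len(patch[0]):
            continue  # only compare patches of identical shape
        cells = len(tpl) * len(tpl[0])
        same = sum(p == t
                   for prow, trow in zip(patch, tpl)
                   for p, t in zip(prow, trow))
        conf = same / cells
        if conf > best_conf:
            best_label, best_conf = label, conf
    return best_label, best_conf

def crop(image, x, width):
    """Return the sub-image covering columns x .. x + width - 1."""
    return [row[x:x + width] for row in image]

def segment(image, classify, widths, min_conf=0.9):
    """Greedy scan: at each position try every candidate width, keep the
    one the classifier is most confident about, then move past it."""
    total = len(image[0])
    x = 0
    labels = []
    while x < total:
        best = None  # (label, confidence, width)
        for w in widths:
            if x + w > total:
                continue
            label, conf = classify(crop(image, x, w))
            if best is None or conf > best[1]:
                best = (label, conf, w)
        if best is None or best[1] < min_conf:
            x += 1  # nothing convincing here; skip one column
            continue
        labels.append(best[0])
        x += best[2]
    return labels
```

Any callable that returns a `(label, confidence)` pair could replace `toy_classify`, including a wrapper around a trained network.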
> >
> >    a) What is the best format to save the image?
> >
> Save an image of what?
>
> >
> >    b) Do I need to build databases for the images first, then the codes?
> >
> In the case of character recognition you need to build and train neural
> networks.
>
> >
> >    I don't have many ideas about it. Please help.
> >
> More details, please, so we can give suitable help :)
> Salaam,
> Chahine
>
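
On the original question of turning recognition results into Unicode/UTF-8: once the recognizer outputs a class label per letter, the remaining step is a table lookup followed by encoding. A minimal sketch, assuming the labels arrive in reading order (Unicode stores Arabic-script text in logical order, not left-to-right visual order). The label names and the function names are made up for illustration; the code points are from the Unicode Arabic block, which includes the extra letters used in Jawi.

```python
# Hypothetical class labels mapped to Unicode code points.
LABEL_TO_CODEPOINT = {
    "alif": 0x0627,  # ARABIC LETTER ALEF
    "ba":   0x0628,  # ARABIC LETTER BEH
    "ca":   0x0686,  # ARABIC LETTER TCHEH (Jawi ca)
    "nga":  0x06A0,  # ARABIC LETTER AIN WITH THREE DOTS ABOVE (Jawi nga)
    "pa":   0x06A4,  # ARABIC LETTER VEH (Jawi pa)
    "nya":  0x06BD,  # ARABIC LETTER NOON WITH THREE DOTS ABOVE (Jawi nya)
}

def labels_to_text(labels):
    """Turn recognizer labels (in reading order) into a Unicode string."""
    return "".join(chr(LABEL_TO_CODEPOINT[label]) for label in labels)

def labels_to_utf8(labels):
    """Same as labels_to_text, but encoded as UTF-8 bytes for storage."""
    return labels_to_text(labels).encode("utf-8")
```

For example, `labels_to_utf8(["ba", "nga"])` yields the four bytes `D8 A8 DA A0`, which can be written directly to a file opened in binary mode. An unknown label raises `KeyError`, which is probably what you want while the label table is still being built.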