Salaam,
On Thu, 21 Mar 2002, Chahine M. Hamila wrote:
> I think I might need more details here too.
> But assuming you have an image file of an Arabic/Jawi letter and you
> want it transformed into a Unicode-encoded character, your best bet is
> to write:
> 1) a filter that would convert your binary image into a pixel array
> 2) a neural network that you would train by feeding it arrays
> representing different letters
> If you are not dealing with images of separate letters, it's a lot more
> complex even. I am not aware of any Arabic-related OCR technology, so a
> two-cents-worth idea would be similar training of neural networks for
> separating letters, with a "parallel" dichotomic search trying to
> estimate letter widths, for example. Say, you take an arbitrary width
> for a start and pass it through your NN; if your program thinks the
> result is not OK, it changes the width and redoes it, etc.
> I suppose there should be better approaches for separating letters,
> though.
>
> More details, please, for suitable help :)
> Salaam,
> Chahine

You got the point, Chahine. Friends of mine have already done all the
OCR phases and segmented the manuscripts into separate letters (as
arrays). My job is to transform those arrays into encoded characters so
that the manuscripts can be stored as encoded text, which saves storage.
Your idea of training a network on every letter is good and acceptable,
but it seems rather complex. Is there any other possible, less complex
solution? (Why have previous Arabic OCR researchers always left this
subject out of their reports?)
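For the array-to-Unicode step itself, here is a minimal sketch, assuming
the segmentation already yields fixed-size binary pixel arrays. It uses
plain template matching (nearest reference bitmap) only as a stand-in for
whatever classifier is finally chosen, e.g. the neural network Chahine
describes; the template table, function names, and output file name below
are placeholders, not existing code.

# Hypothetical sketch: map each segmented Jawi letter array to a Unicode
# character by nearest template matching. Assumes the segmentation step
# already yields binary pixel arrays of one fixed size (here 32x32).
import numpy as np

TEMPLATES = {
    "\u0627": np.zeros((32, 32), dtype=np.uint8),  # alef (placeholder bitmap)
    "\u0628": np.zeros((32, 32), dtype=np.uint8),  # beh  (placeholder bitmap)
    # ... one reference bitmap per letter form, filled from labelled scans
}

def classify_letter(letter):
    """Return the Unicode character whose reference bitmap differs from
    `letter` in the fewest pixels (Hamming distance)."""
    best_char, best_dist = None, None
    for char, template in TEMPLATES.items():
        dist = int(np.count_nonzero(letter != template))
        if best_dist is None or dist < best_dist:
            best_char, best_dist = char, dist
    return best_char

def arrays_to_text(letter_arrays):
    """Turn a sequence of segmented letter arrays into a Unicode string."""
    return "".join(classify_letter(a) for a in letter_arrays)

# Saving the result as UTF-8 text is what actually reduces the storage:
# with open("manuscript.txt", "w", encoding="utf-8") as f:
#     f.write(arrays_to_text(page_of_letter_arrays))

Template matching like this only works for clean, consistently segmented
glyphs; a trained network in place of classify_letter would be more robust
to variation between manuscripts.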
Anyway, thanks, Nicholas, for trying to help. But if I have to type the
manuscripts into a text file manually, then this project is meaningless.
Another thing, for your information: the Malays still use the 'old
Malay'/Jawi script today. Their children are even taught how to read and
write it in school, but they just don't use it widely in daily business.
TQ,
Nia Azniah