
Re: Siragi OCR project, UML design diagrams



Hi Ahmad, and thanks for the feedback,

On Thursday 18 May 2006 at 22:08, Ahmad Sayed wrote:
> Hi Tarik,
> I'm not one of those who believe in UML outside interviews and college
> exams, but yours seems to be realistic

The purpose of the UML diagrams is to make it possible to understand the design 
of the SIRAGI application without diving into the source code. Second, I would 
like to have discussions like the one we are having now on paper, before 
implementing choices in code: it is easier to change a paper design than source 
code. Last, if we later decide to implement the program on another platform, it 
would be easier to do that from a proper design than from even the best program.

>  but I have one suggestion: when I think about
> my OCR, if we depend on neural networks it would be better to use 3
> neural networks, 2 for boundary chars and a third for middle chars, in order
> to make it simpler and easier to learn and maintain, as it would not be fair
> to ask the neural network to output the same character for 3 different patterns like

I think that, for the classification program (here, the neural network), the three 
shapes of the same character are three different characters in input. But the 
neural network will output the same UTF-8 code for all the patterns 
corresponding to the same character. I have already run some tests with 96 patterns 
as input to a neural network with only 28 codes as output.
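
To make this concrete, here is a small illustrative sketch in Python (not the 
actual SIRAGI code; the letter names, forms and helper functions are my own 
assumptions) showing how several input patterns of the same letter share a 
single output code:

# Illustrative sketch only (not the SIRAGI code): several visual forms
# of one letter are distinct input patterns, but they all share one
# output code, which is how 96 patterns can collapse to only 28 codes.

AIN = "\u0639"    # the Arabic letter 'ain: ع
GHAIN = "\u063A"  # the Arabic letter ghain: غ

training_labels = {
    "ain_isolated":   AIN,    # ع
    "ain_initial":    AIN,    # عـ
    "ain_final":      AIN,    # ـع
    "ghain_isolated": GHAIN,
    "ghain_initial":  GHAIN,
    "ghain_final":    GHAIN,
}

def target_vector(pattern_name, alphabet):
    """One-hot target for the network: one output unit per character
    code, shared by every pattern of that character."""
    code = training_labels[pattern_name]
    return [1.0 if c == code else 0.0 for c in alphabet]

alphabet = sorted(set(training_labels.values()))
print(target_vector("ain_initial", alphabet))  # same target for all 'ain forms
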
> عـ ,ع  ,ـع
> without considering a new classifier layer to classify the state of the
> character, which we could know directly from the segmentation algorithm,

I think the segmentation algorithm should mark the boundaries of characters, but 
it doesn't know anything about the characters themselves.
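
As an illustration of that separation (a hypothetical structure, not the real 
segmentation module), the segmentation step would only return character boxes, 
with no character code attached:

# Hypothetical sketch of what segmentation could hand to the classifier:
# it marks where each character is, but not which character it is.
from dataclasses import dataclass
from typing import List

@dataclass
class CharacterBox:
    x: int        # left edge of the character inside the page image
    y: int        # top edge
    width: int
    height: int
    # no character code here: identification is the classifier's job

def segment_line(line_image) -> List[CharacterBox]:
    """Would cut one text line into per-character boxes (implementation omitted)."""
    ...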

> Really, I don't know if I am at the right level to discuss this issue

I think it's the right moment to discuss this issue. 

> another thing:
> you speak about something called pixelization and vectorization; it doesn't
> make sense to me. I expected that we would not depend only on the x*y matrix
> of pixels generated from the segmentation module, but that you would vectorize
> the char and use this for the neural network. As it isn't clear in your
> sequence diagram, do you mean
> that you segment, then pixelize, then vectorize each character,
> or do you mean that you will have two parallel modules, one segmenting using
> pixelization and the other using vectorization?

The idea is to have an OCR application with several strategies and engines for 
recognition. We can use one of them or a combination. One engine could be more 
efficient for one category of texts, another one for another type, etc.
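
To give an idea of the structure I have in mind (only a sketch; OcrEngine, 
PixelNeuralEngine and VectorNeuralEngine are invented names, not part of the 
current design), each engine could implement the same small interface, so we 
can use one alone or combine several:

# Sketch only (invented names): a common interface so several OCR engines
# can coexist and be used alone or combined.
from abc import ABC, abstractmethod

class OcrEngine(ABC):
    @abstractmethod
    def recognize(self, character_image) -> str:
        """Return the UTF-8 text recognized for one character image."""

class PixelNeuralEngine(OcrEngine):
    def recognize(self, character_image) -> str:
        raise NotImplementedError("pixel-matrix neural network strategy")

class VectorNeuralEngine(OcrEngine):
    def recognize(self, character_image) -> str:
        raise NotImplementedError("line/curve vector strategy")

def recognize_with_vote(engines, character_image):
    """Naive combination of engines: keep the most frequent answer."""
    results = [engine.recognize(character_image) for engine in engines]
    return max(set(results), key=results.count)

The voting function is just one possible way to combine engines; weighting by 
engine confidence would be another.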

In my mind, we can describe a character either as a matrix of pixels or as a set 
of lines and curves. These are two different strategies. Before the 
segmentation program transmits the character to be recognized to the 
classification program (here the NN), there are two options:

- keep the character as pixels, but resize it to fit the matrix size accepted 
by the neural network (8 x 8, 16 x 24, etc.); see the sketch after this list;

- transform the pixels into a vector of lines and curves describing the 
character. The neural network should then be configured to take curves as input 
rather than pixels.
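
For the first option, the preprocessing is basically a resize of the character 
bitmap; a minimal sketch, assuming a plain list-of-lists bitmap and a 16 x 24 
target size (both assumptions on my side), could look like this:

# Sketch of the pixel strategy: normalize any character bitmap to the fixed
# matrix size accepted by the neural network (16 x 24 assumed here).
def resize_bitmap(bitmap, out_w=16, out_h=24):
    """Nearest-neighbour resize of a list-of-lists bitmap of 0/1 pixels."""
    in_h, in_w = len(bitmap), len(bitmap[0])
    return [
        [bitmap[row * in_h // out_h][col * in_w // out_w]
         for col in range(out_w)]
        for row in range(out_h)
    ]

def to_input_vector(bitmap):
    """Flatten the normalized matrix into the 16 * 24 = 384 network inputs."""
    return [float(pixel) for line in resize_bitmap(bitmap) for pixel in line]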

I think we will first implement the pixel neural network. In a second stage, we 
could implement and test a vector neural network.

There is another idea that doesn't appear in the diagrams: learning. Most OCR 
engines can learn from their input to calibrate their algorithms. We should 
implement this feature in a future version.
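
A hypothetical sketch of how that learning feature could look from the outside 
(invented API, only for discussion): the user corrects a wrong result and the 
engine keeps the sample so it can recalibrate itself later:

# Hypothetical sketch of the learning feature for a later version.
class TrainableEngine:
    def __init__(self):
        self.samples = []  # (character_image, correct_text) pairs

    def correct(self, character_image, correct_text):
        """Record a user correction for later retraining."""
        self.samples.append((character_image, correct_text))

    def calibrate(self):
        """Adjust the engine (e.g. retrain the network) from the corrections."""
        ...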

Best regards

Tarik Fdil