Re: Siragi OCR project, uml design diagrams
- To: Development Discussions <developer at arabeyes dot org>
- Subject: Re: Siragi OCR project, uml design diagrams
- From: Tarik FDIL <tfdil at sagma dot ma>
- Date: Fri, 19 May 2006 17:59:49 +0000
Hi Ahmad, and thanks for the feedback,
On Thursday 18 May 2006 at 22:08, Ahmad Sayed wrote:
> Hi Tarik,
> I'm not one of those who believe in UML outside interviews and college exams, but
> yours seems to be realistic
The purpose of the UML diagrams is to make it possible to understand the design
of the SIRAGI application without diving into the source code. Second, I would
like to have discussions, like the one we are having now, on paper before
implementing the choices in code; it is easier to change a paper design than
source code. Last, if we later decide to implement the program on another
platform, it will be easier to do so from a proper design than from even the
best program.
> but I have one suggestion. When I think about my OCR, if we are going to
> depend on neural networks it would be better to use three neural networks:
> two for the boundary characters and a third for the middle characters, in
> order to make it simpler and easier to train and maintain, as it would not be
> fair to ask the neural network to output the same character for three
> different patterns like
> عـ ,ع ,ـع
I think that for the classification program (here, the neural network), the
three shapes of the same character are three different characters on input,
but the neural network will output the same UTF-8 code for all the patterns
corresponding to the same character. I have already run some tests with 96
patterns on the input of a neural network and only 28 codes on the output.
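To make the idea concrete, here is a very small Python sketch (the names and
pattern labels are only mine, not from the SIRAGI code): the network is trained
on many contextual forms, but every form of one letter shares the same target
code.

TRAINING_PAIRS = [
    # (input pattern label, target character code)
    ("ain_isolated", "ع"),
    ("ain_initial",  "ع"),
    ("ain_medial",   "ع"),
    ("ain_final",    "ع"),
    # ... one entry per input pattern, 96 in total, only 28 distinct targets
]

ALPHABET = sorted({char for _, char in TRAINING_PAIRS})

def target_vector(char):
    # One-hot target over the base characters, shared by every form of the letter.
    vec = [0.0] * len(ALPHABET)
    vec[ALPHABET.index(char)] = 1.0
    return vec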
> without considering a new classifier layer to classify the state of the
> character, which we could know directly from the segmentation algorithm,
I think the segmentation algorithm should mark the boundaries of the
characters, but it doesn't know anything about the characters themselves.
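As a small illustration (my own names, not the actual SIRAGI interfaces), a
segment would carry only geometry:

from dataclasses import dataclass

@dataclass
class Segment:
    x: int        # left edge of the character box, in pixels
    y: int        # top edge
    width: int
    height: int
    # no character code here: identifying the character is the job of the
    # classification program, not of the segmenter

def segment_line(line_image):
    """Return the list of Segment boxes found in one line of text."""
    ...  # boundary detection only, to be implemented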
> Really, I don't know if I am at the right level to discuss this issue
I think it's the right moment to discuss this issue.
> Another thing: you speak about something called pixelization and
> vectorization; it doesn't make sense to me. I expect that we will not depend
> only on the x*y matrix of pixels generated from the segmentation module: you
> will vectorize the char and use this for the neural network, but it isn't
> clear in your sequence diagram. Do you mean
> that you segment, then pixelize, then vectorize each character,
> or do you mean that you will have two parallel modules, one segmenting using
> pixelization and the other using vectorization?
The idea is to have an OCR application with several strategies and engines for
recognition. We can use one of them or a combination. One engine could be more
efficient for one category of texts and another one for another type, etc.
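Roughly (the class names below are only an illustration of mine, not the SIRAGI
design), every engine would expose the same recognition interface, so the
application can pick one engine, or combine several, per type of text:

from abc import ABC, abstractmethod

class RecognitionEngine(ABC):
    @abstractmethod
    def recognize(self, character_image):
        """Return the UTF-8 string recognized for one segmented character."""

class PixelNNEngine(RecognitionEngine):
    def recognize(self, character_image):
        ...  # resize to a fixed pixel matrix and feed the pixel neural network

class VectorNNEngine(RecognitionEngine):
    def recognize(self, character_image):
        ...  # extract lines and curves and feed a vector-based neural network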
In my mind we can describe a character either as a matrix of pixels or as a set
of lines and curves. These are two different strategies. Before the
segmentation program transmits the character to be recognized to the
classification program (here the NN), there are two options:
- keep the character as pixels, but resize it to fit the matrix size accepted
  by the neural network (8 x 8, 16 x 24, etc.); see the small sketch after this
  list;
- transform the pixels into a vector of lines and curves describing the
  character; the neural network would then be configured to take curves as
  input instead of pixels.
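For the first option, a minimal sketch could look like this (nearest-neighbour
scaling is only an assumption on my part; any resampling method would do):

def resize_to_matrix(pixels, out_w=16, out_h=24):
    # pixels: 2-D list of 0/1 values for one segmented character.
    # Returns a fixed-size out_h x out_w matrix for the neural network input.
    in_h = len(pixels)
    in_w = len(pixels[0])
    return [
        [pixels[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]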
I think we will first implement the pixel neural network, and in a second stage
we could implement and test a vector neural network.
There is another idea that doesn't appear in the diagrams: learning. Most OCR
engines can learn from their input to calibrate their algorithms. We should
implement this feature in a future version.
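For example (purely hypothetical names, and assuming the network exposes some
training entry point), we could keep the user's corrections and recalibrate
from them later:

corrections = []

def record_correction(matrix, correct_char):
    # Called whenever the user fixes a misrecognized character.
    corrections.append((matrix, correct_char))

def recalibrate(network):
    # Re-train the network on the accumulated corrections, then forget them.
    for matrix, char in corrections:
        network.train(matrix, char)   # assumed training entry point
    corrections.clear()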
Best regards
Tarik Fdil