Re: start of SIRAGI project
- To: Development Discussions <developer at arabeyes dot org>
- Subject: Re: start of SIRAGI project
- From: Tarik FDIL <tfdil at sagma dot ma>
- Date: Fri, 8 Apr 2005 10:06:26 +0000
Salam,
If I summarize our discussion, here are the main points:
- The tif files do not belong in SIRAGI, so they should be removed
from CVS.
- GOCR has a poor design, so it cannot be adapted to support Arabic. It is
not a bad idea to write another OCR from scratch.
- SIRAGI-OCR should not make the same mistake as GOCR: supporting only one type
of script. Its design should be general enough to support Arabic and other
scripts (Latin, etc.).
Thanks for these recommendations. I will update CVS later and give more
details in the todo list.
Mete wrote:
> [...] Are you planning to base SIRAGI completely on top of
> neural networks? It seems like that would be the way to achieve a really
> efficient and flexible OCR.
There are two important parts in an OCR: line/character detection and
character classification. The former is implemented in SIRAGI using a
finite automaton. It is not an easy problem... The latter will be coded
using a neural network. The nn is designed, trained and tested using the
Stuttgart Neural Network Simulator.
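To give a rough idea of the detection side, here is a minimal sketch in
Python of line detection driven by a two-state automaton. It is only an
illustration and not the SIRAGI code: the state names, the function
detect_lines and the input format (a binary image as a list of rows, 1 = ink)
are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical automaton states: between text lines, or inside one.
GAP, TEXT = 0, 1


def detect_lines(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) for each text line of a binary image.

    Assumption: image is a list of pixel rows where 1 means ink and 0 means
    background, and text lines are separated by at least one empty row.
    """
    lines = []
    state = GAP
    start = 0
    for y, row in enumerate(image):
        has_ink = any(row)
        if state == GAP and has_ink:
            # Transition GAP -> TEXT: a new line begins on this row.
            state, start = TEXT, y
        elif state == TEXT and not has_ink:
            # Transition TEXT -> GAP: the line ended on the previous row.
            state = GAP
            lines.append((start, y - 1))
    if state == TEXT:
        # The image ended while still inside a line.
        lines.append((start, len(image) - 1))
    return lines
```

The same scan applied to the columns of one detected line would give a first
cut at character boundaries, although connected scripts such as Arabic need
more than blank columns to separate characters.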
I think we should write at least two engines based on neural networks: the
first engine takes pixels as input and the second takes curves as input. When
the two engines agree on a character, the recognition is confirmed. Currently
we are working on the first engine.
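The agreement rule between the two engines can be sketched as follows. This
is a hedged illustration only: the names confirm, pixel_engine and
curve_engine are hypothetical, and each engine is assumed to be a function
that returns its best character guess for its own kind of input.

```python
from typing import Callable, Optional, Sequence

# Assumption: an engine maps a feature vector to a single character label.
Engine = Callable[[Sequence[float]], str]


def confirm(pixel_engine: Engine, curve_engine: Engine,
            pixels: Sequence[float], curves: Sequence[float]) -> Optional[str]:
    """Return the character if both engines agree, otherwise None."""
    from_pixels = pixel_engine(pixels)
    from_curves = curve_engine(curves)
    if from_pixels == from_curves:
        return from_pixels  # recognition confirmed
    return None  # disagreement: leave the character unconfirmed
```

What to do with an unconfirmed character (retry, ask the user, fall back to
one engine) is left open here.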
Kind regards
Tarik
On Friday 8 April 2005 08:48, Mohammed Elzubeir wrote:
> Behdad Esfahbod wrote:
> > That's exactly the point. Just use libtiff, libpng, and libjpg
> > and you have supported all formats that matter. Having libtiff
> > files in your source code just makes it look bigger than it
> > really is.
>
> I cannot judge the developer here for why he made certain decisions but
> this seems to make sense to me (use pre-existing libraries and link to
> them instead).
>
> > [..]
> >
> > So, please design yours with multiscript support in mind.
>
> I think this is the crux of the matter. The mistake we often make (not
> always) is that we end up doing Arabic solutions, _because_ other
> applications are not i18n'ized enough. Yet, we do the same thing the
> original authors did -- stick to their own cultures and not think of the
> rest of the world.
>
> I don't know about SIRAGI itself, but I can say that in the case of
> Duali, the most likely conclusion for it _will_ be integration into
> other mainstream applications.
>
> Behdad, need I remind you that it took OVER A YEAR for VIM's author to
> incorporate the Arabic patches? We can't just sit around waiting for
> mainstream developers to integrate our work and neither do we want to
> start forks unless we feel that the maintainers have no intentions of
> doing anything about it. After all, the chances of a fork succeeding
> are mostly not very high.
>
> Enough said, let's get back to work. Tarik, if you can please expand on
> your TODO list so others who are interested can pitch in work, it would
> be great.
>
> Regards,
> Mohammed Elzubeir
--
Tarik FDIL
IT Department
SAGMA
GSM (212) 61 14 34 49
Office tel. (212) 22 44 07 17, (212) 22 31 64 69
Fax: (212) 22 44 63 52
Web: professional: http://www.sagma.ma
personal: http://www.sagma.ma/tarik/
paternal: http://www.sagma.ma/oapam-casa/
http://siragi.sourceforge.net