
Re: Arabic + Unicode BiDi



Hi,

Gaspar Sinai wrote:

> (I think reverse-bi-di is
> impossible task according to unicode specs - and yudit needed it).

Right, reverse bidi is not possible under the Unicode specs, which is a pity
and most probably an omission. It is *required* in many cases if one wants to
read Arabic text coming from sources that cannot possibly edit bidirectional
text, and even where it is not strictly necessary, it can make many
programmers' lives easier when arabizing English software.
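
To make the term concrete, here is a minimal sketch of the simplest special
case only: a line made entirely of right-to-left letters, stored in visual
(display) order, is brought back to logical (reading) order by reversing its
code points. The function name visual_to_logical_rtl is hypothetical and not
taken from Yudit, ICU or fribidi. A real reverse bidi also has to handle
numbers, embedded Latin runs, mirrored characters and combining marks, which
this sketch deliberately ignores.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of any library.
 * Reverses an array of code points in place. For a line that contains
 * only strong right-to-left letters, this turns visual order into
 * logical order (and back again). Mixed-direction text needs far more. */
void visual_to_logical_rtl(uint32_t *text, size_t len)
{
    size_t i, j;

    if (len < 2)
        return; /* nothing to reorder */

    for (i = 0, j = len - 1; i < j; i++, j--) {
        uint32_t tmp = text[i];
        text[i] = text[j];
        text[j] = tmp;
    }
}
```

For example, the word "salam" stored visually as U+0645 U+0627 U+0644 U+0633
comes out as U+0633 U+0644 U+0627 U+0645, which is its logical order.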

> Please read solution used by yudit in Yudit.bidi.txt (attached)
> which is a  very  superfluous thing, but hopefully it explains
> what yudit  does for bidi.

I will try to set aside some time to read it; it could be very interesting.
We have had some proposals here to fix the problem too, though we consider
them outdated since reverse bidi exists in practice in at least one major
library.

> Please tell me if there is an existing reverse-bidi algorithm

The ICU library, made by IBM on top of Xerces, does it. I have no idea how
good it is, but it at least has the merit of already being implemented.
My opinion is that it should be studied, any flaws in it fixed, reverse bidi
formally specified the way Unicode bidi is, and then implemented in other
libraries like fribidi.
ICU has many advantages as far as I know, but aside from being a really
heavyweight library, it has a major drawback imo. Being based on Xerces makes
it rather difficult to use in C/C++, in the sense that characters are handled
as UCS-2 by default (if I understand things right, there are extensions to
change this, but I don't know enough about them to comment, and it did not
look straightforward to me at first sight). The problem with UCS-2 is that it
is not natively supported in C/C++, so literal constants are impossible to
deal with (in a non-masochistic way, that is) and type conversions are needed
for every small manipulation, which makes the programmer's life hell and adds
really unnecessary algorithmic costs. (An off-topic aside: imo UCS-2 was
chosen because MS compilers and Java use it, and the folks who programmed
Xerces seem never to have used anything but Java; just take a look at how
they did the DOM implementation and how they manage memory even in the new
IDOM version... anyway, this is off topic.)
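
To illustrate the literal problem, here is a minimal sketch of the kind of
conversion one ends up writing before an ordinary C string literal can be
passed to an API that expects 16-bit code units. The helper name
ascii_to_u16 is hypothetical and is not an ICU or Xerces function; it only
covers 7-bit ASCII, on the assumption that literals in source code are
mostly ASCII.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of ICU or Xerces.
 * Widens a 7-bit ASCII C string into 16-bit code units and
 * zero-terminates the result. Returns the number of units written
 * (terminator excluded), or (size_t)-1 if the input contains a
 * non-ASCII byte or does not fit in cap units. */
size_t ascii_to_u16(const char *src, uint16_t *dst, size_t cap)
{
    size_t i = 0;

    if (cap == 0)
        return (size_t)-1; /* no room even for the terminator */

    for (; src[i] != '\0'; i++) {
        if (i + 1 >= cap || (unsigned char)src[i] > 0x7F)
            return (size_t)-1;
        dst[i] = (uint16_t)(unsigned char)src[i];
    }
    dst[i] = 0;
    return i;
}
```

Every literal needs a destination buffer, a size and an error check, and the
copy runs at run time for what is really a compile-time constant, which is
the overhead complained about above.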

> or
> where I could discuss this problem.

Here, developer at arabeyes dot org is the perfect place :) Besides, the
topic is not new here :)

> If there is someone who could
> help with programming I just need one single class:
> stoolkit/sencoder/SBiDi.cpp
> to interface the rest of the world.

Take a look at ICU then, who knows? (I don't know how heavy Yudit is, but ICU
can add a few pounds.)
http://oss.software.ibm.com/icu/

Regards,
Chahine