
Re: yudit + bidi



On Mon, 3 Jun 2002, Nadim Shaikli wrote:
[...]
> > I tried to explain the reasons why Yudit is not using any
> > of the algorithms given by Unicode and works purely on the
> > view instead of a back-store buffer.
> >
> > So how should I proceed?
> >
> > I could easily add a flag to each line about the initial
> > directionality, and add a mark to each combining cluster
> > to help reordering.
> >
> > Also the initial directionality somehow  would need to be
> > passed when cutting and pasting for smooth operation.
> >
> > I don't think we can change Unicode algorithms but there
> > must be a way to hack them.
>
> Well, if this is all within Yudit then any solution that you
> might think of might work (i.e. passing variables, etc.), but
> one thing to keep in mind is that Yudit should play well
> with other kids in the playfield :-)  In other words, if I
> wanted to cut-n-paste from my Arabic enabled terminal-emulator,
> Yudit should accept and display it correctly.  Another thing
> to note is, what's in the cut-buffer should not really be
> visual characters to begin with - and so this problem resolves
> to simply being the initial stored file issue.

Yes, Yudit should play well, so cut-and-pasted text should also be
conformant. Still, I expect some problems with initial
directionality when BiDi text is transported, unless initial
directionality is hacked. I would like to transfer the selected
text so that it lands in the pasted environment in the same order
as it was. I wonder how that is possible, or whether it is possible at all.

[...]

> After many hours, I'm now convinced that bidi is indeed not
> reversible as it stands now (all the heuristics I was able to
> come up with were not fool-proof -- some worked better than
> others, but all were breakable).
>
> The issue really amounts to this.  Assume f() is a transformation
> function (bidi in this case); assume A, B to be data points, then
>
>   f(A) = X
>   f(B) = Y
>
> given that X and Y are not equal one will be able to reverse,
> but there are cases (and they're rather easy to generate too) where
> A != B yet X == Y  -- ouch !!!
>
> Whether this is a security breach is debatable and I, again :-),
> don't want to go there since there is context involved.
>
> So now what ?  Well, I could think of one solution - include hidden
> visual directives (those hidden directives will NOT show up in the
> file and would simply serve to note initial character start per line).
> Since this visual hidden char is not part of the unicode standard,
> yudit should implement it internally and should not affect any of its
> external interfaces (cut-n-paste should skip those hidden chars, they
> should not be stored, etc) until such time as the unicode folk accept
> such an idea (if ever).  This would serve Yudit's purposes in hacking
> reversibility.
>
> In short - bidi is not reversible, a hack could be worked out
> which could/would involve a visual hidden character (not transferable
> and not to be stored).
>
> For those not following what's being said - here's a statement
> of the problem.  The reversibility of Bidi involves taking a visual
> representation (ie. what's on screen) and without knowing anything
> about how it came about, regenerate the contents of the file on
> disk.
>
> Visual example (capitals are Arabic letters),
>
> ## Sample file contents
> MILK & cookies
> fish & CHIPS
>
> I run fribidi; the display output I'll get,
>
> +---------------screen width------------------+
>                                  cookies & KLIM
> fish & SPIHC
> +---------------------------------------------+
>
> if I note fribidi's l2v (logical-to-visual) and
> v2l (visual-to-logical) info using fribidi's
> "--verbose" flag, I see this,
>
> for the 'cookies' line its -
>   l2v: 13 12 11 10 9 8 7 0 1 2 3 4 5 6
>   v2l: 7 8 9 10 11 12 13 6 5 4 3 2 1 0
>
> for the 'fish' line its -
>   l2v: 0 1 2 3 4 5 6 11 10 9 8 7
>   v2l: 0 1 2 3 4 5 6 11 10 9 8 7
>
> So the "reversibility" question then becomes "how do I revert
> what is shown on screen to what is inside the stored file".
> And that is simple if the v2l was available, but in a true
> reversible system, that info would not be around.
>
> Simple, right ? Not quite - consider this scenario.
>
> ## Sample file contents (note equal spaces on both ends)
>              hello there FISH MILK
>              FISH MILK hello there
>
> again run fribidi; the display output you'll get,
>
> +---------------screen width------------------+
>              hello there KLIM HSIF
>              hello there KLIM HSIF
> +---------------------------------------------+
>
> :-) and there-in is the problem.  So now with no info on the
> file and no v2l data - how do I figure if the line started
> with an english word vs. if it started with an Arabic one ?
> The only way I could figure is to add a visual hidden character
> next to the first character of the line.  So I would end up with
>
> +---------------screen width------------------+
>             ~hello there KLIM HSIF
>              hello there KLIM HSIF~
> +---------------------------------------------+
>
> where the '~' is that hidden character (again ONLY in visual mode).
> I don't think this visual hidden char should be used outside of
> Yudit unless it becomes part of the unicode Bidi algorithm.  This
> guarantees interoperability between Yudit and other applications
> while fulfilling Yudit's reversibility requirement.  As for what
> happens when someone pastes something into Yudit, I'm guessing that
> pasted characters are captured in the order the user moves over
> the characters (ie. right-to-left vs. left-to-right) and as such
> Yudit could insert the visually hidden char prior to the first paste
> if it were on a new line (or else deal with it when it runs Bidi
> on the buffer which happens with every change).  Granted it's easier
> said than done, but I think you get the general idea.
>
> I don't see the need for any other info (like for combining
> clusters, etc).  Given the info noted above, you can run a forced
> directional bidi on the line to get its correct initial state.

This is great. I always had a feeling that it is only the initial
directionality (alignment information?) that I need to reverse the
BiDi algorithm.

So there is a need for an ordering function that returns the
ordered characters (or a position array) and a bit that can be
0 for left-aligned and 1 for right-aligned text.

 bool logical2display (uint31* array, unsigned int len);

Then the reverse reordering function would take the reordered characters
and this bit and hopefully produce the original:

 void display2logical (uint31* array, unsigned int len, bool startdir);

The question remains -  is function 'display2logical' possible? I
think it is.
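
To make the idea concrete, here is a minimal sketch under heavy
assumptions. It is not the Unicode algorithm and not Yudit code: it
uses the toy convention from the examples above (capitals stand for
Arabic letters, lowercase for Latin, everything else is neutral) and
ignores digits, explicit embeddings and mirroring. The names
toy_reorder, toy_logical2display and toy_display2logical are
hypothetical, and std::string stands in for the character array. In
this restricted model the reordering appears to be its own inverse
once the direction bit is known, so the reverse function simply
reorders again.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Toy classification: uppercase = right-to-left, lowercase = left-to-right,
// anything else = neutral. This is an assumption of the sketch only.
static char strong_type(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    if (std::isupper(u)) return 'R';
    if (std::islower(u)) return 'L';
    return 'N';
}

// Reorder `s` in place for a given base direction (false = left, true = right).
static void toy_reorder(std::string& s, bool rtl) {
    const std::size_t n = s.size();
    const char base = rtl ? 'R' : 'L';
    std::string t(n, 'N');
    for (std::size_t i = 0; i < n; ++i) t[i] = strong_type(s[i]);

    // Resolve each run of neutrals: take the surrounding direction if both
    // sides agree, otherwise the base direction. Line edges count as base.
    for (std::size_t i = 0; i < n;) {
        if (t[i] != 'N') { ++i; continue; }
        std::size_t j = i;
        while (j < n && t[j] == 'N') ++j;
        const char before = (i == 0) ? base : t[i - 1];
        const char after = (j == n) ? base : t[j];
        std::fill(t.begin() + i, t.begin() + j, before == after ? before : base);
        i = j;
    }

    // Levels: left base gives L=0, R=1; right base gives R=1, L=2.
    std::vector<int> level(n);
    for (std::size_t i = 0; i < n; ++i)
        level[i] = (t[i] == 'R') ? 1 : (rtl ? 2 : 0);

    // Reverse every maximal run at level >= lvl, highest level first.
    for (int lvl = 2; lvl >= 1; --lvl) {
        for (std::size_t i = 0; i < n;) {
            if (level[i] < lvl) { ++i; continue; }
            std::size_t j = i;
            while (j < n && level[j] >= lvl) ++j;
            std::reverse(s.begin() + i, s.begin() + j);
            std::reverse(level.begin() + i, level.begin() + j);
            i = j;
        }
    }
}

// Reorders `s` into display order and returns the direction bit
// (true = line starts right-to-left), taken from the first strong character.
bool toy_logical2display(std::string& s) {
    bool rtl = false;
    for (char c : s) {
        const char k = strong_type(c);
        if (k != 'N') { rtl = (k == 'R'); break; }
    }
    toy_reorder(s, rtl);
    return rtl;
}

// With the direction bit supplied, reordering again restores logical order
// (in this toy model only).
void toy_display2logical(std::string& s, bool startdir) {
    toy_reorder(s, startdir);
}
```

Fed the two lines "hello there FISH MILK" and "FISH MILK hello there",
the forward function gives the same display string for both but
different direction bits, and the reverse function recovers each
original when handed the matching bit. Whether this carries over to
the full algorithm is exactly the open question.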

I will be pretty busy until the middle of July. I will have some spare
time when travelling; I will think about the second algorithm.

I have a feeling that we are getting closer.

Cheers
Gaspar