[FWD: Re: PuTTY : Patch to shape arabic letters correctly]



Somehow didn't make it through the queue..

----- Forwarded message from Simon Tatham <anakin (at) pobox dot com> -----

From: Simon Tatham <anakin (at) pobox dot com>
Reply-To: putty (at) projects.tartarus dot org
To: Mohammed Elzubeir <elzubeir (at) arabeyes dot org>
Cc: kamal_dalal (at) yahoo.com, developer (at) arabeyes dot org, 
    putty (at) projects.tartarus dot org
Subject: Re: [putty]Re: PuTTY : Patch to shape arabic letters correctly

> On Mon, Jun 03, 2002 at 11:17:01PM -0700, Kamal Dalal wrote:
>> I am looking into patching PuTTY to make it shape arabic letters
>> correctly. From what I see (using a Win2k Pro), a simple command like "cat
>> <utf-8 file>" displays all disconnected arabic letters. I heard from
>> Mohammad Elzubair that he has seen PuTTY display correct shapes after the
>> window refreshes itself. I could not see that behaviour. Can anybody on
>> the list confirm or deny?

I think I can explain this phenomenon.

PuTTY does not know anything about right-to-left scripts. So if you
send a sequence of characters ABCDE, it places them in its model of
the terminal screen in the order ABCDE. If you send them one by one,
then its display code will be called in five separate calls to
output first an A, then a B, then a C, a D and an E. So you will see
the characters in that order on the screen as well.

However, if you subsequently hide the PuTTY window and then make it
visible all at once, the display code will be called with the whole
string in one go - `ABCDE' - and at that point Windows's own Unicode
handling will notice that those characters are in a right-to-left
script and it will helpfully reverse them for you - so you will see
EDCBA at this point. However, if you then move the PuTTY cursor over
those letters they will be redrawn one by one in the original order
again, since PuTTY still _thinks_ they're on the screen in the order
`ABCDE'.
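
As a rough illustration of the discrepancy described above, here is a toy
model in Python. Everything in it is made up for the example: the function
names are hypothetical, upper-case letters stand in for right-to-left
characters, and `platform_draw` is only a stand-in for the Windows text
call, not its real behaviour or PuTTY's actual code.

```python
# Toy model (all names hypothetical): the screen model stores cells in the
# order they were received, and the platform text call reverses any
# multi-character right-to-left run it is handed in one piece.

def platform_draw(text):
    """Stand-in for the OS text call: reverses a multi-character RTL run."""
    # Assumption for this sketch: upper-case letters play the RTL role.
    if len(text) > 1 and text.isupper():
        return text[::-1]
    return text

def draw_one_by_one(cells):
    """Each cell is drawn in its own call, so nothing gets reordered."""
    return "".join(platform_draw(c) for c in cells)

def draw_whole_line(cells):
    """The whole line is drawn in a single call, so the run is reversed."""
    return platform_draw("".join(cells))

cells = list("ABCDE")                 # what the terminal model holds
incremental = draw_one_by_one(cells)  # as drawn while characters arrive
full_redraw = draw_whole_line(cells)  # as drawn after hide and re-show
```

Here `incremental` comes out as ABCDE and `full_redraw` as EDCBA, although
the stored `cells` never changed; that is the mismatch between model and
display.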

Clearly PuTTY needs to do something about this. At a minimum it must
become able to predict what the Windows text display call will do
with the characters it sends, and hence not have this discrepancy
between what it thinks it displayed and what it actually displayed.

Beyond that, the question of what to do about sending text in the
order ABCDE and having it entered into PuTTY's terminal handler in
the order EDCBA is a harder one to solve. There are of course
published algorithms and implementations of code which will take a
piece of mixed-language text and arrange it correctly, but I think
it will be somewhat harder to implement this in a terminal emulator
than in (say) a word processor, because it must be done _character
by character_. You don't just have to work out what happens if you
have English and Arabic text side by side on the same line of text;
you need to be able to _send_ that text to the terminal, one
character at a time, and know what happens to the screen as you send
each character - in a way that's implementable algorithmically and
simply, with a minimum of stored state.
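
To make the "one character at a time, minimum of stored state" requirement
concrete, here is one possible sketch in Python. It is an assumption-laden
toy, not a proposal for what PuTTY does or should do: the `Line` class and
`is_rtl` helper are hypothetical, the base direction is assumed to be
left-to-right, and spaces, digits, embedding levels, cursor movement and
letter shaping are all ignored. The only state kept is the column where the
current right-to-left run began.

```python
import unicodedata

def is_rtl(ch):
    """True for strongly right-to-left characters (Hebrew, Arabic, ...)."""
    return unicodedata.bidirectional(ch) in ("R", "AL")

class Line:
    """One terminal line, stored in visual (left-to-right) order."""

    def __init__(self):
        self.cells = []        # what is shown, left to right
        self.run_start = None  # column where the current RTL run began

    def put(self, ch):
        """Place one incoming character, updating the line immediately."""
        if is_rtl(ch):
            if self.run_start is None:
                self.run_start = len(self.cells)
            # Insert at the start of the run, pushing earlier RTL
            # characters one column to the right.
            self.cells.insert(self.run_start, ch)
        else:
            # Any other character ends the run and is appended normally.
            self.run_start = None
            self.cells.append(ch)

line = Line()
for ch in "ab\u0627\u0628\u062ac":   # a, b, three Arabic letters, c
    line.put(ch)
shown = "".join(line.cells)
```

After each `put` the line is in a displayable state, which is the property
a terminal needs; a real solution would also have to cope with neutral
characters and with the cursor being moved into the middle of a run.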

I'm unaware of any existing specifications for how this should work.
As far as I know, the recent UTF-8 modifications to xterm don't even
attempt to solve this problem, but merely print characters left to
right in the order they're given - so that it's the job of `cat' or
an equivalent utility to do the bidi calculations and re-order the
characters before sending them to the terminal. It isn't clear to me
whether this is how things need to stay - some input from people who
actually use these characters would be welcome.
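
For comparison, the sender-side approach (where `cat' or a similar utility
reorders the text before the terminal sees it) could look roughly like the
Python sketch below. This is deliberately naive: it simply reverses each
maximal run of strongly right-to-left characters and is not the full
Unicode bidirectional algorithm. The function names are hypothetical.

```python
import unicodedata
from itertools import groupby

def is_rtl(ch):
    """True for strongly right-to-left characters (Hebrew, Arabic, ...)."""
    return unicodedata.bidirectional(ch) in ("R", "AL")

def to_visual(line):
    """Reverse each run of RTL characters, leaving everything else alone."""
    out = []
    for rtl, run in groupby(line, key=is_rtl):
        chars = list(run)
        out.extend(reversed(chars) if rtl else chars)
    return "".join(out)

logical = "ab\u0627\u0628\u062ac"   # order in which the text is stored
visual = to_visual(logical)         # order a left-to-right terminal prints
```

A filter like this would be applied to each line before writing it to a
terminal that prints strictly left to right in the order given.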

Cheers,
Simon
-- 
Simon Tatham         "A defensive weapon is one with my finger on the
<anakin (at) pobox dot com>    trigger. An offensive weapon is one with yours."