
Re: PuTTY Bidi - final points



--- Owen Dunn <owend - chiark.greenend.org.uk> wrote:
> Nadim Shaikli <shaikli - yahoo.com> writes:
> > --- Owen Dunn <owend - chiark.greenend.org.uk> wrote:
> > > It would be good if you then preserved the word's visual length by
> > > sticking some tatweel in to make up the space there.  (Note for
> > > non-Arabic-speaking folk: tatweel is a meaningless mark that can be
> > > used to stretch Arabic words.)
> > 
> > Adding tatweel (or anything for that matter that is not part of the
> > document one is viewing) is not an option.  What needs to happen is
> > the visual buffer needs to be adjusted and shifted by one to account
> > for this "absorption" or "combining" - the question was really how
> > best to do this from an implementation point of view.
> 
> Wouldn't just removing the space break the alignment of the rest of
> the line?  If you're viewing tabular data, for example, this would be
> very important.  Under your model I have to introduce extra padding in
> the actual data (e.g. a file) to make things line up properly.

Alignment would not be broken, since whoever creates the document/file
would already see the combined glyph from the get-go (so spacing would
be correct iff combining is actually done).  In other words, in order
to see the above two glyphs back-to-back, they visually need to be
combined, and that needs to happen when the person writes/creates the
document as well as when he/she modifies or views it.  In short, it's
not an issue.

> > NOTE: this is how many other applications (including mlterm) deal with
> >       it and it is the prim-n-proper method.  Adding our own "filler"
> >       characters is a recipe for disaster and file non-integrity.
> 
> Why would it have any effect at all on file integrity?  This would be
> purely a visual thing in the terminal emulator front end.

Because we'd be injecting glyphs where those glyphs don't exist in the
real file.  Adding anything, visually or otherwise, is not something that
anyone should do - a person could see that tatweel and think that it
really is in the file (when it's not).  In short, it's not something we
should muck with - we simply need to show the characters in their
proper visual format and adjust the line accordingly.
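
To make "adjust the line accordingly" a bit more concrete, here is a
rough sketch in Python.  It is purely illustrative: it is not PuTTY or
mlterm code, and the names visual_cells, LAM and ALIF_FORMS are made up.
It assumes the logical text is kept untouched and that a lam followed
directly by an alif simply shares one display cell; real shaping also
has to cope with diacritics sitting between the two letters, which this
ignores.

```python
# Hypothetical names; a sketch of the idea, not any terminal's real code.
LAM = "\u0644"
# Plain alif plus the madda / hamza-above / hamza-below variants.
ALIF_FORMS = {"\u0627", "\u0622", "\u0623", "\u0625"}


def visual_cells(logical):
    """Group a logical-order string into display cells.

    A lam immediately followed by an alif form occupies one cell;
    every other code point gets a cell of its own.  Nothing is added
    to or removed from the text.
    """
    cells = []
    i = 0
    while i < len(logical):
        nxt = logical[i + 1] if i + 1 < len(logical) else ""
        if logical[i] == LAM and nxt in ALIF_FORMS:
            cells.append(logical[i:i + 2])  # two code points, one cell
            i += 2
        else:
            cells.append(logical[i])
            i += 1
    return cells


if __name__ == "__main__":
    line = (LAM + "\u0627") * 80
    print(len(line), "code points ->", len(visual_cells(line)), "cells")
```

Under that assumption, 80 lam-alif pairs (160 code points) fill exactly
80 cells, and joining the cells back together gives the original text,
so nothing foreign ever ends up on screen or in the file.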

Note: these issues have all been talked about and solved, so what
      I (and others) am noting is simply a restatement of what
      other applications do (i.e. it is the accepted norm).

> However, if mlterm does it that way -- do other terminal emulators do
> it that way too? -- we might have to do it that way ourselves.  Does
> lam-alif occupy a single character cell in mlterm?  Does that mean
> that in an 80-column terminal window I can have eighty lam-alif
> characters on one line?

Yes you can - visually.  mlterm is unique in that it is the ONLY terminal
emulator that truly supports Arabic.  Other terminals haven't caught up
just yet :-)

> > > Yes.  PuTTY doesn't currently support Unicode combining characters.
> > 
> > Ouch !!  I would guess we can simply ask for that "wishlist" item to
> > be elevated from a medium priority to maximum (unless someone is
> > willing to shed some light on what is needed to make it happen).
> 
> You can ask, but I can't guarantee any of us can make it happen soon.

Ahmad (do please CC the putty mailing-list on your replies) has noted
that he is willing to look into this given some much needed guidance.

> > A quick note, from what I've seen other applications do, the number
> > of composing characters that most allow is 2.  I believe that is the
> > maximum number that all languages use/require (so I'm unsure of the
> > statement I read on the link above that notes, "PuTTY should support
> > an arbitrary sequence of diacritics in any character cell").
> 
> PuTTY is not other terminal emulators.  Quite a lot of the time we
> value `working properly' much more highly than `just about working, as
> far as anyone will be likely to notice, probably'.
> 
> Two diacritical marks is not a maximum even in just the languages I
> know.  For example, in the Qur'an, Arabic can use a base letter,
> shadda, a vowel, and a Qur'anic annotation mark.  Greek can require a
> base letter and three diacritical marks.

That's news to me - vim for instance only allows for 2 (as do other
applications I've seen and used).
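
For anyone who wants to check how many combining marks a given text
really stacks on one base letter, a small sketch along these lines
should do.  It uses Python's standard unicodedata module; the helper
names are hypothetical, and treating every character with a nonzero
combining class as part of the previous cell is a simplification of
proper grapheme clustering.

```python
import unicodedata


def group_clusters(text):
    """Split text into base characters, each followed by its combining marks."""
    clusters = []
    for ch in text:
        # A nonzero canonical combining class marks a combining character.
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def max_marks(text):
    """Largest number of combining marks attached to any single base."""
    return max((len(c) - 1 for c in group_clusters(text)), default=0)


# Greek alpha + psili + oxia + ypogegrammeni.
GREEK = "\u03b1\u0313\u0301\u0345"
# Arabic beh + shadda + fatha + a Qur'anic annotation mark (U+06D6).
ARABIC = "\u0628\u0651\u064e\u06d6"

if __name__ == "__main__":
    for sample in (GREEK, ARABIC):
        print(len(group_clusters(sample)), "cell(s),", max_marks(sample), "mark(s)")
```

Both samples come out as a single cell carrying three marks, which
matches the Greek and Qur'anic cases described above and is more than
a fixed limit of 2 would hold.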

> > The topic of Bidi still looms large overhead.  Which code should we
> > use ?
> 
> I think ICU's licence is probably OK, but you will need to get an OK
> from Simon for that (and whether we're happy to embed that in PuTTY).
> The licence looks like MIT:
> 
> http://oss.software.ibm.com/cvs/icu/~checkout~/icu/license.html

Can we remove the "probably" from the above statement :-)  If ICU is
deemed acceptable we can start stripping down its code to simply
extract the Bidi functions, but again we need the proper OK to go
ahead.  Simon, can you please comment?

Regards,

 - Nadim

