
[gsinai@yudit.org: Re: Yudit Arabic support]



This is a message from Gaspar Sinai (author of Yudit) on Arabic support. I
thought some of you might be interested.

----- Forwarded message from Gaspar Sinai <gsinai at yudit dot org> -----

Date: Tue, 4 Dec 2001 11:22:11 +0900 (JST)
From: Gaspar Sinai <gsinai at yudit dot org>
X-X-Sender: gsinai at suse.blue-edge-tech.com
To: Mohammed Elzubeir <elzubeir at fakkir dot net>
Subject: Re: Yudit Arabic support
In-Reply-To: <20011203143456 dot A26868 at fakkir dot net>

Hi

> > Still I would like to push Arabic shaping to the extent that can be
> > achieved with glyph buffer - and I think it is not little. Please
> > bang on beta7. If native users find it useful I will leave it in
>
> I just compiled/installed it. Arabic shaping is not 'really' there. There are
> a few shapings, but for the most part, not really. There are several problems:
>
> 1. The keymapping is non-standard. Latin characters are mapped to Arabic
> equivalents (e.g. the letter REH is mapped to 'R'.) But that's not too hard to
> fix -- as soon as I have some time I will send you a new keymap.

I think keyboard map fixes are the easy part.

>
> 2. There are latin characters that appear before the Arabic ones come up. You
> only see the Arabic when you press another key (character or white-space). Is
> this intentional?

This is intentional. When I change STextBuffer to a character-based
buffer this behaviour will disappear for Arabic but stay for Hangul.

> 3. I changed the direction, and opened a file, but it still read it LTR, and
> had to high-light the text and set the direction again (ie. it didn't stick).

I would like to have your text - could you attach a UTF-8 file to this
email?

> 4. Input is still from LTR and not RTL (is this intentional?)

Yudit does not automatically change direction depending on
characters you pressed. At least not yet.

> > with 'experimental' status till a redesign of character buffer is done,
> > but if it is completely useless I will remove it. (basically
> > removing /usr/share/yudit/data/shape.my gets rid of shaping)
>
> Hrmm.. I wouldn't call it 'useless', but I certainly wouldn't say it's
> 'usable'.
>
> > I did not get any attachment :(.

I think I might just trash it then. It was a good exercise. You don't know
how many times I attacked this problem on a glyph-based buffer before realizing
that I need a character-based buffer.

> I must have hit 'y' before attaching it. I attached it now, along with a
> screenshot from vim. Some of our development team members have been working on
> Arabic support for VIM, thus the interest in Yudit too ;) We have gotten
> pretty close to completing it, but with a few problems. For example, in the
> screenshot (of the same file in Yudit and VIM) the screen update reads the
> text LTR so the shaping at the beginning of the words are all wrong (it's as
> if they think there is a character before them they need to connect to).
> Otherwise, the remainder of the shaping is correct. All it says is my name,
> "Mohammed Elzubeir" (mhmd alzbyr -- transliteration).

That's great. I wish I had other people working on yudit as well...
All alone it is pretty hard.

> > - put unicode truetype fonts in
> >   /usr/share/yudit/fonts or ~/.yudit/fonts
> >   (cyberbit.ttf times.ttf arial.ttf)
>
> I grabbed some ttf fonts and stuck em in there (no cyberbit.ttf though). Cool
> (I like the fact that it's independent of X11).
>
> > Please give me some feedback on usability - It is very
> > difficult for me as I don't speak the language...
>
> Of course ;)
>
> btw, I have never used Yudit before. I'm a vi user, but.. looks pretty
> interesting. I like the console!

I like the console too and I use vi most of the time. I just wonder how
they made a console version of vi - I thought that was impossible for
Unicode - at least efficiently.

Yudit was created not to compete with vi but to provide an alternative for
those who just want to sit down and immediately type something. My
wife for instance likes it a lot. It is very easy to change the input
method and it is easy to use.

I have some new ideas. I will need people to implement them. The current
code base was written by myself, but I would like to make these
enhancements with other people - if I find any :(.

1. Character Based buffer.
--------------------------
Testing Arabic with Yudit showed that there is a need for a
character-based buffer. Currently I use a glyph-based buffer,
and that is very good when it comes to user interaction - you type
one character and see one glyph - but it is very bad for
languages where the glyph count changes during editing. Having a
character-based buffer would allow sub-glyph editing too, and
it would not have much effect on undo functionality. Currently
a lot of things in Arabic shaping have to be done differently
from what the standard says, because undo won't work if glyphs
are suddenly born at unexpected places.
Files that need to be modified:
 STextData.h STextData.cpp - complete rewrite,
   - use a memory-mapped data file.
   - have line-formatting and other formatting marks
     in a parallel data structure. Make a smart algorithm to
     detach lines from the text buffer if a line is modified. This would
     make data access extra fast.
   - join characters to form glyphs on demand; the class
     that uses STextData should query glyphs only for
     the visible portion of the screen. This would reduce
     memory usage.
 STextView - slight modifications:
   - don't query the whole text data. Make estimates of line sizes for
     the non-visible portion of the text to calculate
     preferredSize. (needed for scrolling)
 STextEdit - slight modification:
   - change the cursor shape if the cursor is not on a glyph boundary.
I think unless these changes are made Arabic will always have
some strange behaviour.
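
A minimal sketch of the idea in point 1, under loud assumptions: the
names CharBuffer, GlyphSpan and isCombining are hypothetical and are not
taken from STextData, and the only clustering rule is "a base character
plus the combining marks that follow it". Real Arabic shaping also needs
joining forms and ligatures, which this leaves out. It only shows the
buffer storing characters, glyph spans being computed on demand for a
requested (for example visible) range, and a test for whether the cursor
sits on a glyph boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch; none of these names come from Yudit's sources.

// Crude stand-in for a real property lookup: combining diacritical
// marks (U+0300..U+036F) and Arabic harakat (U+064B..U+0652).
static bool isCombining(char32_t c) {
    return (c >= 0x0300 && c <= 0x036F) || (c >= 0x064B && c <= 0x0652);
}

// One glyph covers the characters [begin, end) of the buffer.
struct GlyphSpan {
    std::size_t begin;
    std::size_t end;
};

class CharBuffer {
public:
    explicit CharBuffer(std::u32string text) : chars_(std::move(text)) {}

    std::size_t size() const { return chars_.size(); }

    // Edits work on single characters, so undo only has to record
    // character positions, never glyph positions.
    void insert(std::size_t pos, char32_t c) {
        assert(pos <= chars_.size());
        chars_.insert(pos, 1, c);
    }

    void erase(std::size_t pos) {
        assert(pos < chars_.size());
        chars_.erase(pos, 1);
    }

    // Compose glyphs on demand for the characters in [from, to) only.
    std::vector<GlyphSpan> glyphs(std::size_t from, std::size_t to) const {
        std::vector<GlyphSpan> out;
        if (to > chars_.size()) to = chars_.size();
        // Back up so the first span starts at a base character.
        while (from > 0 && from < to && isCombining(chars_[from])) --from;
        for (std::size_t i = from; i < to;) {
            std::size_t j = i + 1;
            while (j < chars_.size() && isCombining(chars_[j])) ++j;
            out.push_back({i, j});
            i = j;
        }
        return out;
    }

    // False when pos falls inside a glyph; an editor could switch to a
    // different cursor shape in that case.
    bool onGlyphBoundary(std::size_t pos) const {
        return pos == 0 || pos >= chars_.size() || !isCombining(chars_[pos]);
    }

private:
    std::u32string chars_;
};
```

With the text "e", U+0301, "a" the buffer holds three characters but
glyphs(0, 3) returns two spans, and position 1 is not a glyph boundary.
Appending another mark after "a" changes the character count and leaves
the glyph count alone.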

2. Redesign-mapping
-------------------
 Unicode requires a lot of mapping between characters. BUmap
 is very old and it existed before anything else. It was written
 to be universal, and I spent more time on it than on the
 whole program altogether.  I think this map should be redesigned.
 Speed might not be an issue, especially if you read point 3.

3. Use unicode as export and import format only
-----------------------------------------------
 Not using Unicode internally would have a very good effect on
 any application. I am thinking about using Unicode for compatibility
 reasons only. The format I want to use internally should have
 the following attributes:
 - character type determined from the character code with a bit mask.
   This would make it possible for applications to avoid loading
   huge data structures. It would even make things possible like
   making a non-spacing mark a spacing mark with a bit mask.
 - character shaping determined by a very clear and short calculation.
 - there is one way to represent one character.
 I am constantly adding things to these attributes. The goal is to
 make _only_conversion_routines_ aware of the huge database Unicode
 applications currently hold, leaving the applications with
 the task of performing their duties in a nice and elegant way. When
 a new character is introduced the character should automatically
 collate/shape etc.
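
 A minimal sketch of the first attribute, purely as an illustration: the
 bit layout, the flag names and the tiny range table below are all
 assumptions made up for this example, not a format Yudit defines. The
 class of a character lives in the top bits of its internal code, so the
 application tests a mask instead of consulting a property database, and
 only the import routine knows anything about Unicode ranges.

```cpp
#include <cstdint>

// Hypothetical internal code: top 4 bits are class flags, the low
// 28 bits identify the character (here simply its Unicode value).
using Code = std::uint32_t;

constexpr Code kSpacing     = 1u << 31;
constexpr Code kNonSpacing  = 1u << 30;
constexpr Code kJoining     = 1u << 29;  // takes part in cursive joining
constexpr Code kRightToLeft = 1u << 28;
constexpr Code kIdMask      = 0x0FFFFFFFu;

// Application side: the type is read straight from the code.
constexpr bool isSpacing(Code c)     { return (c & kSpacing) != 0; }
constexpr bool isNonSpacing(Code c)  { return (c & kNonSpacing) != 0; }
constexpr bool isJoining(Code c)     { return (c & kJoining) != 0; }
constexpr bool isRightToLeft(Code c) { return (c & kRightToLeft) != 0; }

// Turn a non-spacing mark into a spacing one by flipping two bits.
constexpr Code asSpacing(Code c) {
    return (c & ~kNonSpacing) | kSpacing;
}

// Conversion side: the only place that looks at Unicode ranges.
// The ranges are a crude stand-in for a real property table.
constexpr Code importFromUnicode(char32_t u) {
    return (u >= 0x0622 && u <= 0x064A)
               ? (Code(u) | kSpacing | kJoining | kRightToLeft)  // Arabic letters, tatweel
           : (u == 0x0621)
               ? (Code(u) | kSpacing | kRightToLeft)             // hamza does not join
           : ((u >= 0x064B && u <= 0x0652) || (u >= 0x0300 && u <= 0x036F))
               ? (Code(u) | kNonSpacing)                         // harakat, combining marks
               : (Code(u) | kSpacing);
}

constexpr char32_t exportToUnicode(Code c) {
    return static_cast<char32_t>(c & kIdMask);
}
```

 For example, importFromUnicode(0x064E) (FATHA) gives a code for which
 isNonSpacing is true; asSpacing of that code is a spacing character that
 still exports back to U+064E.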

I think I will proceed (as my time allows) from 1 to 3.
I would like to release some more of the 2.x series. I would like
to push the glyph buffer to its limits, but Arabic may not work
with the current design.

Please tell me what you think.

Thanks
gaspar



----- End forwarded message -----

-- 
-------------------------------------------------------
| Mohammed Elzubeir    | Visit us at:                 |
|                      |  http://www.arabeyes.org/    |
| Arabeyes Project     | Homepage:                    |
| Unix the 'right' way |  http://fakkir.net/~elzubeir/|
-------------------------------------------------------