
farsi. farsi! farsi? farsi: (fwd)



FARSI.
Yes.
;-)

Disclaimer 1: This is not a Persian vs Farsi war message.
Disclaimer 2: CC to FarsiWeb list is just informational.
Disclaimer 3: The attached code is not in Public Domain.
Disclaimer 4: This is a long boring message.  Your own risk.

Note: No attachment.  I have put a 100 KB tarball at:
http://www.cs.toronto.edu/~behdad/farsistuff.tar.gz
It will stay there for a couple of weeks.



Two long years ago, on a day just like today, perhaps in the
same wee hours of the morning but in Tehran time, I was
polishing and wrapping up a piece of code that has been
called "farsi" ever since.

The story goes back even further.  It must have been in late 2000
that Roozbeh Pournader wrote some C code to convert Unicode
Persian text to a legacy character set called iransystem.  As
a requirement for that, he wrote the joining code that I later
used in "farsi".

In late 2001, my major work on FriBidi was done, so it was
time to use what I had been building.  I took Roozbeh's code,
cleaned it up, plugged in FriBidi, and the result is what you get
as farsi/fjoining/.  I wrote some more code to fill the gap in
handling harakats on the console and called it farsi/fconsole,
and finally grabbed the source code of script(1), hacked a few
lines, and called it farsi/fcon.  With the help of font tools I
borrowed from another project and a keyboard driver I wrote
myself, I finally had my pet called "farsi", which did more for
me than Akka was able to do (for me as a Persian).

Since then the code got some cleanup and a few added features,
but nothing else changed, not even the user base, which was
limited to me, myself, and behdad.  The package named "farsi" was
still waiting for me (and Roozbeh) to resolve the copyright
status and release it, while I lost interest in the bidi console
and it went down into my 10GB archive of last (lost) files.

Fortunately I did three small releases of the code: the first on
a local list called 'farsidev' that no longer exists (I cannot
even remember it; I only noted it in the package's ChangeLog);
the next on a list in the Hebrew community; and the last on
ArabEyes.  It seems that the last one is the only one that has
survived history.

This is the history of "farsi" in five paragraphs.  I also
hacked a Red Hat 7.2 to enable Persian on the console.  I later
took some notes on what I did, and reproduced it on another
machine from my notes.  The notes are in the farsiredhat
directory in the archive attached to this mail.  Note that they
are pretty old.  Many things have changed since then.

For the past few days I have been known as the biggest blocker of
the whole ArabEyes project ;-).  So I will first answer the
questions I was asked about "farsi", and then go through the
files in the attached archive.

Muhammad Alkarouri wrote:
>
> Thanks Behdad for your reply. I would like to know,
> though, what is the expected timeframe of including
> joining in fribidi.

2005.  No more, no less ;-).
Seriously, this winter.

> Another question for all:
> - do you know any problem that affects using farsi
> besides bidi before joining and shaping codes, and
> some may be next stage points like interaction with
> gpm and ncurses programs?

In the future, ncurses should implement its own bidi/shaping.
But before that, both ncurses and gpm need to get some stable
Unicode support.  I am supposed to have a look at Unicode support
in ncurses after I'm satisfied with GNOME (FriBidi, Pango, GTK+,
AbiWord), but most probably it's not before 2005.

> If there aren't I will base any future work on this
> code rather than the akka original.

:D.

Nadim Shakili wrote:
>
> A couple of questions though,
>
>  1. Can we take this conversation to Arabeyes' "developer"
>     mailing-list ?  I'm sure we'll want to refer back to
>     all these points in the future.

Sure.

>  2. Can we come up with an alternate name to this package.
>     Akka 2.0 (with no mention of the previous work or credits) ?
>     suggestions ?  Behdad, its your baby, so its your call.

Well, "farsi" is not such a bad name as long as it's used in
English written text ;-).  Ok, it has proved to be a bad name.
Perhaps '"farsi"' is a good name, but again in written context.
BTW, you should not need that word in English; one should always
use Persian to refer to the language.

Second, it's the Free World (as in Free Beer) of Free Software
(as in Freedom) ;-).  Feel Free to Fu^^Hack the code.  (Free as
in Freedom, not as in Beer.  Don't forget my Beer).

Akka 2.0 may make a good name.  I too prefer not to bind a
new name to the same functionality.  Perhaps we would want to
give some hints and credit to pre-2.0 Akka.  Roozbeh?

I'm fine with Akka, if on your website and the main README file,
you write it this way:
Akka (aka "farsi")

Another idea comes to my mind, about popping another name.  Just
take the middle and call it 'baghdad'?  ;-).


>  3. Can we, once 1&2 above are agreed upon, release this code
>     so that its archived somewhere.  From what I remember, the
>     code as it stands today is fully functional with the
>     exception of a few missing shaping characters.  Once those
>     are taken care of, we can release, right ?

I have attached my latest code, and hopefully with my comments at
the end of this message, you can make it fully functional.  Last
time I tried, a few things needed changes to work on later Red
Hat systems.  Mainly, consolechars has been dropped and
setfont should be used instead.

Muhammad Alkarouri wrote:
>
> While I would certainly prefer an alternative name,
> more descriptive to the type of the package, I see no
> reason why Behdad cannot name the package he has
> written in the way he wants. Two points are there:
> - Is Behdad willing to resume developing the package?
> If not, I suggest he publishes it and we can develop
> an Akka 2 based on it at Arabeyes. If he has the time,
> then we get a good package:)
> - We need some changes for it to be running. e.g. a
> unicode keymap for Arabic besides the isiri currently
> there. And I would remove the farsidict from the
> package since it is not much related. Otherwise, I
> would suggest getting an immediate release out and
> leave other changes to version 1.1. Actually we can
> release a 0.9 without even correcting these.

It would be nice if someone autotoolized it.  The other changes I
mention later in this mail.

> By the way, Behdad, what is the license of this
> package?

Roozbeh's code and mine are under the LGPL.  (Roozbeh?).  That
means all the library code.  The keymap is in the public domain.
The font is based on Dmitry Bolkhovityanov's VGA font, to which I
have donated Arabic glyphs.  There are some mapping tables and
other stuff in the fonts dir done by me that are in the public
domain.  You can check the license using Google.  Remains the
hard part:

The code I borrowed from script(1), which is the skeleton of
farsi/fcon/fcon.c, has a "BSD with advertisement clause" license.
You need to read it yourself and read through fsf.org to find out
what we can do.  I guess it should remain in that dirty kind of
BSD license forever.  No problem, can still link to the LGPLed
library.

One way is to cut my code out of that and place it in a GPLed
container, that the Akka project should already have.  It's just
a simple master/slave pty layer.  My code is the highly commented
part in farsi/fcon/fcon.c -- lines 200 to 350.  I mean this is
the part that is just my code, and the engine of the bidi
terminal itself.  The rest is very easy to find or reimplement.


Samy Al Bahra wrote:
>
> >  2. Can we come up with an alternate name to this
> > package. Akka 2.0 (with no mention of the previous work
> > or credits) ?
>
> No. You cannot just do that. People have already contributed
> a bit of code and effort to Akka, it isn't right for them
> "not be mentioned". I'm talking everything, including the
> original authors on which Akka was based on should be credited
> (even the old maintainer, me, Mohammad, Anas, etc...).
> [snip]

Well, I'm afraid you are wrong, both from an ethical point of
view, and from the law's.  First, Akka is not a trademark or any
other type of shit.  Second, previous authors already get some
extra credit from those lazy people that do not read the AUTHORS
file, nor release notes ;-).

I prefer them mentioned myself, if we are going to call it Akka
2...


BTW, there's a nice thing happening here.  Akka 1 was based on
the work previously called "acon", and Akka 2 may be based on my
code, whose final part (the terminal layer) is called "fcon".
That's a bit more interesting.  I don't know why those people
called their package "acon"; perhaps "Arabic Console"?  But I
named mine after "Farsi Condom", as terminal people (around the
linux-utf8 list at least) call these layers that sit on top of a
dumb terminal and provide some functionality condoms.  And this
is a Farsi condom.  But you can't shout it in Iran, so we came up
with "fcon" :-).

> If Akka is dropped and a NEW project is started WITH a
> different name then we can start over with the credits
> and what not.
> [snip]

Akka(TM) you mean? ;-)

> There are still a lot of bashisms that need to be
> dealt with.

I usually use bash where and only where C cannot be used.  In
this case, I agree that Python may be the answer.  I will get
to that later in this mail.

> > By the way, Behdad, what is the license of this
> > package?
>
> None based on the code I have, meaning, technically it
> is public domain. Behdad, so? I would imagine GPL (and
> would prefer BSD license).

Already discussed.

> [snip]
> I did hope you would realize this from this message and add
> a copyright statement to the code.

As I mentioned, that was the reason I never released it.


Mohammed Elzubeir wrote:
>
> Are you saying to simply apply bidi post-bidi in the farsi code? We can
> do that.

No, the problem is not that easy, but it is fairly local.  You
go your way developing this code, I go mine on FriBidi, and the
later merge will not be hard.

> Also, are you planning to maintain that (develop, etc..). I
> would like to use that as a basis to replace akka. Seeing that
> the console will always be a fixed-width environment, this
> switch in where the shaping is applied is less relevant (but we
> can always switch if it makes you happy).

We will switch later.  I have really worked hard on the fjoining
part to get reasonable results without having any standard on
where to apply joining.  (The hard problem, if you don't see it,
is with the RLO and LRO stuff).


Done :-).  Now my file-by-file review, which can be the basis for
further development.  Keep me posted please.  I will not go
through the farsiredhat stuff; that's pretty easy to understand.
Here is the structure of the farsi/ code, but the architecture
can totally change, should the future developers feel the need.



ChangeLog:
Well, nice to have it around and add to it.  It certainly lacks
some of my later work on the code, but can be populated from this
mail for each file.  Perhaps this mail can be saved alongside it
as HISTORY, after mentioning Akka 1.x if needed.



Makefile:
Autotools perhaps.



README:
Should be replaced by a respectable one, but the contents should
definitely be used somewhere.



TODO:
I comment on each entry as needed:

* Parse command-line options.
> This will definitely be done.

* Somehow share options between C sources and shell
  script!
> We may never need it again, but I have some skeleton to
> share variables between C source and Shell script in a
> single file.  Ask me for it ;-).

* Documentation.
> Sure!

* Fix fconso bug, also support mc.
> Not sure if we really need that.  But the idea should
> be developed.  I will discuss it under fconso.c later.

* Clean-up ZWJ-ZWNJ-ZWJ code, also support
  ligature-making ZWJ.
> "ligature-making ZWJ" has been removed from the Unicode
> standard, so that part no longer makes sense.  About cleaning
> up the ZWJ-ZWNJ-ZWJ code, I can't remember what the problem
> was.  Not a serious problem perhaps.  Just clean up.

* Implement fcon as shared library (stick on fd 1 and 2
  if point to tty)!
> I will again discuss it later under fconso.c



fjoining/
To summarize, it does the joining, shaping, bidi (calling
fribidi), and the LAM-ALEF ligature, considering all options that
have been passed.


fjoining/Makefile
fjoining/fjoining-config.in:
Would be replaced by autotools, pkgconfig stuff.



fjoining/*.i
fjoining/fjoining_charprop.[ch]
fjoining/fjoining_compose.[ch]
fjoining/fjoining_log2cuni.[ch]
fjoining/fjoining_vis2cuni.[ch]
These are the main body of the library, with tables in the *.i
files.  It does some normalization and the joining and shaping.
Tables may need some update.  Roozbeh?
Note that the library accepts a bunch of options, defined in
fjoining/fjoining.h.  The exciting part is that it can do joining
without bidi sensibly.  You will see later that with a
left-to-right (mirrored) Arabic font, you can read Arabic text
written (and shaped) from left to right, which is pretty useful
when your software does not support bidi (editors).



fjoining/fjoining_vu.c
It's a simple wrapper around the library that filters text and
applies bidi and joining.  For now it accepts the options only in
numeric form.



fjoining/fjoining_ye.[ch]
fjoining/msye.c
fjoining/fixfarsiye.c
These deal with the problem of the Persian YEH in Microsoft
fonts.  The first one, "msye", replaces initial and medial
Persian YEHs with Arabic YEH, and replaces final and isolated
Arabic YEHs with Persian ones.  The other one, fixfarsiye.c,
simply replaces Arabic YEH with Persian YEH.  It should not be
needed anymore, but is handy to have around, as there is a lot of
Persian text with mixed Arabic and Persian YEHs.  The names of
course may change to something more proper.
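
To make the two YEH conversions concrete, here is a minimal
sketch in Python.  It is not the code from the package: the
function names are made up for illustration, and the joining test
is a crude assumption (any letter from U+0622 to U+06D3 is taken
to join to the character before it), where the real library uses
proper joining tables.

```python
import unicodedata

ARABIC_YEH = "\u064a"   # ARABIC LETTER YEH
PERSIAN_YEH = "\u06cc"  # ARABIC LETTER FARSI YEH


def fix_farsi_yeh(text):
    """Replace every Arabic YEH with Persian YEH (the fixfarsiye idea)."""
    return text.replace(ARABIC_YEH, PERSIAN_YEH)


def _joins_to_previous(ch):
    # Crude assumption: letters in U+0622..U+06D3 connect to the
    # preceding letter; HAMZA (U+0621) and non-letters do not.
    return "\u0622" <= ch <= "\u06d3" and unicodedata.category(ch) == "Lo"


def ms_yeh(text):
    """Arabic YEH in initial/medial position, Persian YEH otherwise."""
    out = []
    for i, ch in enumerate(text):
        if ch in (ARABIC_YEH, PERSIAN_YEH):
            # Skip combining marks (harakats) to find the next base char.
            j = i + 1
            while j < len(text) and unicodedata.category(text[j]) == "Mn":
                j += 1
            joined = j < len(text) and _joins_to_previous(text[j])
            ch = ARABIC_YEH if joined else PERSIAN_YEH
        out.append(ch)
    return "".join(out)
```

YEH is dual-joining, so whether it is initial/medial or
final/isolated depends only on whether the following letter
connects to it, which is why the sketch never looks backwards.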



fconsole/
This is a level of abstraction that I really love.  This small
piece of code does some ligaturing that is needed on the console.
It can be thought of as your rendering engine that handles
harakats, ....  What it currently does, if I remember correctly,
is to ligate shadda+harakat combinations into a single ligature,
and then take harakats that are applied to a character that joins
to the next char and put them on top of a tatweel (kashida).  It
gives far better looking output.



fconsole/Makefile
fconsole/fconsole-config.in
Again, would be replaced by autotools stuff.



fconsole/fconsole_*.i
Ligature and shaping tables that the font supports.



fconsole/fconsole.h
fconsole/fconsole_ligature.[ch]
fconsole/fconsole_log2con.[ch]
The ligature engine again.  This shares some code with its
fjoining siblings, but not so much that it is worth ruining the
architecture for.  No need to change for the moment.



fconsole/fconsole_vu.c
Simple wrapper around the library that uses the fjoining stuff
and does console-specific ligaturing.



fconsole/edconsole
fconsole/vuconsole
Test scripts that load a font and call fconsole_vu.  One of them
loads the font and sets options so that you see the bidi/joining
marks (edconsole), while the other one removes them (vuconsole).



fcon/
Finally, this is the terminal layer.



fcon/fcon.c
This is the code I borrowed from script(1).  As I mentioned
before, lines 200 to 350 are my work.  It simply sits in a
master/slave pty layer and applies fconsole to the stream.  It
takes care of a few interesting things.  For example:

* Escape sequences:  Escape sequences are considered as
  paragraph terminators right now.

* Paragraph terminators: "\n" usually.  Starts a new paragraph.

* Unfinished paragraphs:  This is the trickiest part, which
  I'm sure has not been done in Akka :>.  If you have an
  unfinished paragraph, like when you are typing Arabic at a bash
  prompt, it remembers your unfinished paragraph, and when
  you add characters to it, it "deletes" (by writing backspace
  chars) whatever glyphs it has written on screen, and rewrites
  the whole paragraph.  So when typing at a bash prompt you get a
  perfect effect.  But of course it fails if your unfinished
  paragraph spans the end of the line.  Remember that this layer
  (fcon) will always remain a hack, and a perfect bidi terminal
  cannot be implemented in this layer.  So, it's just trying to
  be a better hack.

* It accepts terminal UTF-8 on/off escape sequences, and would
  turn on/off the whole functionality.

I like this messy code :-).
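
The unfinished-paragraph trick can be sketched in a few lines of
Python.  This is only an illustration, not the code in fcon.c:
the class name and the render callback are hypothetical, render
stands in for the whole bidi/joining/fconsole pipeline, and the
sketch assumes every rendered character occupies exactly one
screen cell and that the paragraph never wraps.

```python
class UnfinishedParagraph:
    """Re-render the current paragraph each time a character arrives."""

    def __init__(self, render):
        self.render = render  # hypothetical: logical text -> visual text
        self.logical = ""     # characters received since the last "\n"
        self.shown = 0        # cells of this paragraph now on screen

    def feed(self, chunk):
        """Return the bytes-to-be for the terminal, as a str."""
        out = []
        for ch in chunk:
            if ch == "\n":
                # Paragraph terminator: pass it on and start afresh.
                out.append("\n")
                self.logical = ""
                self.shown = 0
            else:
                self.logical += ch
                visual = self.render(self.logical)
                # Step back over what was drawn, then redraw it all.
                out.append("\b" * self.shown + visual)
                self.shown = len(visual)
        return "".join(out)
```

With a toy renderer that just reverses the text, feeding "a" and
then "b" writes "a", then one backspace followed by "ba", so the
screen always shows the whole paragraph in visual order.  The
backspaces cannot move past the start of a screen line, which is
exactly where the real layer breaks down too.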



fcon/fconso.c
It's some preprocessor hack that should be seen!
Back in my time, ncurses and slang didn't support Unicode at
all.  So I wanted to turn my bidi terminal layer off, and wrote
this small library that, when preloaded using LD_PRELOAD, causes
any app that uses ncurses or slang to turn off the bidi
functionality, and moreover, to fall back to LANG=en_US.
But the code is not done yet.  I remember mc used to crash.  It
can be further developed.



Some notes on fcon.  A terminal master/slave layer is the most
obvious and natural way to implement this thing, but it has
some drawbacks.  The main one is that you are sacrificing your
/dev/tty* terminal.  So for example you cannot run startx from
within such a bidi terminal.  There are a couple of ways to
overcome this problem that I can imagine:

* Instead of a layer, the code can get loaded with LD_PRELOAD as
  a shared library, and override some system calls (open, write,
  dup, ...) and apply bidi on any file descriptor that is going
  to the terminal.  It's a bit shaky to determine that.  This way
  also has its own known problems.

* A kernel module to apply all this code to the console.  I once
  tried that but gave up.  It needs all of the fribidi and
  "farsi" code ported to the kernel.  I may give it another try
  after reading Robert Love's book.
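
For the first option, the "is this descriptor going to the
terminal?" test is the shaky part.  A rough Python sketch of the
decision follows; the real thing would be C wrappers around
write(2) loaded via LD_PRELOAD.  The function names and the
transform callback are made up for illustration, and isatty is
only one possible heuristic.

```python
import os


def goes_to_terminal(fd):
    """Best-effort guess: does this descriptor point at a tty?"""
    try:
        return os.isatty(fd)
    except OSError:
        return False


def filtered_write(fd, data, transform):
    """Write data to fd, applying transform only for terminal output.

    transform is a hypothetical stand-in for the bidi/joining filter
    and takes and returns bytes.
    """
    if goes_to_terminal(fd):
        data = transform(data)
    return os.write(fd, data)
```

Output going to a pipe or a file passes through untouched, so
only what actually reaches the screen gets reordered.  Descriptors
that are duplicated or inherited after the check are among the
cases such a test can miss.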



bin/farsifilter
Calls fcon/fcon.  Some bashism there to find the binary.
Nothing more.  Autotools would solve these bash problems.



bin/farsidict
A simple bash script that launches a lynx session to a dictionary
using the bidi terminal.  Nice example perhaps.  And the
dictionary works for Persian.



bin/farsi
The main interface to the terminal program.  It parses options,
loads fonts, keyboard maps, ..., runs the bidi console, then
undoes everything it did.



* In the future, a nice Python interface can be written that
  provides the whole functionality, so we can get rid of this
  piece.  But other pieces like vuconsole and edconsole ...
  should be thought of as test suites for their library, that can
  be distributed with binary packages, or not.



sbin/farsigetty
It's a Persian replacement for mgetty in /etc/inittab to give a
Persian console from login time.  It assumes a lot from my
farsiredhat stuff.  Should be looked over to get the idea.
Also see my inittab in farsiredhat to see how I enabled a logical
(left to right) console.  It's a matter of some parameters to
the bin/farsi wrapper.



keymap/isiri2901.kmap.gz
This is the standard keyboard map for Persian.  It's outdated and
should be upgraded.  I will provide a new one later.
Other ones should be added here.  Perhaps a symlink like
fa -> isiri2901.kmap.gz in the directory would be in order.
It would be nice if stuff (font maps, keymaps, ...) from the
Hebrew people ended up around here too.



font/farsi-8x16.bdf.gz
As mentioned above, it's my edited version of Dmitry
Bolkhovityanov's font.  For the time being, this font can be
edited and used.  Later one should send patches upstream, and
perhaps to other 8x16 fonts to which I have sent the same glyphs.
This is the original font that should be edited.



font/create_psf
A bash script that creates a PSF font suitable for the console
from a BDF font and some SFM maps.  There is an option I have
added, --mirrorrtl, that causes all Arabic (right to left)
glyphs to be mirrored;  it is used to generate fonts for the
logical view I mentioned before.



font/farsi_bdf2psf.pl
Perl script used by above bash script.  Hacked by me to implement
--mirrorrtl feature.
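
What mirroring means for an 8-pixel-wide bitmap glyph can be
shown in a few lines.  This is an illustrative Python sketch, not
the Perl from the package; the function names are made up, and it
assumes each glyph row is stored as one byte with the leftmost
pixel in the most significant bit, as in an 8x16 BDF bitmap.

```python
def mirror_row(byte):
    """Reverse the 8 pixels of one glyph row."""
    return int(f"{byte:08b}"[::-1], 2)


def mirror_glyph(rows):
    """Mirror a glyph given as a list of row bytes, top to bottom."""
    return [mirror_row(r) for r in rows]


# A pixel in the leftmost column ends up in the rightmost one.
assert mirror_row(0b10000000) == 0b00000001
```

Mirroring twice gives back the original glyph, so the same
routine can be used to check a generated font against its source.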



font/glyphlist.txt.gz
font/bdf_set_names
Adobe's glyph names list and a script I wrote to set proper names
in a BDF font.  Don't know if used here or not.  Well, xmbdfed
used to trash the names.  So I put stuff to reconstruct them.



font/farsi_ascii.sfm
font/farsi_arabic.sfm
font/farsi_marks.sfm
font/farsi_nomarks.sfm
Glyph maps that define which characters/glyphs should appear in a
PSF font.  The glyphs are then extracted from the BDF font.
farsi_ascii is the ASCII block identity mapping.  farsi_arabic is
the base Arabic block.  farsi_marks maps control chars,
formatting chars, different spacing and punctuation, ....  It is
used when you do not want to remove marks in the pipeline.
farsi_nomarks instead uses the same space as farsi_marks, but
fills it with Latin characters.
All these maps try their best to map as many characters as
possible.  For example, c-cedilla may be mapped to c.
There are markers such as "# RTL ..." in these files that trigger
the Perl script to mirror RTL chars if asked to.
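
The "c-cedilla on c" kind of fallback can be derived mechanically
from Unicode decompositions.  The sketch below shows one way to
do that in Python; it is an assumption about how such entries
could be generated, not a description of how the .sfm files in
the package were actually built, and the function name is made
up.

```python
import unicodedata


def ascii_fallback(ch):
    """Return the ASCII base letter of an accented character, or None."""
    base = unicodedata.normalize("NFD", ch)[0]
    if base != ch and ord(base) < 128:
        return base
    return None


# Characters that should share the glyph of their base letter.
fallbacks = {ch: ascii_fallback(ch) for ch in "\u00e7\u00e9\u00f1"}
```

Characters without a canonical decomposition, such as the German
sharp s, get no fallback this way and would need an entry written
by hand.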

Note: The package uses 512char fonts.  So you would lose one
color bit of your console.  This is the default since Red Hat 8
or 9.  BTW, if you load framebuffer console (sample is in my
farsiredhat package), you get your color bit back.



testtexts/hafez
First Persian sonnet from Hafez.



testtexts/fatiha
First surah of the Quran.



testtexts/marks
Some Unicode marks with their names.  To check if you are seeing
marks or they are removed.





Well, that's it.

Behdad Esfahbod
Dec 11 2003