Re: farsi. farsi! farsi? farsi:
- To: Behdad Esfahbod <behdad at cs dot toronto dot edu>
- Subject: Re: farsi. farsi! farsi? farsi:
- From: Roozbeh Pournader <roozbeh at sharif dot edu>
- Date: Fri, 12 Dec 2003 22:42:17 +0330
- Cc: FarsiWeb Discussion List <farsiweb at sharif dot edu>, ArabEyes Developer Discussion List <developer at arabeyes dot org>
On Fri, 2003-12-12 at 01:25, Behdad Esfahbod wrote:
> Disclaimer 1: This is not a Persian vs Farsi war message.
> Disclaimer 2: CC to FarsiWeb list is just informational.
> Disclaimer 3: The attached code is not in Public Domain.
> Disclaimer 4: This is a long boring message. Your own risk.
> Two long years ago, on a day such as today, perhaps in the
> same wee hours of the morning but in Tehran time, I was
> polishing and wrapping up a piece of code that has been
> called "farsi" since then.
> The story goes back further still. It must have been in late 2000
> that Roozbeh Pournader wrote some C code to convert Unicode
> Persian text to some legacy character set called iransystem. As
> a requirement for that, he wrote the joining code that was later
> used by me in "farsi".
> In late 2001, my major work on FriBidi was done, so it was
> time to use what I had been doing. I took Roozbeh's code, cleaned
> it up, plugged in FriBidi, and the result is what you get as
> farsi/fjoining/. I wrote some more code to fill the gap in the
> console to handle harakats, and called it farsi/fconsole, and
> finally grabbed the source code of script(1), hacked a few lines,
> and called it farsi/fcon. With the help of font tools I borrowed
> from another project and a keyboard driver I wrote, I had finally
> finished my pet called "farsi", which was doing more for me than
> Akka was able to do (for me as a Persian).
> Since then the code got some clean-up and some features added,
> but nothing else changed, not even the user base, which was
> limited to me, myself, and behdad. The package named "farsi" was
> still waiting for me (and Roozbeh) to resolve the copyright
> status and get released, while I lost my interest in bidi console
> and it went down into my 10GB archive of last (lost) files.
> Fortunately I did three small releases of the code: first on a
> local list called 'farsidev' that no longer exists today
> (and that I cannot even remember; I just wrote it in my ChangeLog
> in the package); next on a list in the Hebrew community, and last
> on ArabEyes. It seems that the last one is the only one that has
> survived history.
> That is the history of "farsi" in five paragraphs. I also
> hacked a Red Hat 7.2 to enable Persian on console. I later took
> some notes of what I did, and implemented it on another machine
> from my notes. The notes are in the farsiredhat directory in the
> archive attached to this mail. Note that they are pretty old. Many
> things have changed these days.
> For the past few days I have been known as the biggest blocker of
> the whole ArabEyes project ;-). So I will first answer the questions
> I was asked about "farsi", and then go through the files in
> the attached archive.
> Muhammad Alkarouri wrote:
> > Thanks Behdad for your reply. I would like to know,
> > though, what is the expected timeframe of including
> > joining in fribidi.
> 2005. No more, no less ;-).
> Seriously, this winter.
> > Another question for all:
> > - do you know any problem that affects using farsi
> > besides bidi before joining and shaping codes, and
> > some may be next stage points like interaction with
> > gpm and ncurses programs?
> In the future, ncurses should implement its own bidi/shaping.
> But before that, both ncurses and gpm need to get some stable
> Unicode support. I am supposed to have a look at Unicode support
> in ncurses after I'm satisfied with GNOME (FriBidi, Pango, GTK+,
> AbiWord), but most probably it's not before 2005.
> > If there aren't I will base any future work on this
> > code rather than the akka original.
> Nadim Shaikli wrote:
> > A couple of questions though,
> > 1. Can we take this conversation to Arabeyes' "developer"
> > mailing-list ? I'm sure we'll want to refer back to
> > all these points in the future.
> > 2. Can we come up with an alternate name to this package.
> > Akka 2.0 (with no mention of the previous work or credits) ?
> > suggestions ? Behdad, it's your baby, so it's your call.
> Well, "farsi" is not such a bad name as long as it's used in
> English written text ;-). Ok, it has proved to be a bad name.
> Perhaps '"farsi"' is a good name, but again in written context.
> BTW, you should not need that word in English; one should always
> use Persian to refer to the language.
> Second, it's the Free World (as in Free Beer) of Free Software
> (as in Freedom) ;-). Feel Free to Fu^^Hack the code. (Free as
> in Freedom, not as in Beer. Don't forget my Beer).
> Akka 2.0 may make a good name. I too prefer not binding a
> new name to the same functionality. Perhaps we would want to
> give some hints and credit to pre-2.0 Akka. Roozbeh?
> I'm fine with Akka, if on your website and the main README file,
> you write it this way:
> Akka (aka "farsi")
> Another idea comes to my mind, about popping another name. Just
> take the middle and call it 'baghdad'? ;-).
> > 3. Can we, once 1&2 above are agreed upon, release this code
> > so that it's archived somewhere. From what I remember, the
> > code as it stands today is fully functional with the
> > exception of a few missing shaping characters. Once those
> > are taken care of, we can release, right ?
> I have attached my latest code, and hopefully with my comments at
> the end of this message, you can make it fully functional. Last
> time I tried, there were things that needed some changes to work on
> later Red Hat systems. Mainly, consolechars has been dropped and
> setfont should be used instead.
> Muhammad Alkarouri
> > While I would certainly prefer an alternative name,
> > more descriptive to the type of the package, I see no
> > reason why Behdad cannot name the package he has
> > written in the way he wants. Two points are there:
> > - Is Behdad willing to resume developing the package?
> > If not, I suggest he publishes it and we can develop
> > an Akka 2 based on it at Arabeyes. If he has the time,
> > then we get a good package:)
> > - We need some changes for it to be running. e.g. a
> > unicode keymap for Arabic besides the isiri currently
> > there. And I would remove the farsidict from the
> > package since it is not much related. Otherwise, I
> > would suggest getting an immediate release out and
> > leave other changes to version 1.1. Actually we can
> > release a 0.9 without even correcting these.
> It would be nice if someone autotoolized it. Other changes I
> mention later.
> > By the way, Behdad, what is the license of this
> > package?
> Roozbeh's and mine are under the LGPL. (Roozbeh?). That means all
> the library code. The keymap is in the public domain. The font is
> based on Dmitry Bolkhovityanov's VGA font, to which I have donated
> Arabic glyphs. There are some mapping tables and other stuff in
> the fonts dir done by me, which are in the public domain. You can
> check the license using Google. What remains is the hard part:
> The code I borrowed from script(1), which is the skeleton of
> farsi/fcon/fcon.c, has a "BSD with advertisement clause" license.
> You need to read it yourself and read through fsf.org to find out
> what we can do. I guess it should remain in that dirty kind of
> BSD license forever. No problem, can still link to the LGPLed
> One way is to cut my code out of that and place it in a GPLed
> container, that the Akka project should already have. It's just
> a simple master/slave pty layer. My code is the highly commented
> part in farsi/fcon/fcon.c -- lines 200 to 350. I mean this is
> the part that is just my code, and the engine of the bidi
> terminal itself. The rest is very easy to find or reimplement.
> Samy Al Bahra wrote:
> > > 2. Can we come up with an alternate name to this
> > > package. Akka 2.0 (with no mention of the previous work
> > > or credits) ?
> > No. You cannot just do that. People have already contributed
> > a bit of code and effort to Akka, it isn't right for them
> > "not be mentioned". I'm talking everything, including the
> > original authors on which Akka was based on should be credited
> > (even the old maintainer, me, Mohammad, Anas, etc...).
> > [snip]
> Well, I'm afraid you are wrong, both from an ethical point of
> view, and from the law's. First, Akka is not a trademark or any
> other type of shit. Second, previous authors already get some
> extra credit by those lazy people that do not read the AUTHORS
> file, nor release notes ;-).
> I prefer them mentioned myself, if we are going to call it Akka
> BTW, there's a nice thing happening here. Akka 1 was based on
> the work previously called "acon", and Akka 2 may be based on my
> code, whose final part (the terminal layer) is called "fcon".
> That's a bit more interesting. I don't know why those people
> called their package "acon"; should it be "Arabic Console"? But I
> named mine after "Farsi Condom", as terminal people (around the
> linux-utf8 list at least) call these layers that sit on top of a
> dumb terminal and provide some functionality condoms. And this
> is a Farsi condom. But you can't shout it in Iran, so we came up
> with "fcon" :-).
> > If Akka is dropped and a NEW project is started WITH a
> > different name then we can start over with the credits
> > and what not.
> > [snip]
> Akka(TM) you mean? ;-)
> > There are still a lot of bashisms that need to be
> > dealt with.
> I usually use bash where and only where C cannot be used. In
> this case, I agree that Python may be the answer. I will get
> to that later in this mail.
> > > By the way, Behdad, what is the license of this
> > > package?
> > None based on the code I have, meaning, technically it
> > is public domain. Behdad, so? I would imagine GPL (and
> > would prefer BSD license).
> Already discussed.
> > [snip]
> > I did hope you would realize this from this message and add
> > a copyright statement to the code.
> As I mentioned, that was the reason I never released it.
> Mohammed Elzubeir wrote:
> > Are you saying to simply apply bidi post-bidi in the farsi code? We can
> > do that.
> No, the problem is not that easy, but it is somewhat local. You
> go your way developing this code, I go mine on FriBidi, and the
> later merge will not be hard.
> > Also, are you planning to maintain that (develop, etc..). I
> > would like to use that as a basis to replace akka. Seeing that
> > the console will always be a fixed-width environment, this
> > switch in where the shaping is applied is less relevant (but we
> > can always switch if it makes you happy).
> We will switch later. I really did hard work on the fjoining
> part to get reasonable results without having any standard on
> where to apply joining (the hard problem, if you don't see it, is
> with the RLO and LRO stuff).
> Done :-). Now my file-by-file review, which can be the basis for
> further development. Keep me posted please. I won't go through
> the farsiredhat stuff; that's pretty easy to understand. Here is
> the structure of the farsi/ code, but the architecture can totally
> change, should the future developers feel the need.
> Well, nice to have it around and add to this. It certainly lacks
> some of my work later on the code, but can be populated from this
> mail for each file. Perhaps this mail can be saved around there
> named HISTORY, after mentioning Akka 1.x if needed.
> Autotools perhaps.
> Should be replaced by a respectful one, but the contents should
> definitely be used somewhere.
> I will comment on each entry as needed:
> * Parse command-line options.
> > This would be definitely done.
> * Somehow share options between C sources and shell
> > We may never need it again, but I have some skeleton to
> > share variables between C source and Shell script in a
> > single file. Ask me for it ;-).
> * Documentation.
> > Sure!
> * Fix fconso bug, also support mc.
> > Not sure if we really need that. But the idea should
> > be developed. I would discuss it under fconso.c later.
> * Clean-up ZWJ-ZWNJ-ZWJ code, also support
> ligature-making ZWJ.
> > "ligature-making ZWJ" has been removed from Unicode
> > standard, so nonsense. About cleaning ZWJ-ZWNJ-ZWJ
> > code, I can't remember what the problem was. Not a
> > serious problem perhaps. Just clean up.
> * Implement fcon as shared library (stick on fd 1 and 2
> if point to tty)!
> > I would again discuss it later under fconso.c
> To summarize, it does the joining, shaping, bidi (calling
> fribidi), and the LAM-ALEF ligature, considering all options that
> have been passed.
> Would be replaced by autotools, pkgconfig stuff.
> These are the main body of the library. With tables in *.i
> files. It does some normalization and the joining and shaping.
> Tables may need some update. Roozbeh?
> Note that the library accepts a bunch of options, defined in
> fjoining/fjoining.h. The exciting part is that it can do joining
> without bidi sensibly. You will see later that with a
> left-to-right (mirrored) Arabic font, you can read Arabic text
> written (and shaped) from left to right, which is pretty useful
> when your software (editors, for example) does not support bidi.
> It's a simple wrapper around the library that filters text and
> applies bidi and joining. It accepts the options in numerical
> form right now.
> These deal with the problem of the Persian YEH in Microsoft
> fonts. The first one, "msye", replaces initial and medial Persian
> YEHs with Arabic YEH, and replaces final and isolated Arabic YEHs
> with Persian ones. The other one, fixfarsiye.c, simply replaces
> Arabic YEH with Persian YEH. They should not be needed anymore, but
> would be handy to have around, as there is lots of Persian text with
> mixed Arabic and Persian YEHs. The names of course may change to
> something more proper.
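
The two YEH filters described above can be sketched roughly as follows. This is only an illustration in Python, not the code from the archive: the function names are made up, and the test for "joins to the next character" is a crude assumption (any following Arabic-block letter counts, ignoring harakats, ZWNJ and other joining details that a real implementation has to handle).

```python
ARABIC_YEH = "\u064A"  # ARABIC LETTER YEH
FARSI_YEH = "\u06CC"   # ARABIC LETTER FARSI YEH


def fix_farsi_yeh(text):
    """Replace every Arabic YEH with Persian YEH (the simple filter)."""
    return text.replace(ARABIC_YEH, FARSI_YEH)


def is_arabic_letter(ch):
    """Crude assumption: letters in these ranges join to a preceding YEH."""
    return "\u0622" <= ch <= "\u064A" or "\u066E" <= ch <= "\u06D3"


def ms_yeh(text):
    """Arabic YEH in initial/medial position, Persian YEH in final/isolated."""
    out = []
    for i, ch in enumerate(text):
        if ch in (ARABIC_YEH, FARSI_YEH):
            nxt = text[i + 1] if i + 1 < len(text) else ""
            # A YEH followed by a letter is initial or medial.
            joins_next = nxt != "" and is_arabic_letter(nxt)
            out.append(ARABIC_YEH if joins_next else FARSI_YEH)
        else:
            out.append(ch)
    return "".join(out)
```

Both functions work on decoded text, so a filter built on them would decode its input from UTF-8 first and encode the result again.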
> This is a level of abstraction that I really love. This small
> piece of code does some ligaturing that is needed in console. It
> can be thought of as your rendering engine that handles harakats, ....
> What it currently does, if I remember correctly, is to ligate
> shadda+harakat combinations into a single ligature, and then to
> take harakats that are applied to a character that joins to
> the next char, and put them on top of a tatweel (kashida). It
> gives a far better looking output.
> Again, would be replaced by autotools stuff.
> Ligature and shaping tables that the font supports.
> The ligature engine again. This shares some code with fjoining
> siblings, but not so much to ruin the architecture for that. No
> need to change for the moment.
> Simple wrapper around the library that uses fjoining stuff and does
> console-specific ligaturing.
> Test scripts that load a font and call fconsole_vu. One of them
> loads the font and sets options so that you see the bidi/joining
> marks (edconsole), while the other one removes them (vuconsole).
> This is the terminal layer finally.
> This is the code I borrowed from script(1). As I mentioned
> before, lines 200 to 350 are my work. It simply sits between a
> master/slave pty layer and applies fconsole on the stream. It
> takes care of a few interesting things. For example:
> * Escape sequences: Escape sequences are considered as
> paragraph terminators right now.
> * Paragraph terminators: "\n" usually. Starts a new paragraph.
> * Unfinished paragraphs: This is the trickiest part, which
> I'm sure has not been done in Akka :>. If you have an
> unfinished paragraph, like when you are typing Arabic on a bash
> prompt, it remembers your unfinished paragraph, and when
> you add characters to it, it "deletes" (by writing backspace
> chars) whatever glyphs it has written on screen, and rewrites the
> whole paragraph. So writing on a bash prompt you get a perfect
> effect. But of course it would fail if your unfinished
> paragraph spans the end of line. Remember that this layer
> (fcon) would always remain a hack, and a perfect bidi terminal
> cannot be implemented in this layer. So, it's just trying to
> be a better hack.
> * It accepts terminal UTF-8 on/off escape sequences, and would
> turn on/off the whole functionality.
> I like this messy code :-).
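
The unfinished-paragraph trick in the list above can be sketched as follows. This is a minimal Python illustration, not the code in fcon.c: the class name is made up, `render` is a hypothetical stand-in for the fconsole pipeline (logical text in, visual text out), and it assumes one screen cell per output character and a paragraph that fits on one line.

```python
class UnfinishedParagraph:
    """Redraw the current paragraph each time a character is added."""

    def __init__(self, render):
        self.render = render  # hypothetical: logical string -> visual string
        self.logical = ""     # text of the paragraph typed so far
        self.on_screen = 0    # number of cells currently drawn for it

    def feed(self, chunk):
        """Return the characters to send to the real terminal."""
        out = []
        for ch in chunk:
            if ch == "\n":
                # Paragraph terminator: what is on screen is final.
                out.append("\n")
                self.logical = ""
                self.on_screen = 0
                continue
            self.logical += ch
            visual = self.render(self.logical)
            # Move back over the old glyphs, then rewrite everything.
            out.append("\b" * self.on_screen)
            out.append(visual)
            # If the new rendering is shorter, blank the leftover cells.
            extra = self.on_screen - len(visual)
            if extra > 0:
                out.append(" " * extra + "\b" * extra)
            self.on_screen = len(visual)
        return "".join(out)
```

With a plain string reversal standing in for the renderer, typing "a" and then "b" sends "a", then one backspace followed by "ba". As the message says, this breaks down once the paragraph wraps past the end of the line.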
> It's some preprocessor hack that should be seen!
> Back in my time, ncurses and slang didn't support Unicode by any
> means. So I wanted to turn my bidi terminal layer off, and wrote
> this small library that, when preloaded using LD_PRELOAD, causes
> any app that uses ncurses or slang to turn off the bidi
> functionality, and moreover, to fall back to LANG=en_US.
> But the code is not done yet. I remember mc used to crash. It
> can be further developed.
> Some notes on fcon. A terminal master/slave layer is the most
> obvious and natural way to implement this thing, but it has
> some drawbacks. The main one is that you are sacrificing your
> /dev/tty* terminal. So for example you cannot startx from
> within such a bidi terminal. There are a couple of ways I can
> imagine to overcome this problem:
> * Instead of a layer, the code can get loaded with LD_PRELOAD as
> a shared library, and override some system calls (open, write,
> dup, ...) and apply bidi on any file descriptor that is going
> to the terminal. It's a bit shaky to determine that. This way
> also has its own known problems.
> * A kernel module to apply all this code to the console. I once
> tried that but gave up. It needs porting all of the fribidi and
> "farsi" code to the kernel. I may give it another try after
> reading Robert Love's book.
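
The core decision of the LD_PRELOAD idea above (transform output only when the file descriptor really goes to a terminal) can be illustrated in a few lines. This is only a Python sketch of the check, with a made-up function name and a caller-supplied `transform`; the real approach would override write(2) and friends in a C shared library, and it has the shakiness mentioned above (dup'ed descriptors, descriptors inherited by children, and so on).

```python
import os


def filtered_write(fd, data, transform):
    """Write bytes to fd, applying transform only if fd is a terminal."""
    if os.isatty(fd):
        data = transform(data)
    return os.write(fd, data)
```

Output sent to a pipe or a file goes through untouched, so redirecting a program's output keeps the logical (unshaped, unreordered) text.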
> Calls fcon/fcon. Some bashism there to find the binary.
> Nothing more. Autotools would solve these bash problems.
> A simple bash stuff to launch a lynx session to a dictionary
> using bidi terminal. Nice example perhaps. And the dictionary
> works for Persian.
> The main interface to the terminal program. Parses options, loads
> fonts, keyboard maps, ..., runs the bidi console, then undoes all that.
> * In the future, a nice Python interface can be written that
> provides the whole functionality, so we can get rid of this
> piece. But other pieces like vuconsole and edconsole ...
> should be thought of as test suites for their library, that can
> be distributed with binary packages, or not.
> It's a Persian replacement for mgetty in /etc/inittab to give a
> Persian console from the login time. It assumes a lot from my
> farsiredhat stuff. Should be looked over to get the idea.
> Also see my inittab in farsiredhat to see how I enabled a logical
> (left to right) console. It's a matter of some parameters to
> bin/farsi wrapper.
> This is the standard keyboard map for Persian. It's outdated and
> should be upgraded. I would provide a new one later.
> Other ones should be added here. Perhaps a symlink like
> fa -> isiri2901.kmap.gz in the directory would be in order.
> Would be nice if stuff (font maps, keymaps, ...) from Hebrew
> people would go around here.
> As mentioned above, it's my edited version of Dmitry
> Bolkhovityanov's font. For the time being, this font can be
> edited and used. Later one should send patches upstream, and
> perhaps to other 8x16 fonts to which I have sent the same glyphs.
> This is the original font that should be edited.
> A bash script that creates a PSF font suitable for the console
> from a BDF font and some SFM maps. There is an option I have
> added, --mirrorrtl, that causes all Arabic (right to left)
> glyphs to be mirrored; it is used to generate fonts for the
> logical view I mentioned before.
> Perl script used by above bash script. Hacked by me to implement
> --mirrorrtl feature.
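
What mirroring a glyph amounts to can be sketched as follows. This is an illustrative Python sketch, not the Perl code from the archive: it assumes each glyph is stored as one integer per row, with the most significant of `width` bits being the leftmost pixel, as in an 8x16 console font.

```python
def mirror_row(row, width=8):
    """Reverse the order of the lowest `width` bits of one bitmap row."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (row & 1)  # move the rightmost pixel to the left
        row >>= 1
    return out


def mirror_glyph(rows, width=8):
    """Mirror a whole glyph horizontally, row by row."""
    return [mirror_row(r, width) for r in rows]
```

Applying mirror_glyph twice gives back the original glyph, which makes for an easy sanity check on a converted font.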
> Adobe's glyph names list and a script I wrote to set proper names
> in a BDF font. Don't know if used here or not. Well, xmbdfed
> used to trash the names. So I put stuff to reconstruct them.
> Glyph maps that define which characters/glyphs should appear in a
> PSF font. The glyphs are then extracted from the BDF font.
> ascii is the ASCII block identity mapping. farsi_arabic is the
> base Arabic block. farsi_marks maps control chars, formatting
> chars, different spacing and punctuation, .... It is used
> when you do not want to remove marks in the pipeline.
> farsi_nomarks instead uses the same space as farsi_marks, but
> fills it with Latin characters.
> All these maps try their best to map as many characters as
> possible. For example, c-cedilla may be mapped on c.
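
The kind of fallback mentioned above (c-cedilla shown as c) can be derived mechanically from Unicode decompositions. The maps in the archive are hand-written tables; the Python sketch below, with a made-up function name, only shows one way such entries could be generated.

```python
import unicodedata


def ascii_fallback(ch):
    """Return the ASCII base of an accented character, or None if none."""
    decomposed = unicodedata.normalize("NFD", ch)
    # Drop combining marks (the cedilla, accents, ...), keep the base.
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base if base and base.isascii() else None
```

For U+00E7 (c-cedilla) this gives "c"; characters without an ASCII base, such as Arabic letters, give None and need their own glyph in the font.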
> There are marks as "# RTL ..." in these files, that trigger the
> perl script to mirror rtl chars if asked so.
> Note: The package uses 512char fonts. So you would lose one
> color bit of your console. This is the default since Red Hat 8
> or 9. BTW, if you load framebuffer console (sample is in my
> farsiredhat package), you get your color bit back.
> First Persian sonnet from Hafez.
> First surah of the Quran.
> Some Unicode marks with their names. To check if you are seeing
> marks or they are removed.
> Well, that's it.
> Behdad Esfahbod
> Dec 11 2003