Re: KDiff3 Arabization

--- gar at arabink dot com wrote:
> The developer of KDiff3 has expressed strong interest in adding Arabic
> support.
> Please see my post yesterday to 'general' regarding requirements for
> Arabic diff'ing.

With regard to your post [1], which raises a number of points, I'll
give my thoughts on the 'diff' part here (citing some of that post below)
and address the more general (not 'diff'-specific) issues in a reply
on the 'general' list.

> To start the ball rolling, here is what I propose as a subset of 
> requirements for comparing arabic text:
> 	- the obvious thing, comparing strings of utf-8 Arabic should
>         show the differences
> 	- option to ignore tatweel (or kashida or whatever it's called
>         these days on Unicode)
> 	- option to ignore ZWJ and ZWNJ, directionality markers, etc
> 	- guidelines for how to highlight differences involving ZWJ/ZWNJ,
>         directional markers, etc. in a GUI
> 	- option to ignore all diacritics
> 	- option to ignore vowel diacritics (question: should fathatan
>         etc. be considered a vowel? or a combo of vowel and consonant?)
> 	- option to ignore tanween
> 	- option to ignore all but radicals (I can dream, can't I?)
> 	- glyph-sensitivity options - e.g. should kZWNJtZWNJb match ktb?
>         A more practical example: lam used as a prefix to a quoted word
>         (see next item)
> 	- option to ignore other meta chars like quotes - then should 
>         li-"kitaab" match lktaab?

I just tried vanilla diff (v-2.8.1) on mlterm (and PuTTY's development
version) and saw correct results and behavior akin to what you'd expect
from a Latin/English text file.  I'm not familiar with KDiff3 and have
never used it (though I gather it's rather similar to Tkdiff, which I
have used), so I'm unsure of its added functionality.  I'm guessing that
all GUI diff applications simply sit on top of vanilla diff, in which
case the diff results are already generated properly for UTF-8 Arabic.
This then really becomes a question of how to display Arabic properly
using Qt, GTK or whatever library, and not a diff-specific issue.
Do you agree?

With regard to the various options above, I'm all for them (within reason),
but I'd guess that they would really need to live in 'diff' proper and
not in the GUIs, in order to cover all the other applications that sit
on top, no?  In other words, change one thing and not ten.  To be precise
about the options, I'd ask only for ignoring all of the harakat/diacritics
(if possible) and tatweel.
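
As a rough illustration of what "ignore harakat and tatweel" could mean
in practice, here is a minimal sketch in Python (not in 'diff' itself).
The function names are made up for the example, and the code point range
U+064B..U+0652 (fathatan through sukun) is my assumption of what counts
as harakat; tatweel is U+0640.

```python
import re

# Assumption: "harakat" means U+064B..U+0652 (fathatan, dammatan, kasratan,
# fatha, damma, kasra, shadda, sukun); tatweel/kashida is U+0640.
HARAKAT_AND_TATWEEL = re.compile("[\u064B-\u0652\u0640]")


def strip_marks(text):
    """Return text with harakat and tatweel removed (hypothetical helper)."""
    return HARAKAT_AND_TATWEEL.sub("", text)


def same_ignoring_marks(a, b):
    """True if a and b are equal once harakat and tatweel are ignored."""
    return strip_marks(a) == strip_marks(b)
```

Two spellings of the same word, one vowelled or stretched with tatweel
and one bare, would then compare as equal, while a change in the base
letters would still show up as a difference.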

While on the subject I would recommend that we propose to 'diff's
maintainer that he add a '--ignore_regex' option so that people, using
any language, can dramatically extend 'diff's functionality by ignoring
anything they like.  So as an idea you could then simply say,

  $ diff --ignore_regex "fatha | damma | tatweel" ar_file1.txt ar_file2.txt

where 'fatha', 'damma' and 'tatweel' are the actual visual characters
(they'd look too small and confusing if I pasted 'em here :-).  With
this, though, I think we'd have a much higher chance of getting 'diff's
author/maintainer to include the functionality we're after without
making it Arabic-specific (i.e. lots of other people would benefit as
well from such an option).
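
To make the idea concrete, here is a small sketch of how such an option
might behave, emulated in Python with the standard 're' and 'difflib'
modules.  This is only my guess at the semantics (delete every match of
the regex from each line, then diff what is left); 'diff_ignoring' is a
hypothetical name, and nothing here reflects how 'diff' is implemented.

```python
import difflib
import re


def diff_ignoring(lines_a, lines_b, ignore_regex):
    """Unified diff of two lists of lines after deleting regex matches.

    Hypothetical emulation of the proposed '--ignore_regex' option:
    every match of ignore_regex is removed from each line before the
    comparison, so differences made up only of ignored characters vanish.
    """
    pattern = re.compile(ignore_regex)
    filtered_a = [pattern.sub("", line) for line in lines_a]
    filtered_b = [pattern.sub("", line) for line in lines_b]
    return list(difflib.unified_diff(filtered_a, filtered_b,
                                     fromfile="ar_file1.txt",
                                     tofile="ar_file2.txt",
                                     lineterm=""))


# fatha (U+064E), damma (U+064F) and tatweel (U+0640), written as escapes
IGNORE = "\u064E|\u064F|\u0640"
```

Calling diff_ignoring(a_lines, b_lines, IGNORE) returns an empty list
when the files differ only in the ignored characters.  One limitation of
this sketch is that the diff it prints shows the filtered lines, not the
original ones.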

> If the maintainer of the wiki is reading this: can you add an entry
> to the "Starting Points" section of the main page as follows:

There is no maintainer per se - everyone is welcome to add info/content
as they see fit.  In other words, help yourself :-)

BTW: please do correct your mail client to include your full name
     instead of it simply noting 'gar' (yes, it would make me
     much happier :-)

[1] http://lists.arabeyes.org/archives/general/2004/November/msg00065.html

Hope that helps.


 - Nadim

