
Requirements Wiki



Hi,

As I mentioned in my previous note, the KDiff3 developer will need help defining requirements for diffing Arabic text files. There are other areas where interested developers could no doubt use some help in defining Arabization requirements:

cursor behavior in editors
sorting
searching
managing diacritics
typesetting (e.g. don't break a line immediately after a copulative waw, how to best justify a line, etc.)
keyboard support
etc.


What I propose is that Arabeyes set up a wiki devoted solely to discussing and defining requirements. It could also include test cases where appropriate (in the case of diffing, a bunch of small files to exercise the various diff functions), or maybe those should go into a different section. The goal is for this to be a standard resource for anybody developing Arabic-enabled software, regardless of platform.

To start the ball rolling, here is what I propose as a subset of requirements for comparing Arabic text:

- the obvious thing: comparing strings of UTF-8 Arabic should show the differences
- option to ignore tatweel (or kashida or whatever it's called these days in Unicode)
- option to ignore ZWJ and ZWNJ, directionality markers, etc.
- guidelines for how to highlight differences involving ZWJ/ZWNJ, directional markers, etc. in a GUI
- option to ignore all diacritics
- option to ignore vowel diacritics (question: should fathatan etc. be considered a vowel? or a combo of vowel and consonant?)
- option to ignore tanween
- option to ignore all but radicals (I can dream, can't I?)
- glyph-sensitivity options - e.g. should kZWNJtZWNJb match ktb? A more practical example: lam used as a prefix to a quoted word (see next item)
- option to ignore other meta chars like quotes - then should li-"kitaab" match lktaab?
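
To make a few of the "ignore" options above concrete, here is a rough Python sketch of one way a diff tool might preprocess text before comparing it. This is only an illustration, not how KDiff3 or any existing tool works: the function name normalize_for_diff, its option names, and the character sets are all my own assumptions. In particular it treats tanween separately from the short vowels (the open question above), and "all diacritics" is approximated as any nonspacing mark in the basic Arabic block.

```python
import unicodedata

# All names and character sets below are illustrative assumptions.
TATWEEL = "\u0640"  # ARABIC TATWEEL (kashida)
# ZWNJ, ZWJ, LRM, RLM, ALM, plus embedding, override and isolate controls
FORMAT_CONTROLS = frozenset(
    "\u200c\u200d\u200e\u200f\u061c"
    "\u202a\u202b\u202c\u202d\u202e"
    "\u2066\u2067\u2068\u2069"
)
TANWEEN = frozenset("\u064b\u064c\u064d")       # fathatan, dammatan, kasratan
SHORT_VOWELS = frozenset("\u064e\u064f\u0650")  # fatha, damma, kasra
QUOTES = frozenset("\"'\u00ab\u00bb\u2018\u2019\u201c\u201d")


def is_arabic_diacritic(ch):
    """Approximation: any nonspacing mark in the basic Arabic block."""
    return "\u0600" <= ch <= "\u06ff" and unicodedata.category(ch) == "Mn"


def normalize_for_diff(text, *, tatweel=False, format_controls=False,
                       diacritics=False, vowels=False, tanween=False,
                       quotes=False):
    """Return text with the selected character classes removed.

    Each keyword switches on one "ignore" option; with no options set
    the text is returned unchanged.
    """
    kept = []
    for ch in text:
        if tatweel and ch == TATWEEL:
            continue
        if format_controls and ch in FORMAT_CONTROLS:
            continue
        if diacritics and is_arabic_diacritic(ch):
            continue
        if vowels and ch in SHORT_VOWELS:
            continue
        if tanween and ch in TANWEEN:
            continue
        if quotes and ch in QUOTES:
            continue
        kept.append(ch)
    return "".join(kept)


# kitaabun, fully vocalized, versus unvocalized ktaab with a tatweel
a = "\u0643\u0650\u062a\u064e\u0627\u0628\u064c"
b = "\u0643\u0640\u062a\u0627\u0628"
print(normalize_for_diff(a, diacritics=True)
      == normalize_for_diff(b, tatweel=True))  # True
```

Stripping characters before comparing is the simplest approach, but it throws away the positions of the ignored characters, so a GUI that wants to highlight (rather than hide) ZWJ/ZWNJ differences would need to keep a mapping back to the original text. The "radicals only" option is deliberately left out; that needs morphological analysis, not a character filter.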


Note that I'm coming at this from a textual point of view, not a programming point of view - I might want to ignore lots of characters that might screw up the syntax of program code. Obviously quotation marks, for example, will be pretty important for a programmer, but for a linguist or literary scholar analyzing text, it might be useful to have a diff tool ignore them. A similar set of requirements can be stated for searching and sorting, but I'll wait and see if my wiki proposal flies before posting them.

What say ye, list?

-gregg


G. Reynolds