Requirements Wiki
- To: General Arabization Discussion <general at arabeyes dot org>
- Subject: Requirements Wiki
- From: gar <gar at arabink dot com>
- Date: Tue, 30 Nov 2004 23:32:09 +0300
Hi,
As I mentioned in my previous note, the KDiff3 developer will need help
defining requirements for diffing Arabic text files. There are other
areas where interested developers could no doubt use some help in
defining Arabization requirements:
- cursor behavior in editors
- sorting
- searching
- managing diacritics
- typesetting (e.g. don't break a line immediately after a copulative
waw, how to best justify a line, etc.)
- keyboard support
- etc.
What I propose is that Arabeyes set up a wiki devoted solely to
discussing and defining requirements. It could also include test cases
where appropriate: in the case of diffing, a bunch of small files that
exercise the various diff functions. Or maybe those should go into a
separate section. The goal is for this to be a standard resource for
anybody developing Arabic-enabled software, regardless of platform.
To start the ball rolling, here is what I propose as a subset of
requirements for comparing Arabic text:
- the obvious thing, comparing strings of UTF-8 Arabic should show the
differences
- option to ignore tatweel (or kashida or whatever it's called these
days in Unicode)
- option to ignore ZWJ and ZWNJ, directionality markers, etc.
- guidelines for how to highlight differences involving ZWJ/ZWNJ,
directional markers, etc. in a GUI
- option to ignore all diacritics
- option to ignore vowel diacritics (question: should fathatan etc. be
considered a vowel? or a combo of vowel and consonant?)
- option to ignore tanween
- option to ignore all but radicals (I can dream, can't I?)
- glyph-sensitivity options - e.g. should kZWNJtZWNJb match ktb? A
more practical example: lam used as a prefix to a quoted word (see next
item)
- option to ignore other meta chars like quotes - then should
li-"kitaab" match lktaab?
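To make a few of these options concrete, here is a rough Python sketch
of how "ignore X" switches might be applied before two strings are
compared. It is only an illustration, not anything KDiff3 actually
does: the names normalize_arabic and same_text, the option names, and
the particular character sets are all my own assumptions. In
particular, treating every Unicode nonspacing mark (category Mn) as a
"diacritic" is a simplification, and the directional and quote sets are
deliberately incomplete.

```python
import unicodedata

# Character sets below are illustrative assumptions, not a complete spec.
TATWEEL = "\u0640"                       # ARABIC TATWEEL (kashida)
JOINERS = {"\u200c", "\u200d"}           # ZWNJ, ZWJ
DIRECTIONAL = {"\u200e", "\u200f",       # LRM, RLM
               "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"}
TANWEEN = {"\u064b", "\u064c", "\u064d"}  # fathatan, dammatan, kasratan
QUOTES = {'"', "'", "\u00ab", "\u00bb", "\u201c", "\u201d"}


def normalize_arabic(text, *, tatweel=False, joiners=False,
                     directional=False, diacritics=False,
                     tanween=False, quotes=False):
    """Return text with the selected classes of characters removed.

    Hypothetical helper: each keyword switches on one "ignore" option.
    """
    kept = []
    for ch in text:
        if tatweel and ch == TATWEEL:
            continue
        if joiners and ch in JOINERS:
            continue
        if directional and ch in DIRECTIONAL:
            continue
        if tanween and ch in TANWEEN:
            continue
        if quotes and ch in QUOTES:
            continue
        # Assumption: "all diacritics" means every nonspacing mark (Mn).
        if diacritics and unicodedata.category(ch) == "Mn":
            continue
        kept.append(ch)
    return "".join(kept)


def same_text(a, b, **options):
    """Compare two strings after applying the same ignore options to both."""
    return normalize_arabic(a, **options) == normalize_arabic(b, **options)
```

With this, same_text("\u0643\u0640\u062a\u0627\u0628",
"\u0643\u062a\u0627\u0628", tatweel=True) is true, while the same call
without the option is false. The harder items in the list (ignoring all
but radicals, deciding whether fathatan counts as a vowel) are not
something a character filter like this can settle; they need the
linguistic decisions a requirements wiki would record.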
Note that I'm coming at this from a textual point of view, not a
programming point of view - I might want to ignore lots of characters
that might screw up the syntax of program code. Obviously quotation
marks, for example, will be pretty important for a programmer, but for
a linguist or literary scholar analyzing text, it might be useful to
have a diff tool ignore them. A similar set of requirements can be
stated for searching and sorting, but I'll wait and see if my wiki
proposal flies before posting them.
What say ye, list?
-gregg
G. Reynolds