
Arabic plural forms issues



Peace be upon you,
   The gettext plural-form expression currently used in Gnome, KDE and probably
other projects is wrong [5]. I know that the QAC investigated these issues but
did not settle on a definitive solution, so the current plural-form expression
is clearly a temporary hack, and one that is confirmed to be wrong.
From http://wiki.arabeyes.org/QacDecisions:

>  This GNU Plural Header will be used when we can find a way to script its
> functionality:
>  nplurals = 7;
>  plurals = n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100==1 ? 5 : n%100==2 ? 6 : 4;

   Arabic has elaborate rules for choosing the form of the counted noun.
Moreover, there are two distinct rules: one for when numbers are read from
right to left and another for when they are read from left to right, because
the form of the counted noun follows the number that is read last.
   Both rules are correct: studies confirm that both were accepted in the
past. However, reading from right to left, which follows the order of the
letters in Arabic, is the more respected rule, while today's media read
numbers from left to right.
   Since there are two rules, we need to choose one, and only one, for the
translations. Reading from left to right cannot be implemented at all: first,
it cannot be expressed as a gettext formula, and second, it would require
subtracting 1 or 2 from the variable (the number passed to the string). See
[1]. Reading from right to left can be implemented [2] and needs fewer cases
(6 instead of 8). I am not sure whether this holds only if we assume that
"101 كتاب" (101 books) is read "واحد ومئة كتاب" (one and a hundred books)
rather than "مئة كتاب وكتاب" (a hundred books and a book); see [2].
  The current plural-forms expression is unclear, undocumented and
linguistically wrong: 0 is merged with a mysterious fourth case, and there is
no rule for numbers from 11 to 99.
  Conclusion: the right-to-left rule is the one to choose [2].

   Plural forms 0, 1 and 2 do not need the number to appear in the string, and
this raises another issue. If a string contains two format directives, say %s
and %d, and the translation omits %d, the application crashes with a
segmentation fault [3]. A solution exists: variable shuffling, i.e. positional
directives such as %2$s, which give the correct result without crashing [4].
This needs to be documented and tested for implementations other than C.
(Thanks to Djihed for the idea.)

   Another open issue is which form to use for non-integer numbers.

   Sometimes we need to translate applications that, for simplicity, do not
support plural forms and use, for example, "user(s)" in English. I suggest
contacting the developer and asking them to add plural-form support; if for
some reason this cannot be done, we can use "من" (of), as in
"وصل ثلاثة من الرجال" ("three of the men arrived"). (Thanks to Munzir for the
idea.)

   What can we do?
- Comment on this proposal.
- Mark all strings that contain plural forms as fuzzy, and replace the formula
in all files with a script. Then correct these fuzzy strings to add the
missing cases.



[1] http://perso.menara.ma/yollnet/pluralforms1.png
[2] nplurals=6; plural=n == 0 ? 0 : n == 1 ? 1 : n == 2 ? 2 : n >= 3 && n <= 10 ? 3 : n >= 11 && n <= 99 ? 4 : 5;
[3] echo 'int main() { printf("%s \n", 2, "text");}' > test.c ; gcc -w -include stdio.h test.c -o output ; ./output
[4] echo 'int main() { printf("%2$s \n", 2, "text");}' > test.c ; gcc -w -include stdio.h test.c -o output ; ./output
[5] Plural-Forms: nplurals=4; plural=n==1 ? 0 : n==2 ? 1 : n>=3 && n<=10 ? 2 : 3;