
Glibc Arabic locales



The current Arabic locale definitions are quite buggy, with many
incorrect or even missing details. As an Egyptian I am looking at the
ar_EG locale, but many of these points apply to other Arabic locales too.

#Month and day names: the current definition uses truncated month and
day names as the abbreviated forms, things like ينا، فبر، مار، etc. Those
forms are complete nonsense in Arabic and are not used anywhere; the
abbreviated form should simply be the same as the full form (which is
what the fa_IR locale does).

#Digits: Egypt, like most Arabic countries except those of Al-Maghrib
Al-Arabi, uses Arabic-Indic (or Mashriqi Arabic) digits. The locale
definition should reflect this, but currently it doesn't.
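
To make the digit mapping concrete, here is a minimal Python sketch of
what the "outdigit" line in the attached file is meant to express: ASCII
0-9 mapped to U+0660..U+0669. It is an illustration only, not glibc code,
and the helper name to_arabic_indic is made up for this example.

```python
# Illustration only: map ASCII digits to Arabic-Indic digits
# (U+0660..U+0669) and leave every other character untouched.
ARABIC_INDIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))
ASCII_TO_ARABIC_INDIC = str.maketrans("0123456789", ARABIC_INDIC_DIGITS)


def to_arabic_indic(text: str) -> str:
    """Return text with ASCII digits replaced by Arabic-Indic digits."""
    return text.translate(ASCII_TO_ARABIC_INDIC)


print(to_arabic_indic("2007"))
```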

#Date formats: the full date form currently prints as خميس يونيو 14;
it should be something like الخميس 14 يونيو. The other date formats
need the same kind of fix.
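
As a rough illustration of the intended order (full weekday name, day of
month, full month name), here is a small Python sketch. The name tables
are copied from the "day" and "mon" entries of the attached file; the
function name full_date is hypothetical, and the sketch does not go
through the C library or the locale at all.

```python
import datetime

# Full names only; index 0 is Monday to match datetime.date.weekday().
DAYS = ["الاثنين", "الثلاثاء", "الأربعاء", "الخميس",
        "الجمعة", "السبت", "الأحد"]
MONTHS = ["يناير", "فبراير", "مارس", "أبريل", "مايو", "يونيو",
          "يوليو", "أغسطس", "سبتمبر", "أكتوبر", "نوفمبر", "ديسمبر"]


def full_date(d: datetime.date) -> str:
    """Format as '<weekday> <day of month> <month>' in logical order."""
    return f"{DAYS[d.weekday()]} {d.day} {MONTHS[d.month - 1]}"


# 14 June 2007 was a Thursday.
print(full_date(datetime.date(2007, 6, 14)))  # الخميس 14 يونيو
```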

#Collation rules (alphabetic sorting): this needs more investigation.
How should we treat WAW WITH HAMZA ABOVE and YEH WITH HAMZA ABOVE: as
variants of waw and yaa', or as forms of alef? Should the alef forms
with hamza and maddah precede or follow the plain alef? I think the
best way to settle this is to find out how Arabic dictionaries deal
with the issue.
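
To compare the two options side by side, here is a hedged Python sketch.
It is not how glibc collation works: plain code point order stands in for
real collation weights, and the table names BY_CARRIER and AS_ALEF are
invented for this example. The tie-break between a hamza form and its
base letter is likewise arbitrary here, which is exactly the open
question.

```python
ALEF, WAW, YEH = "\u0627", "\u0648", "\u064A"

# Option 1: each hamza form sorts with its carrier letter.
BY_CARRIER = {"\u0622": ALEF, "\u0623": ALEF, "\u0625": ALEF,
              "\u0624": WAW, "\u0626": YEH}
# Option 2: every hamza form sorts as a form of alef.
AS_ALEF = {ch: ALEF for ch in "\u0622\u0623\u0624\u0625\u0626"}


def sort_key(word: str, policy: dict) -> tuple:
    """Primary key folds hamza forms; the original word breaks ties."""
    primary = "".join(policy.get(ch, ch) for ch in word)
    return (primary, word)


letters = ["\u0628", "\u0624", WAW, ALEF]  # beh, waw+hamza, waw, alef
print(sorted(letters, key=lambda w: sort_key(w, BY_CARRIER)))
print(sorted(letters, key=lambda w: sort_key(w, AS_ALEF)))
```

With the first table WAW WITH HAMZA ABOVE lands next to waw; with the
second it lands next to alef.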

This is what I remember for now; if anyone knows of any other issue,
please tell us about it.

As a workaround, I started fixing ar_EG and I think I have reached an
acceptable "working" level. One thing I can't figure out yet is how to
get the full year in alt digits (٢٠٠٧ instead of 2007); I can get the
short form (٠٧) but not the long one.
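
One plausible explanation, offered as an assumption and not checked
against the glibc source, is that alt_digits is a plain lookup table: the
attached file lists 100 entries, so only values 0..99 can be translated.
The Python sketch below models that table and shows a possible
workaround, building the year from two lookups (century and year within
century). The names alt and full_year are hypothetical, and the ASCII
fallback for out-of-range values is part of the model, not a documented
behaviour.

```python
DIGITS = [chr(0x0660 + i) for i in range(10)]
# 100 two-digit entries, like the alt_digits list in LC_TIME.
ALT_DIGITS = [DIGITS[n // 10] + DIGITS[n % 10] for n in range(100)]


def alt(value: int) -> str:
    """Table lookup; values outside the table fall back to ASCII."""
    if 0 <= value < len(ALT_DIGITS):
        return ALT_DIGITS[value]
    return str(value)


def full_year(year: int) -> str:
    """Workaround sketch for years 0..9999: two lookups, concatenated."""
    return alt(year // 100) + alt(year % 100)


print(alt(7))           # two-digit form from the table
print(alt(2007))        # out of range, stays in ASCII digits
print(full_year(2007))  # four Arabic-Indic digits
```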

Another problem is the lack of documentation: I can find very little
documentation on this subject, even "man locale" isn't complete, and
Google didn't help much.

We started a wiki page (http://wiki.arabeyes.org/الإعدادات_المحلية) to
collect information about this issue; if anyone knows the details
for other Arabic-speaking countries, please add them there.

Waiting for your ideas and comments.

The modified ar_EG locale definition file is attached; compile it with
the "localedef" tool.

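For anyone who wants to try the file, here is a small Python sketch that
only assembles a localedef command line and prints it. The input file
name "ar_EG", the UTF-8 charmap and the output directory
"./ar_EG.UTF-8" are assumptions made for this example; adjust them to
your setup. The -i (input file) and -f (charmap) options are standard
localedef options.

```python
def localedef_command(source: str,
                      charmap: str = "UTF-8",
                      output: str = "./ar_EG.UTF-8") -> list:
    """Build the argument list for compiling a locale source file."""
    return ["localedef", "-i", source, "-f", charmap, output]


# Print the command instead of running it, so nothing is installed
# or overwritten by accident.
print(" ".join(localedef_command("ar_EG")))
```

Since the output path contains a slash, the compiled locale should end
up in that directory and not in the system locale archive, which makes
it easier to test without touching the installed locales.
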
-- 
Khaled Hosny

Egyptian GNU/Linux user
Member of Arabeyes team [www.arabeyes.org]
My Blog: [www.khaledhosny.org]

Support Free Knowledge [ar.wikipedia.org]
comment_char    %
escape_char     /
% Arabic language locale for Egypt.
% Contributed by Kentaroh Noji <knoji at jp dot ibm dot com> and
% Tetsuji Orita <orita at jp dot ibm dot com>, modified by
% Khaled Hosny <khaledhosny at eglug dot org>

LC_IDENTIFICATION
title      "Arabic language locale for Egypt"
source     "Arabeyes Project"
address    ""
contact    ""
email      "bug-glibc-locales at gnu dot org"
tel        ""
fax        ""
language   "Arabic"
territory  "Egypt"
revision   "2.0"
date       "2007-05-11"
%
category  "ar_EG:2007";LC_IDENTIFICATION
category  "ar_EG:2007";LC_CTYPE
category  "ar_EG:2007";LC_COLLATE
category  "ar_EG:2007";LC_TIME
category  "ar_EG:2007";LC_NUMERIC
category  "ar_EG:2007";LC_MONETARY
category  "ar_EG:2007";LC_MESSAGES
category  "ar_EG:2007";LC_PAPER
category  "ar_EG:2007";LC_NAME
category  "ar_EG:2007";LC_ADDRESS
category  "ar_EG:2007";LC_TELEPHONE

END LC_IDENTIFICATION

LC_CTYPE
copy "i18n"

% Arabic uses the alternate (ARABIC-INDIC) digits U0660..U0669
outdigit <U0660>..<U0669>

% This is used in the scanf family of functions to read Arabic numbers
% using "%Id" and such.
map to_inpunct; /
  (<U0030>,<U0660>); /
  (<U0031>,<U0661>); /
  (<U0032>,<U0662>); /
  (<U0033>,<U0663>); /
  (<U0034>,<U0664>); /
  (<U0035>,<U0665>); /
  (<U0036>,<U0666>); /
  (<U0037>,<U0667>); /
  (<U0038>,<U0668>); /
  (<U0039>,<U0669>); /
  (<U002E>,<U06D4>); /
  (<U002C>,<U060C>)

% This is used in the printf family of functions to write Arabic floating
% point numbers using "%If" and such.
map to_outpunct; /
  (<U002E>,<U066B>); /
  (<U002C>,<U066C>)

translit_start
include "translit_combining";""
translit_end
END LC_CTYPE

LC_COLLATE

% Copy the template from ISO/IEC 14651
copy "iso14651_t1"

END LC_COLLATE

LC_MONETARY
% This is the POSIX Locale definition for the LC_MONETARY category.
% These are generated based on the XML-based Locale definition file
% for IBM Class for Unicode/Java
%
int_curr_symbol       "<U0045><U0047><U0050><U0020>" % "EGP "
currency_symbol       "<U062C><U002E><U0645><U002E>" % "ج.م."
mon_decimal_point     "<U066B>" % ARABIC DECIMAL SEPARATOR
mon_thousands_sep     "<U066C>" % ARABIC THOUSANDS SEPARATOR
mon_grouping          3
positive_sign         ""
negative_sign         "<U002D>" % "-"
int_frac_digits       3
frac_digits           3
p_cs_precedes         0
p_sep_by_space        1
n_cs_precedes         0
n_sep_by_space        1
p_sign_posn           1
n_sign_posn           2
%
END LC_MONETARY


LC_NUMERIC
% This is the POSIX Locale definition for the LC_NUMERIC  category.
%
decimal_point          "<U002E>" % 
thousands_sep          "<U066C>" % ARABIC THOUSANDS SEPARATOR
grouping               3
%
END LC_NUMERIC


LC_TIME
% Alternative digits are used for Arabic numerals in date and time. This is
% a hack, until a new prefix is defined for alternative digits.
alt_digits	"<U0660><U0660>";"<U0660><U0661>";/
		"<U0660><U0662>";"<U0660><U0663>";/
		"<U0660><U0664>";"<U0660><U0665>";/
		"<U0660><U0666>";"<U0660><U0667>";/
		"<U0660><U0668>";"<U0660><U0669>";/
		"<U0661><U0660>";"<U0661><U0661>";/
		"<U0661><U0662>";"<U0661><U0663>";/
		"<U0661><U0664>";"<U0661><U0665>";/
		"<U0661><U0666>";"<U0661><U0667>";/
		"<U0661><U0668>";"<U0661><U0669>";/
		"<U0662><U0660>";"<U0662><U0661>";/
		"<U0662><U0662>";"<U0662><U0663>";/
		"<U0662><U0664>";"<U0662><U0665>";/
		"<U0662><U0666>";"<U0662><U0667>";/
		"<U0662><U0668>";"<U0662><U0669>";/
		"<U0663><U0660>";"<U0663><U0661>";/
		"<U0663><U0662>";"<U0663><U0663>";/
		"<U0663><U0664>";"<U0663><U0665>";/
		"<U0663><U0666>";"<U0663><U0667>";/
		"<U0663><U0668>";"<U0663><U0669>";/
		"<U0664><U0660>";"<U0664><U0661>";/
		"<U0664><U0662>";"<U0664><U0663>";/
		"<U0664><U0664>";"<U0664><U0665>";/
		"<U0664><U0666>";"<U0664><U0667>";/
		"<U0664><U0668>";"<U0664><U0669>";/
		"<U0665><U0660>";"<U0665><U0661>";/
		"<U0665><U0662>";"<U0665><U0663>";/
		"<U0665><U0664>";"<U0665><U0665>";/
		"<U0665><U0666>";"<U0665><U0667>";/
		"<U0665><U0668>";"<U0665><U0669>";/
		"<U0666><U0660>";"<U0666><U0661>";/
		"<U0666><U0662>";"<U0666><U0663>";/
		"<U0666><U0664>";"<U0666><U0665>";/
		"<U0666><U0666>";"<U0666><U0667>";/
		"<U0666><U0668>";"<U0666><U0669>";/
		"<U0667><U0660>";"<U0667><U0661>";/
		"<U0667><U0662>";"<U0667><U0663>";/
		"<U0667><U0664>";"<U0667><U0665>";/
		"<U0667><U0666>";"<U0667><U0667>";/
		"<U0667><U0668>";"<U0667><U0669>";/
		"<U0668><U0660>";"<U0668><U0661>";/
		"<U0668><U0662>";"<U0668><U0663>";/
		"<U0668><U0664>";"<U0668><U0665>";/
		"<U0668><U0666>";"<U0668><U0667>";/
		"<U0668><U0668>";"<U0668><U0669>";/
		"<U0669><U0660>";"<U0669><U0661>";/
		"<U0669><U0662>";"<U0669><U0663>";/
		"<U0669><U0664>";"<U0669><U0665>";/
		"<U0669><U0666>";"<U0669><U0667>";/
		"<U0669><U0668>";"<U0669><U0669>"
% Arabic doesn't have abbreviations for weekdays and month names, so
% "abday" is the same as "day" and "abmon" is the same as "mon"

% Abbreviated weekday names (%a)
abday       "<U0623><U062D><U062F>";/
            "<U0627><U062B><U0646><U064A><U0646>";/
            "<U062B><U0644><U0627><U062B><U0627><U0621>";/
            "<U0623><U0631><U0628><U0639><U0627><U0621>";/
            "<U062E><U0645><U064A><U0633>";/
            "<U062C><U0645><U0639><U0629>";/
            "<U0633><U0628><U062A>"
%
% Full weekday names (%A)
day         "<U0627><U0644><U0623><U062D><U062F>";/
            "<U0627><U0644><U0627><U062B><U0646><U064A><U0646>";/
            "<U0627><U0644><U062B><U0644><U0627><U062B><U0627><U0621>";/
            "<U0627><U0644><U0623><U0631><U0628><U0639><U0627><U0621>";/
            "<U0627><U0644><U062E><U0645><U064A><U0633>";/
            "<U0627><U0644><U062C><U0645><U0639><U0629>";/
            "<U0627><U0644><U0633><U0628><U062A>"
%
% Abbreviated month names (%b)
abmon       "<U064A><U0646><U0627><U064A><U0631>";/
            "<U0641><U0628><U0631><U0627><U064A><U0631>";/
            "<U0645><U0627><U0631><U0633>";/
            "<U0623><U0628><U0631><U064A><U0644>";/
            "<U0645><U0627><U064A><U0648>";/
            "<U064A><U0648><U0646><U064A><U0648>";/
            "<U064A><U0648><U0644><U064A><U0648>";/
            "<U0623><U063A><U0633><U0637><U0633>";/
            "<U0633><U0628><U062A><U0645><U0628><U0631>";/
            "<U0623><U0643><U062A><U0648><U0628><U0631>";/
            "<U0646><U0648><U0641><U0645><U0628><U0631>";/
            "<U062F><U064A><U0633><U0645><U0628><U0631>"
%
% Full month names (%B)
mon         "<U064A><U0646><U0627><U064A><U0631>";/
            "<U0641><U0628><U0631><U0627><U064A><U0631>";/
            "<U0645><U0627><U0631><U0633>";/
            "<U0623><U0628><U0631><U064A><U0644>";/
            "<U0645><U0627><U064A><U0648>";/
            "<U064A><U0648><U0646><U064A><U0648>";/
            "<U064A><U0648><U0644><U064A><U0648>";/
            "<U0623><U063A><U0633><U0637><U0633>";/
            "<U0633><U0628><U062A><U0645><U0628><U0631>";/
            "<U0623><U0643><U062A><U0648><U0628><U0631>";/
            "<U0646><U0648><U0641><U0645><U0628><U0631>";/
            "<U062F><U064A><U0633><U0645><U0628><U0631>"
%
% Equivalent of AM PM
am_pm       "<U0635>";"<U0645>"
%

% Appropriate date and time representation (%c)
%       "<RLE>%A %Oe %B %Oy<ARABIC COMMA> %OI:%OM:%OS<PDF>"
d_t_fmt "<U202B><U0025><U0041><U0020>/
<U0025><U004F><U0065><U0020>/
<U0025><U0042><U0020>/
<U0025><U004F><U0079><U060C><U0020>/
<U0025><U004F><U0049><U003A>/
<U0025><U004F><U004D><U003A>/
<U0025><U004F><U0053><U202C>"

%
% Appropriate date representation (%x)
% "%Oy/%Om/%Od"
d_fmt   "<U0025><U004F><U0079><U002F>/
<U0025><U004F><U006D><U002F>/
<U0025><U004F><U0064>"

%
% Appropriate time representation (%X)
% "%OI:%OM:%OS"
t_fmt   "<U0025><U004F><U0049><U003A>/
<U0025><U004F><U004D><U003A>/
<U0025><U004F><U0053>"

%
% Appropriate 12 h time representation (%r)
% "%OI:%OM:%OS %p"
t_fmt_ampm  "<U0025><U004F><U0049><U003A><U0025><U004F><U004D>/
<U003A><U0025><U004F><U0053><U0020><U0025><U0070>"
%
% Appropriate date representation (date(1))   "%a %b %e %H:%M:%S %Z %Y"
%       "<RLE>%A %Oe %B %Y<ARABIC COMMA> %OI:%OM:%OS (%Z)<PDF>"
date_fmt "<U202B><U0025><U0041><U0020>/
<U0025><U004F><U0065><U0020>/
<U0025><U0042><U0020>/
<U0025><U0059><U060C><U0020>/
<U0025><U004F><U0049><U003A>/
<U0025><U004F><U004D><U003A>/
<U0025><U004F><U0053><U0020>/
<U0028><U0025><U005A><U0029><U202C>"
first_weekday 7
first_workday 7
END LC_TIME


LC_MESSAGES
yesexpr     "<U005E><U005B><U0646><U0079><U0059><U005D><U002E><U002A>"
noexpr      "<U005E><U005B><U0644><U006E><U004E><U005D><U002E><U002A>"

yesstr      "<U0646><U0639><U0645>"
nostr       "<U0644><U0627>"
END LC_MESSAGES


LC_PAPER
% This is the ISO_IEC TR14652 Locale definition for the
% LC_PAPER category
height      297
width       210

END LC_PAPER


LC_NAME
% This is the ISO_IEC TR14652 Locale definition for the
% LC_NAME category.
%
name_fmt    "<U0025><U0070><U0025><U0074><U0025><U0066><U0025><U0074>/
<U0025><U0067>"
name_gen    "<U002D><U0073><U0061><U006E>"
name_mr     "<U0633><U064A><U062F>"
name_mrs    "<U0633><U064A><U062F><U0629>"
name_miss   "<U0622><U0646><U0633><U0629>"
name_ms     "<U0622><U0646><U0633><U0629>"

END LC_NAME


LC_ADDRESS
% This is the ISO_IEC TR14652 Locale definition for the
% LC_ADDRESS
postal_fmt   "<U0025><U007A><U0025><U0063><U0025><U0054><U0025><U0073>/
<U0025><U0062><U0025><U0065><U0025><U0072>"
country_name "<U0645><U0635><U0631>"
country_ab2  "<U0045><U0047>"
country_ab3  "<U0045><U0047><U0059>"
country_num 818
country_isbn "<U0039><U0037><U0037>"
lang_name    "<U0639><U0631><U0628><U064A>"

END LC_ADDRESS


LC_TELEPHONE
% This is the ISO_IEC TR14652 Locale definition for the
% LC_TELEPHONE category.
tel_int_fmt "<U002B><U0025><U0063><U0020><U003B><U0025><U0061><U0020>/
<U003B><U0025><U006C>"
int_prefix  "<U0032><U0030>"
int_select  "<U0030><U0030>"
END LC_TELEPHONE


LC_MEASUREMENT
% This is the ISO_IEC TR14652 Locale definition for the
% LC_MEASUREMENT category.
measurement 1

END LC_MEASUREMENT
