
Arabic issues (was Re: [Munzir Taha] [Cooker] [Bug] LANG=xx whatever doesn't work)



On Monday 15 December 2003 17:50, Pablo Saratxaga wrote:
> Kaixo!
Hi (though I don't know what Kaixo! means ;)


> Arabic is supported in UTF-8 only; 

Generally speaking, it's also supported in ISO-8859-6, right?

> so by choosing Arabic you also choose 
> UTF-8 encoding.

I now have some evidence that the ar_SA encoding may be ISO-8859-6, not 
UTF-8:

1. kedit saves any Arabic document as ISO-8859-6, and I can't even choose 
another encoding, as if it defaults to the system locale.

2. I can't see Arabic file/folder names from any GTK+ app. Gedit, for example, 
can't see Arabic folder names unless UTF-8 encoding is chosen explicitly 
during the installation. It gives this error when launched from konsole:

Gtk-Message: [garbled Arabic text] "\345\315\344\317" [garbled Arabic text] UTF-8 
([garbled Arabic text] G_BROKEN_FILENAMES): Invalid byte 
sequence in conversion input
Gtk-Message: [garbled Arabic text] "\347\317\352\311.doc" [garbled Arabic text] UTF-8 
([garbled Arabic text] G_BROKEN_FILENAMES): Invalid byte 
sequence in conversion input
Gtk-Message: [garbled Arabic text] "\322\310\317  
\307\344\343\344\345\307\312!.doc" [garbled Arabic text] UTF-8 ([garbled Arabic text] 
G_BROKEN_FILENAMES): Invalid byte sequence in conversion input
Gtk-Message: [garbled Arabic text] "\331\321\310\352~" [garbled Arabic text] UTF-8 
([garbled Arabic text] G_BROKEN_FILENAMES): Invalid byte 
sequence in conversion input

3. Sending Arabic text via kmail with the encoding set to Auto-detect sends 
it in the Arabic ISO-8859-6 encoding.

4. Displaying a UTF-8 Arabic file from a shell shows garbage, whereas 
displaying an ISO-8859-6 file shows it correctly.

5. Other observations regarding urpmi output in the console.

All this and more has nearly convinced me that ar_SA != ar_SA.UTF-8.
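
As a quick cross-check of point 2, the octal escapes in the Gtk-Message 
lines can be tested against both candidate encodings. This is only a rough 
sketch: the function name classify is made up for illustration, and it only 
tells UTF-8 apart from ISO-8859-6, nothing else.

```python
# Byte strings copied from the octal escapes in the Gtk-Message lines.
RAW_NAMES = [
    b"\345\315\344\317",
    b"\347\317\352\311.doc",
    b"\331\321\310\352~",
]


def classify(name: bytes) -> str:
    """Return which of the two candidate encodings can decode `name`.

    Hypothetical helper: UTF-8 is tried first because it is the stricter
    of the two, then ISO-8859-6.
    """
    try:
        name.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    try:
        name.decode("iso8859_6")
        return "iso-8859-6"
    except UnicodeDecodeError:
        return "unknown"


for raw in RAW_NAMES:
    # Show the verdict next to the name as read in ISO-8859-6.
    print(classify(raw), raw.decode("iso8859_6", errors="replace"))
```

If the names on disk were really written as UTF-8, classify would report 
"utf-8" for them, so a run of "iso-8859-6" results points the same way as 
the other observations.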

BTW: For languages that have UTF-8 and other encodings, how can one revert 
to the non-UTF-8 encoding after UTF-8 has been enabled during the installation?


> 3. if Arabic (or any other language btw) is installed after the install
> of the system; you would also want to edit:
> /etc/rpm/macros to add it ("ar") to the %_install_langs macro; otherwise
> the translation files coming in the rpm packages won't get installed when
> you install/update an rpm package.

Mandrake is a distro that has always kept the newbie/desktop user in mind. 
Has this changed?
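
For reference, the manual edit described in point 3 could look roughly like 
the sketch below. It assumes that the value of %_install_langs is a 
colon-separated list (e.g. "en:fr"); the function name add_install_lang is 
made up, and the code only transforms text, it does not touch 
/etc/rpm/macros itself.

```python
def add_install_lang(macros_text: str, lang: str) -> str:
    """Return macros_text with `lang` listed in the %_install_langs line.

    Assumption: the macro value is a colon-separated list of language codes.
    """
    out = []
    found = False
    for line in macros_text.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0] == "%_install_langs":
            found = True
            langs = parts[1].strip().split(":") if len(parts) > 1 else []
            if lang not in langs:
                langs.append(lang)
            line = "%_install_langs " + ":".join(langs)
        out.append(line)
    if not found:
        # No such macro yet: add one that lists only the new language.
        out.append("%_install_langs " + lang)
    return "\n".join(out) + "\n"


print(add_install_lang("%_install_langs en:fr\n", "ar"), end="")
```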

> 4.1.1 you can in fact either use KDE keyboard switcher; or instead use
> the X11 keyboard (configured through keyboarddrake).
> it's a matter of taste.
>
> 5.2 I haven't looked at akka again yet.
>
> 7. comma: that is a translation issue. tell me which packages are concerned
> and I'll try to correct them.

You can find this in many packages, such as locales-ar. It also shows up when 
launching rpmdrake, in the message "Please wait, finding available 
packages..." (the comma after "wait"), and in the option "All packages, 
alphabetical".

> keyboard: you either use KDE or keyboarddrake (X11 in fact) to manage
> the keyboard switching; you cannot use both.
>

> shortcuts and KDE: that issue should be common to all non-latin languages;
> maybe it has been discussed on KDE mailing lists; if not, it should.
> On Gtk the keyboard sends Ctrl(or Alt)-(arabic keysym); then, it determines
> from the keyboard map which latin letter corresponds to the arabic keysym,
> and converts the result as Ctrl(or Alt)-(that latin letter).

http://bugs.kde.org/show_bug.cgi?id=69458


> eg, arabic layout has:
>
>   key <AD01> {  [      Arabic_dad,     Arabic_fatha     ]     };
>
> so if you press Ctrl-Arabic_dad, and if you have a keyboard
> configuration of "fr,ar", it will look at the value of <AD01> key for
> the latin layout ("fr" in this case):
>
>     key <AD01>  { [         a,          A,           ae,           AE ] };
>
> So, pressing Ctrl-Arabic_dad with a kbd layout of "fr,ar" will produce
> a Ctrl-a.
> However, if you had "us,ar", it will be:
>
>     key <AD01> {        [         q,    Q               ]       };
>
> so, Ctrl-Arabic_dad would be Ctrl-q.
>
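
The lookup described above can be sketched as follows. This is only a toy 
model built from the three <AD01> entries quoted above, not the actual Gtk 
code; the names LAYOUTS and latin_shortcut are made up, and always taking 
the first (unshifted) symbol of the Latin layout is an assumption.

```python
# Symbols for key <AD01>, copied from the layout excerpts quoted above.
LAYOUTS = {
    "ar": {"AD01": ["Arabic_dad", "Arabic_fatha"]},
    "fr": {"AD01": ["a", "A", "ae", "AE"]},
    "us": {"AD01": ["q", "Q"]},
}


def latin_shortcut(keysym, latin, arabic="ar", modifier="Ctrl"):
    """Map modifier+Arabic keysym to modifier+Latin letter on the same key.

    Hypothetical helper: finds the physical key that carries `keysym` in
    the Arabic layout, then returns the base symbol of that key in the
    Latin layout.
    """
    for key, syms in LAYOUTS[arabic].items():
        if keysym in syms:
            return f"{modifier}-{LAYOUTS[latin][key][0]}"
    return None


print(latin_shortcut("Arabic_dad", latin="fr"))
print(latin_shortcut("Arabic_dad", latin="us"))
```

With "fr,ar" the first call yields Ctrl-a and with "us,ar" the second 
yields Ctrl-q, matching the two cases above.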

$ xev
KeyPress event, serial 27, synthetic NO, window 0x2000001,
    root 0x48, subw 0x0, time 12484896, (335,219), root:(340,245),
    state 0x2010, keycode 24 (keysym 0x5d6, Arabic_dad), same_screen YES,
    XLookupString gives 2 bytes:  "ض"

$ xmodmap -pke |grep dad
keycode  24 = q Q Arabic_dad Arabic_fatha

whereas showkey from a VC gives:
keycode 16 press

Why are they different?



> The yes/no locale problem should be fixed, isn't it?

Not yet:
$ locale -c yesexpr noexpr

LC_MESSAGES
^(ن|نyYعم)
LC_MESSAGES
^(ل|لnNا)

Maybe there is a problem in the syntax: shouldn't it use square brackets 
instead of parentheses? I expected a regex like this instead:

LC_MESSAGES
^[yYن].*
LC_MESSAGES
^[nNل].*
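
To illustrate the difference, here is a small sketch that tries a few 
answers against the current yesexpr and the one I expected. Python's re 
module stands in for the POSIX extended regex matching that the C library 
would do, which is an assumption but should behave the same for patterns 
this simple; the helper name accepts is made up.

```python
import re

# Patterns copied from the locale output and from the expected form above.
CURRENT_YESEXPR = "^(ن|نyYعم)"
EXPECTED_YESEXPR = "^[yYن].*"


def accepts(pattern: str, answer: str) -> bool:
    """Return True if `answer` matches `pattern`."""
    return re.search(pattern, answer) is not None


for answer in ["y", "Y", "yes", "ن", "نعم"]:
    print(answer, accepts(CURRENT_YESEXPR, answer),
          accepts(EXPECTED_YESEXPR, answer))
```

With the parenthesized form, a plain "y" or "Y" is rejected because every 
alternative has to start with the Arabic letter, whereas the bracket form 
accepts either.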


Regarding bug:
http://qa.mandrakesoft.com/show_bug.cgi?id=5181
you said before:
"There are severe size limitations, and the problem is that the size of the 
Arabic fonts were a bit too big. To fix it the installation stage (not DrakX, 
but the way the running Linux is put on memory) has to be completely 
modified. It's way too late for this version. For version after 9.2 the 
install method will be rethought to overcome that problem (that also affect 
several other languages)."

Now, I can see the problem is solved for the Hebrew language
http://qa.mandrakesoft.com/show_bug.cgi?id=4659

What's the status of Arabic in cooker?

Please Mr. Pablo, I know you are busy but we need to help each other to 
improve Mandrake's localization status.

-- 
Munzir Taha
PGP Key available:
gpg --recv-keys --keyserver www.mandrakesecure.net F0671821

Telecommunications and Electronics Engineer
Linux Registered User #279362 at http://counter.li.org
Mandrake Club member
Maintainer of Mandrake Arabization Project Status (MAPS)
http://www.arabeyes.org/download/documents/distro/mdkarabicsupport.html
CIW Designer, ICDL, MOUS
New Horizons CLC
Riyadh, SA