
Re: [i18n] Still serious Arabic issues



On Saturday 10 January 2004 00:52, Pablo Saratxaga wrote:
> (grrr, I had made a long reply, then the computer shut off...)

It's nice to hear that even your computer shuts off suddenly and even YOU may 
lose something ;)

> > Generally speaking, it's also supported in ISO-8859-6, right?
>
> No.
> (well, iso-8859-6 is recognized by iconv; so you can convert from and
> back any text file; you can also send and receive iso-8859-6 mails,
> read iso-8859-6 web pages, use it in xchat, etc.
> But it is not supported for default locale encoding; simply because the
> programs that properly support arabic (bidi and shaping) do it in utf-8
> only; so there is no point in supporting other encodings for the locale
> (not to mention that in the future everything will have to switch to
> UTF-8 and support for old encodings dumped).

Well explained. Thanks.

> > I now have some evidence that the ar_SA encoding may be ISO-8859-6,
> > not UTF-8
>
> Not for me:
>
> $ LC_ALL=ar_SA locale charmap  ; rpm -q locales-ar
> UTF-8
> locales-ar-2.3.2-5mdk

> Can you test with "locale charmap" ?
> And give me the output of:
> locale
> locale charmap
> rpm -q locales locales-ar

[munzir@mdklinuxserver munzir]$ locale charmap; rpm -q locales locales-ar
UTF-8
locales-2.3.2-5mdk
locales-ar-2.3.2-5mdk

[munzir@mdklinuxserver munzir]$ locale
LANG=ar_SA
LC_CTYPE=ar_SA
LC_NUMERIC=ar_SA
LC_TIME=ar_SA
LC_COLLATE=ar_SA
LC_MONETARY=ar_SA
LC_MESSAGES=ar_SA
LC_PAPER=ar_SA
LC_NAME=ar_SA
LC_ADDRESS=ar_SA
LC_TELEPHONE=ar_SA
LC_MEASUREMENT=ar_SA
LC_IDENTIFICATION=ar_SA
LC_ALL=

> Note that I switched some time ago to UTF-8 as default when building
> locales; those not using UTF-8 by default are only those specifically
> set that way.

So the default is UTF-8, which means something like ar_SA = ar_SA.UTF-8 by 
definition, right?
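
For anyone who wants to check this from a script rather than with
"locale charmap", here is a minimal sketch in Python. The function name
current_charmap is made up for illustration; setlocale and nl_langinfo
are the standard library calls, and nl_langinfo is only available on
Unix-like systems.

```python
import locale


def current_charmap() -> str:
    """Return the codeset of the locale selected by the environment.

    This is roughly what `locale charmap` reports. If the environment
    names a locale that is not installed, fall back to the "C" locale.
    """
    try:
        # An empty string means "use LANG / LC_* / LC_ALL from the environment".
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        locale.setlocale(locale.LC_ALL, "C")
    return locale.nl_langinfo(locale.CODESET)
```

Run under LANG=ar_SA, a result of UTF-8 would agree with the idea that
plain ar_SA is built as a UTF-8 locale on this system.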

> > 2. I can't see Arabic file/folder names from any GTK+ app. Gedit, for
> > example, can't see Arabic folder names if UTF-8 encoding is not chosen
> > explicitly during installation. It gives this error when launched from
> > konsole:
> >
> > Gtk-Message: ط¹ط¸ï؟½ ط¹ط¹ط¹ط¹ ط¸ï؟½ط¸آ­ط¹ط¹ط¹
>
> I'm unable to see that text; your mail says cp1256; but that doesn't
> look like anything.
>
> > "\345\315\344\317"
>
> that, however, is NOT an utf-8 sequence.
> So, you have a filename that is not in UTF-8; and gtk complains;

My point is if I set the option "use unicode by default" during the 
installation GTK doesn't complain!! WHY DOES IT MAKE A DIFFERENCE IF 
"ENABLING UNICODE" OPTION DOESN'T MAKE A DIFFERENCE FOR THE ARABIC LANGUAGE?

Also, how can I find out the encoding of a filename, as opposed to the 
encoding of the file's contents?
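
One possible way to approach the filename question, as a rough sketch in
Python: a filename is just bytes, so the best that can be done is to test
whether those bytes are valid UTF-8 and, if not, see which legacy
encodings can decode them. The function names and the candidate list
(iso8859_6, cp1256) are assumptions chosen for this thread, and a
successful legacy decode is only a hint, not proof.

```python
import os


def decodable_as(raw: bytes, candidates=("iso8859_6", "cp1256")) -> list:
    """Return the encodings that can decode the raw filename bytes.

    Pure ASCII and valid UTF-8 are reported on their own; otherwise every
    candidate legacy encoding that decodes without error is listed.
    """
    if raw.isascii():
        return ["ascii"]
    try:
        raw.decode("utf-8")
        return ["utf-8"]
    except UnicodeDecodeError:
        pass
    usable = []
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue
        usable.append(enc)
    return usable


def scan(directory: str = "."):
    """Yield (raw_name, encodings) for each entry in a directory."""
    # Passing bytes to os.listdir makes it return undecoded bytes names.
    for raw in os.listdir(os.fsencode(directory)):
        yield raw, decodable_as(raw)
```

For example, decodable_as(b"\xe5\xcc\xe4\xcf") does not report utf-8,
which matches GTK's "Invalid byte sequence" complaint about that name.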

> so it seems that gtk expects them in UTF-8; so that you are in UTF-8
> mode after all...
> can you check with "locale charmap" ?

It turned out that the konsole terminal mangles the encoding. Instead of 
copying and pasting from the console, I am going to redirect the output to a 
file and then paste it here.
[munzir@mdklinuxserver munzir]$ gedit 2>gedit_errors
[munzir@mdklinuxserver munzir]$ gedit gedit_errors

This gives the following lines, in a mixture of Arabic and English. The 
Arabic text says "cannot convert the filename ... to UTF-8 (try setting the 
environment variable G_BROKEN_FILENAMES)". The number of errors depends on 
how many Arabic filenames there are in the directory I browse to from gedit's 
File->Open dialog box.

Gtk-Message: لا يمكن تحويل اسم الملف "\345\314\344\317" إلى UTF-8 (جرب تعيين 
المتغير البيئي G_BROKEN_FILENAMES): Invalid byte sequence in conversion input
Gtk-Message: لا يمكن تحويل اسم الملف "\347\317\352\311.doc" إلى UTF-8 (جرب 
تعيين المتغير البيئي G_BROKEN_FILENAMES): Invalid byte sequence in conversion 
input
Gtk-Message: لا يمكن تحويل اسم الملف "\322\310\317  
\307\344\343\344\345\307\312!.doc" إلى UTF-8 (جرب تعيين المتغير البيئي 
G_BROKEN_FILENAMES): Invalid byte sequence in conversion input
Gtk-Message: لا يمكن تحويل اسم الملف "\331\321\310\352~" إلى UTF-8 (جرب تعيين 
المتغير البيئي G_BROKEN_FILENAMES): Invalid byte sequence in conversion input
Gtk-Message: لا يمكن تحويل اسم الملف "\331\321\307\342.mpg" إلى UTF-8 (جرب 
تعيين المتغير البيئي G_BROKEN_FILENAMES): Invalid byte sequence in conversion 
input

The above mentioned problem happens only if the option "use unicode by 
default" is not enabled during the installation. AGAIN WHY DOES IT MAKE A 
DIFFERENCE?!!
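
The octal escapes in the Gtk-Message lines above can be turned back into
readable names, which helps to confirm what encoding the files were
created in. The sketch below is in Python and assumes the names are
ISO-8859-6 (the encoding konsole reportedly defaults to on this
install); the helper names unescape_octal and readable_name are
invented for illustration.

```python
import re

# A backslash followed by three octal digits, limited to values 0..255.
_OCTAL = re.compile(rb"\\([0-3][0-7]{2})")


def unescape_octal(escaped: str) -> bytes:
    """Turn escapes such as \\345\\314, as printed in the log, into bytes."""
    data = escaped.encode("ascii")
    return _OCTAL.sub(lambda m: bytes([int(m.group(1), 8)]), data)


def readable_name(escaped: str, encoding: str = "iso8859_6") -> str:
    """Decode an escaped filename under an assumed legacy encoding."""
    return unescape_octal(escaped).decode(encoding)
```

Under that assumption, readable_name(r"\345\314\344\317") gives مجلد
("folder"), so these names look like plain ISO-8859-6 rather than
damaged UTF-8.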

> > 4. Displaying a UTF-8 Arabic file from a shell displays garbage whereas
> > displaying an ISO-8859-6 shows correctly
>
> a shell where? on console, xterm, konsole, gnome-terminal,...?

konsole. If I don't enable "use unicode by default" during the installation, 
it defaults to iso-8859-6.

> in general the terminal has to be set to utf-8 mode;
> it is automatically done for gnome-terminal; others may need some
> configuration (tell us which one, so we can try to make it work
> automatically)

konsole, xterm and the console all display garbage. Only gnome-terminal 
displays it properly, though with no shaping of course.

> > BTW: For languages that have a utf and other encodings, how can one
> > revert back to the non-utf after it's enabled during the installation?
>
> You need to edit /etc/sysconfig/i18n (and/or ~/.i18n) and replace
> ".UTF-8" ending with the one for the encoding you want

I added ".UTF-8" in my .i18n file, and gedit still complains about the Arabic 
filenames with exactly the same error messages as above!!
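
To make the manual edit less error-prone, the suffix change could be
scripted. This is only a sketch in Python: it assumes the i18n file
holds unquoted lines such as LANG=ar_SA and LC_CTYPE=ar_SA (the same
names that "locale" prints), which I have not verified, and the function
name set_encoding is hypothetical. It works on text, so nothing is
written to /etc/sysconfig/i18n or ~/.i18n.

```python
import re

# KEY=lang_TERRITORY[.encoding][@modifier], unquoted, for LANG and LC_* only.
_ASSIGN = re.compile(r"^(LANG|LC_[A-Z]+)=([A-Za-z_]+)(\.[^\s@]+)?(@\S+)?\s*$")


def set_encoding(text: str, encoding: str) -> str:
    """Return text with the encoding suffix of LANG/LC_* values replaced.

    An empty encoding removes the suffix. Empty assignments, other keys
    and the C/POSIX locales are left untouched.
    """
    out = []
    for line in text.splitlines():
        m = _ASSIGN.match(line)
        if m and m.group(2) not in ("C", "POSIX"):
            key, base, modifier = m.group(1), m.group(2), m.group(4) or ""
            suffix = "." + encoding if encoding else ""
            line = key + "=" + base + suffix + modifier
        out.append(line)
    trailing = "\n" if text.endswith("\n") else ""
    return "\n".join(out) + trailing
```

With this, set_encoding("LANG=ar_SA\n", "UTF-8") returns
"LANG=ar_SA.UTF-8\n". Note that changing the locale does nothing to
filenames that already exist on disk in another encoding.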

Shouldn't there be an option in localedrake which says "use unicode" instead 
of doing it manually via configuration files?

> > Mandrake is a distro that always put a newbie/desktop user in mind, has
> > this been changed?
>
> No; and you are right, it should be done more automatically.
> It just hasn't been implemented yet

Will you please file a bug and tell us about it so this issue won't be 
forgotten?

> I'll send a mail to translators about it.
>
> > http://bugs.kde.org/show_bug.cgi?id=69458
>
> Ok, I see it.
>
> Note that it does work with gtk (gtk2 at least) programs;
> KDE/Qt people should ask gtk people how they did it.

No, it works in neither KDE nor GNOME when we are using the KDE Keyboard 
Tool (kxkb?) and "Enable keyboard layout" is chosen from the keyboard layout 
tab; but it works in both if we use keyboarddrake. If keyboarddrake has a 
way to enable an icon in the kicker panel to switch the language, then maybe 
we don't need to use kxkb.

> (I just tested launching both gedit and kedit; switched to
> arabic keyboard layout, shortcuts worked in gedit; but
> not in kedit)

I can't understand how it comes to behave like this for you. I am trying it 
with Mandrake 9.2, and all the systems I checked have the same problem.

> > > The yes/no locale problem should be fixed, isn't it?
> >
> > not yet
> > $ locale -c yesexpr noexpr
> >

> > LC_MESSAGES
> > ^(ن|نyYعم)
> > LC_MESSAGES
> > ^(ل|لnNا)
> >
> > Maybe there is a problem in the syntax, parentheses instead of
> > square brackets? I expected it to be a regex syntax like this instead:
>
> I don't have that...
>
> on a 9.2:
>
> [root@test root]# LC_ALL=ar locale -c yesexpr noexpr ; rpm -q locales-ar
> glibc
> LC_MESSAGES
> ^[نyY].*
> LC_MESSAGES
> ^[لnN].*
> locales-ar-2.3.2-5mdk
> glibc-2.3.2-14mdk

Me too: I get the same output as you if I set LC_ALL=ar explicitly. Without 
it, however, the output is the incorrect one I quoted before, and the problem 
still persists:
[munzir@mdklinuxserver munzir]$ rm .Xauthority
rm: remove regular file `.Xauthority'? y
[munzir@mdklinuxserver munzir]$ ls .Xauthority
.Xauthority
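
The transcript above would be consistent with the parenthesised yesexpr
simply not matching "y". A small check in Python, taking both patterns
exactly as they were pasted in this thread; Python's re module stands in
for the POSIX regex matching that the C library does, which behaves the
same for patterns this simple. The constant and function names are made
up for the example.

```python
import re

# yesexpr as reported without LC_ALL set: ^(ن|نyYعم)
YESEXPR_DEFAULT = "^(\u0646|\u0646yY\u0639\u0645)"
# yesexpr as reported with LC_ALL=ar: ^[نyY].*
YESEXPR_AR = "^[\u0646yY].*"


def is_affirmative(answer: str, yesexpr: str) -> bool:
    """Return True if the answer matches the locale's yes-expression."""
    return re.match(yesexpr, answer) is not None
```

Here is_affirmative("y", YESEXPR_DEFAULT) is False, because both
alternatives inside the parentheses start with ن, while
is_affirmative("y", YESEXPR_AR) is True.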

Then why is the output of
locale -c yesexpr noexpr
different from that of
LC_ALL=ar locale -c yesexpr noexpr
?
And why is LC_ALL not set by default in Mandrake 9.2?
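
On the LC_ALL question, the POSIX lookup order may explain part of the
difference: LC_ALL, when non-empty, overrides every LC_* variable, which
in turn override LANG. A sketch of that rule in Python follows; the
function name effective_locale is invented, and whether "ar" and "ar_SA"
are really separate locale definitions on Mandrake 9.2 is an assumption.

```python
def effective_locale(category: str, env: dict) -> str:
    """Return the locale name used for a category such as LC_MESSAGES.

    POSIX order: LC_ALL first, then the category variable, then LANG.
    Variables that are unset or empty are skipped; the default is "C".
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"
```

With the environment shown earlier (LC_MESSAGES=ar_SA, LC_ALL empty) the
result is ar_SA; adding LC_ALL=ar makes it ar, so the two commands would
consult different locales. An empty LC_ALL is the usual state, since the
variable is meant as a temporary override.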

> The problem was different.
> Arabic (and Devanagari, Tamil, etc) need shaping; that is, they need a
> proper font with the glyphs needed for display, a bitmap font is not
> enough.
>
> Hopefully it will be fixed for next release (and an easier way to deal
> with fonts during install will be devised)

I also hope they will fix it soon before Debian does it ;) (I am forwarding 
another message to encourage you)

What has happened to this bug, which is listed in my doc?
If one chooses the Arabic keyboard during the installation, the BACKSPACE key 
doesn't work during the installation. For example, when I type the password, 
I can't use BACKSPACE to delete it.

Pablo: During install the old xmodmap system is used; I will look at the 
xmodmap.ar file to see what the problem is. I don't see absolutely anything 
wrong! I attach the xmodmap file; you can load it with the xmodmap command 
(e.g. xmodmap filename). I load it and the backspace key works fine for me... 
After install, the Arabic layout is from

/usr/X11R6/lib/X11/xkb/symbols/pc/ar

In spite of Mr Pablo's comment, it still doesn't work!!

-- 
Munzir Taha
PGP Key available:
gpg --recv-keys --keyserver www.mandrakesecure.net F0671821

Telecommunications and Electronics Engineer
Linux Registered User #279362 at http://counter.li.org
Mandrake Club member
Maintainer of Mandrake Arabization Project Status (MAPS)
http://www.arabeyes.org/download/documents/distros/mdkarabicsupport-en/
CIW Designer, ICDL, MOUS
New Horizons CLC
Riyadh, SA