
Re: Correct sorting?

--- Abdulaziz Al-Arfaj <aalarfaj at gmail dot com> wrote:
> On Sun, 28 Nov 2004 07:14:56 -0800 (PST), Ossama Khayat <okhayat at yahoo dot com>
> [...]
> > Well, I don't understand that 0x stuff a lot ;-) but can you please
> > edit the corresponding file or point me to it?
> Neither do I ;-)

The 0x values are the hexadecimal Unicode code points of the characters
mentioned, given in case some don't know what they look like.  These code
points are listed in the Unicode "code charts" tables [1] (search for
'Arabic' and 'Arabic Presentation Forms-B').
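
For anyone who would rather not dig through the charts, here is a small
sketch that looks the same code points up with Python's standard
unicodedata module.  The helper name describe() and the CODE_POINTS list
are just illustrative; 0x0651 is the code point of the shadda discussed in
this thread.

```python
import unicodedata

# Code points discussed in this thread: yeh, shadda, teh marbuta.
CODE_POINTS = [0x064A, 0x0651, 0x0629]

def describe(cp):
    """Return (U+XXXX label, Unicode name, general category, combining class)."""
    ch = chr(cp)
    return (
        f"U+{cp:04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),   # 'Mn' means non-spacing (zero-width) mark
        unicodedata.combining(ch),  # non-zero for combining marks
    )

if __name__ == "__main__":
    for cp in CODE_POINTS:
        print(describe(cp))
```

The category 'Mn' reported for the shadda is what marks it as a
non-spacing combining character.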

> Wait, now I remember. I translated that file, and actually each of
> those words does not end in yeh then teh-marbuta. It's actually yeh
> (0x064A) then SHADDA (a diacritic/composing character of zero-width)
> then teh-marbuta (0x0629). I think I am 80% certain that this SHADDA
> sitting in the middle between the two characters is the reason why
> they aren't being joined together.

Certainly sounds like it.
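
To see which strings are affected, something like the following rough
sketch could list every word containing the yeh + shadda + teh-marbuta
sequence.  The function name find_affected_words() is made up for
illustration, and it assumes the text has already been decoded to Unicode
(e.g. read from a UTF-8 file).

```python
import re

YEH = "\u064A"
SHADDA = "\u0651"
TEH_MARBUTA = "\u0629"

# A run of non-whitespace characters ending in yeh + shadda + teh marbuta.
PATTERN = re.compile(r"\S*" + YEH + SHADDA + TEH_MARBUTA)

def find_affected_words(text):
    """Return all words that contain the yeh-shadda-teh-marbuta sequence."""
    return PATTERN.findall(text)
```

Running it over the contents of the translated file would give the list
of words to check in D-I.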

> I believe the file is console-data_debian_po.po (level 2), but I was
> not saying the file needs to be edited. Having a shadda there is
> proper. The shadda is a zero-width character that should not affect
> the shaping/joining of the two characters it sits between, but maybe
> (still just guessing here) that's not what's going on in D-I. Perhaps as
> a test, we can make a copy of the file without the shaddas, and
> Christian could test it out, and see if the problem goes away? I'm
> sorry I cannot do it myself. No Linux box within reach for the next
> week or so :-(

There are two possible solutions here:

 1. We strip all the diacritics/harakat (via a script) before we ship
    these files over to Debian; this is easily done.  I think we should
    keep the "correct" contents of the files in CVS rather than strip
    them and commit the stripped versions.

 2. We have slang ignore all the diacritics/harakat internally; other
    languages would also benefit from this hack.  In reality, though,
    this needs to be fixed properly, and I believe Steve (the slang
    champion) is aware of it, yet I'm unsure whether he's planning to do
    anything on this front.  It might be worthwhile to ping him about
    it...
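
A minimal sketch of what the script for point 1 might look like, assuming
the files are UTF-8 and that "harakat" means the range U+064B (fathatan)
through U+0652 (sukun), which includes the shadda.  The names
strip_harakat() and strip_file() are hypothetical, not an existing tool.

```python
# Assumption: the marks to strip are U+064B..U+0652 (fathatan .. sukun).
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}

def strip_harakat(text):
    """Return text with the Arabic harakat removed; everything else is kept."""
    return "".join(ch for ch in text if ch not in HARAKAT)

def strip_file(src_path, dst_path):
    """Write a stripped copy of src_path to dst_path (both UTF-8)."""
    with open(src_path, encoding="utf-8") as src:
        content = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(strip_harakat(content))
```

Writing to a separate destination file keeps the copy in CVS untouched;
only the stripped output would be shipped.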

So do we have time to resolve this (via point 1 above), or are we too late?

[1] http://www.unicode.org/charts


 - Nadim
