
Re: A bug in what (was Re: A bug in cut?)?



On Wednesday 06 Rabi` al-Thaani 1425 09:30 pm, Behdad Esfahbod wrote:

> > Yes, all of them use Unicode, but let me ask you all something: can anyone
> > display the attached file in any editor besides vim? It should contain
> > U+00D9 (LATIN CAPITAL LETTER U WITH GRAVE).
>
> NO, THEY ARE NOT.  Your file is in ISO 8859-1 (aka Latin1)
> encoding.  Vim autodetects that and handles it, gnome-terminal
> and konsole simply don't understand it.
>
> Try "file konsole_bug" and it will tell you:

Thanks, Behdad. Now I think I understand the whole story.
$ cat arabic
٠
١
٢
٣
٤
٥

$ hexdump arabic
0000000 a0d9 d90a 0aa1 a2d9 d90a 0aa3 a4d9 d90a
0000010 0aa5
0000012
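When reading the dump above, note that plain hexdump prints 16-bit words in the machine's byte order. Assuming a little-endian PC, as the output suggests, "a0d9" stands for the bytes d9 a0, which is the two-byte UTF-8 encoding of U+0660 (ARABIC-INDIC DIGIT ZERO). A minimal sketch that rebuilds the file by hand and reproduces that word display; the helper names utf8_two_byte and hexdump_words are hypothetical:

```python
import struct

def utf8_two_byte(cp: int) -> bytes:
    """Encode a code point in U+0080..U+07FF by hand: 110xxxxx 10xxxxxx."""
    assert 0x80 <= cp <= 0x7FF
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])

def hexdump_words(data: bytes) -> str:
    """Show data as 16-bit little-endian words, like plain `hexdump` on x86."""
    assert len(data) % 2 == 0  # this sketch only handles even lengths
    words = struct.unpack(f"<{len(data) // 2}H", data)
    return " ".join(f"{w:04x}" for w in words)

# The "arabic" file: digits U+0660..U+0665, one per line.
data = b"".join(utf8_two_byte(cp) + b"\n" for cp in range(0x0660, 0x0666))
print(len(data))            # 18, i.e. 0x12 as in the last hexdump offset
print(hexdump_words(data))  # a0d9 d90a 0aa1 a2d9 d90a 0aa3 a4d9 d90a 0aa5
```

"hexdump -C" avoids the confusion by printing the bytes one at a time in file order.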

$ file arabic
arabic: UTF-8 Unicode text

$ cut -b1 arabic| tee cutted_arabic|hexdump
0000000 0ad9 0ad9 0ad9 0ad9 0ad9 0ad9
000000c

$ file cutted_arabic
cutted_arabic: ISO-8859 text

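In summary, "cut -b" works on bytes, not characters. Each Arabic-Indic digit
here takes two bytes in UTF-8 (0xD9 followed by 0xA0..0xA5), so "cut -b1"
keeps only the lead byte 0xD9. A lead byte followed by a newline is not a
valid UTF-8 sequence, hence the response from the "file" utility.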
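A minimal Python sketch of what happened, assuming "cut -b1" simply keeps the first byte of each newline-terminated line. The helper names cut_first_byte and is_valid_utf8 are hypothetical, not part of any tool; the sketch rebuilds the bytes shown by hexdump, applies the cut, and checks whether each result decodes as UTF-8:

```python
# Bytes of the "arabic" file: U+0660..U+0665, each followed by a newline.
arabic = "".join(chr(cp) + "\n" for cp in range(0x0660, 0x0666)).encode("utf-8")

def cut_first_byte(data: bytes) -> bytes:
    """Model of `cut -b1`: keep only the first byte of every line."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # the final newline does not start another line
    return b"".join(line[:1] + b"\n" for line in lines)

def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode as UTF-8 without error."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

cutted = cut_first_byte(arabic)
print(len(cutted))                                   # 12, i.e. 0xc
print(is_valid_utf8(arabic), is_valid_utf8(cutted))  # True False
print(cutted.decode("latin-1"))                      # six lines of U+00D9
```

Read as ISO-8859-1 instead, each leftover 0xD9 byte is U+00D9 (LATIN CAPITAL LETTER U WITH GRAVE), so a program that falls back to Latin-1 would show a column of those characters.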


> Vim autodetects that and handles it, gnome-terminal
> and konsole simply don't understand it.

I tried opening an ISO-8859-6 file in vim to test this autodetection
feature, but it displayed garbage. Why?
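One plausible explanation, offered as an assumption rather than a confirmed diagnosis: Vim's autodetection tries each encoding listed in its 'fileencodings' option in order and keeps the first one that converts without error. Current Vim documentation gives the default list as ucs-bom,utf-8,default,latin1 when 'encoding' is a Unicode encoding. Latin-1 accepts every byte, so a Latin-1 file is "detected" correctly, but an ISO-8859-6 file also falls through to Latin-1 and its Arabic letters come out as accented Latin characters. The function guess_decode and its candidate list below are hypothetical, a model of that strategy rather than Vim's code:

```python
# Hypothetical candidate list, modelled on Vim's default 'fileencodings'.
CANDIDATES = ["utf-8", "latin-1"]

def guess_decode(data: bytes, candidates=CANDIDATES):
    """Return (encoding, text) for the first candidate that decodes cleanly."""
    for enc in candidates:
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    raise ValueError("no candidate encoding fits")

latin1_file = "\u00d9\n".encode("iso-8859-1")  # like the Latin-1 test file
arabic_8859_6 = "\u0645\u0631\u062d\u0628\u0627\n".encode("iso-8859-6")

print(guess_decode(latin1_file))    # ('latin-1', 'Ù\n'): correct
print(guess_decode(arabic_8859_6))  # ('latin-1', 'åÑÍÈÇ\n'): garbage
```

If this model is right, ISO-8859-6 has to be requested explicitly, for example by reopening the file with ":e ++enc=iso-8859-6".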

-- 
Munzir Taha  PGP Key available
gpg --recv-keys --keyserver www.mandrakesecure.net F0671821

Telecommunications and Electronics Engineer
Linux Registered User #279362 at http://counter.li.org
Mandrake Club member
Maintainer of Mandrake Arabization Project Status (MAPS)
http://www.arabeyes.org/download/documents/distros/mdkarabicsupport-en/
CIW Designer, ICDL, MOUS
New Horizons CLC
Riyadh, SA