Re: A bug in what (was Re: A bug in cut?)?
- To: Development Discussions <developer at arabeyes dot org>
- Subject: Re: A bug in what (was Re: A bug in cut?)?
- From: Munzir Taha <munzirtaha at newhorizons dot com dot sa>
- Date: Thu, 27 May 2004 04:53:06 +0300
- Organization: New Horizons
- User-agent: KMail/1.6.1
On Yaum al-Arbi'a 06 Rabi` al-Thaani 1425 09:30 pm, Behdad Esfahbod wrote:
> > yes all of them use unicode but let me ask you all something, can any one
> > display the attached file in any editor besides vim? It should contain
> > U00D9 (LATIN CAPITAL LETTER U WITH GRAVE)
>
> NO, THEY ARE NOT. Your file is in ISO 8859-1 (aka Latin1)
> encoding. Vim autodetects that and handles it, gnome-terminal
> and konsole simply don't understand it.
>
> Try "file konsole_bug" and it will tell you:
Thanks Behdad, now I think I understand the whole story.
$ cat arabic
٠
١
٢
٣
٤
٥
$ hexdump arabic
0000000 a0d9 d90a 0aa1 a2d9 d90a 0aa3 a4d9 d90a
0000010 0aa5
0000012
$ file arabic
arabic: UTF-8 Unicode text
$ cut -b1 arabic| tee cutted_arabic|hexdump
0000000 0ad9 0ad9 0ad9 0ad9 0ad9 0ad9
000000c
$ file cutted_arabic
cutted_arabic: ISO-8859 text
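In summary, when "cut -b1" keeps only the first byte of each line of a UTF-8
file, each two-byte character loses its second byte. The bytes that remain are
no longer a valid UTF-8 byte sequence, hence the "ISO-8859 text" response from
the "file" utility.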
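The same check can be sketched in Python, outside the shell. This is only an
illustration, not part of the session above; the names data, cut_bytes and
is_valid_utf8 are my own. It rebuilds the six lines of the "arabic" file,
keeps only the first byte of each line as "cut -b1" does, and tests whether
each result still decodes as UTF-8.

```python
# Arabic-Indic digits U+0660..U+0665, one per line, like the "arabic" file.
text = "".join(chr(0x0660 + i) + "\n" for i in range(6))
data = text.encode("utf-8")

# Emulate `cut -b1`: keep only the first byte of every line.
cut_bytes = b"".join(line[:1] + b"\n" for line in data.splitlines())


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(data.hex(" "))       # bytes of the original file
print(cut_bytes.hex(" "))  # bytes after cutting
print(is_valid_utf8(data), is_valid_utf8(cut_bytes))

# Read as Latin-1, every leftover 0xd9 byte is U+00D9.
print(cut_bytes.decode("latin-1") == "\u00d9\n" * 6)
```

Each digit is encoded as 0xd9 followed by a second byte, so cutting at byte 1
leaves a lone 0xd9 before every newline. Read as Latin-1, that byte is U+00D9,
which would explain both the character mentioned earlier in the thread and the
"ISO-8859 text" answer from "file".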
> Vim autodetects that and handles it, gnome-terminal
> and konsole simply don't understand it.
I tried opening an ISO-8859-6 file in vim to test this autodetection
feature, but it displayed garbage. Why?
--
Munzir Taha PGP Key available
gpg --recv-keys --keyserver www.mandrakesecure.net F0671821
Telecommunications and Electronics Engineer
Linux Registered User #279362 at http://counter.li.org
Mandrake Club member
Maintainer of Mandrake Arabization Project Status (MAPS)
http://www.arabeyes.org/download/documents/distros/mdkarabicsupport-en/
CIW Designer, ICDL, MOUS
New Horizons CLC
Riyadh, SA