Re: Submitted papers
- To: general at arabeyes dot org
- Subject: Re: Submitted papers
- From: David Starner <dstarner98 at aasaa dot ofe dot org>
- Date: Sun, 19 Aug 2001 15:05:03 -0500
- User-agent: Mutt/1.3.20i
On Sun, Aug 19, 2001 at 08:55:03PM +0200, Chahine M. Hamila wrote:
> note that utf-8 as internal encoding for an application is not the most practical,
> especially in terms of algorithmic complexity (other encodings such as UCS are
> better for that).
Which UCS do you mean: UCS-2 or UCS-4? UCS-4 is more commonly
known as UTF-32. That's what charsets(7) calls it, for example,
and UTF-32 is what I've usually seen. It's sometimes easier than
UTF-8, but it's sometimes easier to just use the locale charset
and standard multibyte techniques.
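
To make the "easier than UTF-8" point concrete, here is a minimal
Python sketch of fetching the n-th character from each encoding. The
helper names (nth_char_utf32, nth_char_utf8) and the sample string are
made up for illustration, and this is only one way to do it:

```python
def nth_char_utf32(buf, n):
    # Fixed width: character n always occupies bytes 4*n .. 4*n+3.
    return buf[4 * n:4 * n + 4].decode("utf-32-le")


def nth_char_utf8(buf, n):
    # Variable width: scan the whole buffer for lead bytes first.
    # Continuation bytes have the bit pattern 10xxxxxx.
    starts = [i for i, b in enumerate(buf) if (b & 0xC0) != 0x80]
    end = starts[n + 1] if n + 1 < len(starts) else len(buf)
    return buf[starts[n]:end].decode("utf-8")


# 'a', ARABIC LETTER SEEN, a character outside the BMP, 'z'
text = "a\u0633\U00010400z"
utf32 = text.encode("utf-32-le")   # 4 characters * 4 bytes
utf8 = text.encode("utf-8")        # 1 + 2 + 4 + 1 bytes

print(len(utf32), len(utf8))
print(nth_char_utf32(utf32, 2) == nth_char_utf8(utf8, 2))
```

The UTF-32 lookup is a constant-time slice, while the UTF-8 one has to
walk the buffer. Whether that matters depends on how often the
application really indexes by character.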
UCS-2 is a bad idea, since it can't handle surrogate characters
(anything outside the 16-bit range). They may be minor, but decent
Unicode support includes handling them. UTF-16 is arguably no easier
than UTF-8, since you have to handle characters made up of more than
one 16-bit unit and you have to change all the ASCII comparisons
(c == ' ') to UTF-16.
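
A short Python sketch of both problems, assuming little-endian UTF-16
with no byte-order mark. The helper names (utf16_units, fits_in_ucs2)
are hypothetical and exist only for this illustration:

```python
import struct


def utf16_units(s):
    # Split a string into its 16-bit UTF-16 code units.
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def fits_in_ucs2(s):
    # UCS-2 has exactly one 16-bit unit per character, so anything
    # above U+FFFF cannot be represented at all.
    return all(ord(ch) <= 0xFFFF for ch in s)


bmp = "\u0633"           # ARABIC LETTER SEEN, inside the 16-bit range
astral = "\U00010400"    # a character outside the 16-bit range

print([hex(u) for u in utf16_units(bmp)])      # one unit
print([hex(u) for u in utf16_units(astral)])   # two units: a surrogate pair
print(fits_in_ucs2(bmp), fits_in_ucs2(astral))

# The ASCII comparison problem: a space is a single byte 0x20 in
# UTF-8, but a 16-bit unit (two bytes) in UTF-16.
print(" ".encode("utf-8"), " ".encode("utf-16-le"))
```

So code that loops over 16-bit units still has to recognise the
D800-DFFF range and pair the units up, much as UTF-8 code has to
recognise continuation bytes.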
--
David Starner - dstarner98 at aasaa dot ofe dot org
Pointless website: http://dvdeug.dhis.org
"I don't care if Bill personally has my name and reads my email and
laughs at me. In fact, I'd be rather honored." - Joseph_Greg