
Re: Submitted papers

To: general at arabeyes dot org
Subject: Re: Submitted papers
From: David Starner <dstarner98 at aasaa dot ofe dot org>
Date: Sun, 19 Aug 2001 15:05:03 -0500
User-agent: Mutt/1.3.20i

On Sun, Aug 19, 2001 at 08:55:03PM +0200, Chahine M. Hamila wrote:
> note that utf-8 as internal encoding for an application is not the most practical,
> especially in terms of algorithmic complexity (other encodings such as UCS are
> better for that).

What do you mean by UCS: UCS-2 or UCS-4? UCS-4 is more commonly 
known as UTF-32. That's what charsets(7) calls it, for example, 
and UTF-32 is what I've usually seen. It's sometimes easier than 
UTF-8, but it's sometimes easier to just use the locale charset
and standard multibyte techniques.

UCS-2 is a bad idea, since it can't represent the characters that
UTF-16 encodes as surrogate pairs (anything above U+FFFF). They may
be minor, but decent Unicode support includes handling them. UTF-16
is arguably no easier than UTF-8, since you have to handle characters
made up of more than one 16-bit unit and you have to change all the
ASCII comparisons (c == ' ') to work on 16-bit units.
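
To make the surrogate point concrete, here is a rough sketch in Python
of what "handling characters made up of more than one 16-bit unit"
involves. The function name decode_utf16_units and the sample data are
made up for illustration; a real program would normally use a library
decoder instead.

```python
def decode_utf16_units(units):
    """Turn a list of 16-bit code units into a list of code points."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # A surrogate pair: two units combine into one code point
            # above U+FFFF.
            low = units[i + 1]
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # A BMP character (or an unpaired surrogate, passed through
            # unchanged in this sketch).
            code_points.append(unit)
            i += 1
    return code_points


# 'A', U+10348 (as the pair D800 DF48), and a space.
units = [0x0041, 0xD800, 0xDF48, 0x0020]
code_points = decode_utf16_units(units)

# Code that treats the data as UCS-2 sees four "characters" here;
# there are really three.
ucs2_count = len(units)
real_count = len(code_points)
```

Note that the unit-by-unit loop has the same shape as a UTF-8 decoder,
just with a different lead/trail test, which is why UTF-16 doesn't buy
much over UTF-8 once surrogates are handled.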

-- 
David Starner - dstarner98 at aasaa dot ofe dot org
Pointless website: http://dvdeug.dhis.org
"I don't care if Bill personally has my name and reads my email and 
laughs at me. In fact, I'd be rather honored." - Joseph_Greg
