Re: Submitted papers



David Starner wrote:

> On Sun, Aug 19, 2001 at 08:55:03PM +0200, Chahine M. Hamila wrote:
> > note that utf-8 as internal encoding for an application is not the most practical,
> > especially in terms of algorithmic complexity (other encodings such as UCS are
> > better for that).
>
> What do you mean by UCS, UCS-2 or UCS-4?

I am not an expert here, and I wasn't aware of any problem with UCS-2. What I meant
above was either UCS-2 or UCS-4, interchangeably. Both are better for internal
processing in a program, since each character takes a constant amount of space in
memory. UTF-8 is good for storage or data exchange, but it multiplies the complexity of
many basic string functions by n: finding the i-th character, for example, means
scanning from the start of the string instead of computing an offset.
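
To illustrate the constant-width argument, here is a minimal sketch in Python. The
function names utf8_char_at and utf32_char_at are made up for this example, and the
UTF-8 scanner assumes well-formed input (it only checks the lead byte). Fetching the
i-th character from UTF-8 walks the string from the start, while the fixed-width
encoding just computes an offset.

```python
def utf8_char_at(data: bytes, index: int) -> str:
    """Return the index-th code point of UTF-8 data by scanning from the start: O(n)."""
    if index < 0:
        raise IndexError(index)
    pos = 0    # current byte offset
    seen = 0   # code points skipped so far
    while pos < len(data):
        lead = data[pos]
        # The lead byte tells us how many bytes this character occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        elif lead >> 3 == 0b11110:
            width = 4
        else:
            raise ValueError("invalid UTF-8 lead byte at offset %d" % pos)
        if seen == index:
            return data[pos:pos + width].decode("utf-8")
        pos += width
        seen += 1
    raise IndexError(index)


def utf32_char_at(data: bytes, index: int) -> str:
    """Return the index-th code point of UTF-32 (little-endian, no BOM) data: O(1)."""
    start = 4 * index  # every character is exactly 4 bytes wide
    if index < 0 or start + 4 > len(data):
        raise IndexError(index)
    return data[start:start + 4].decode("utf-32-le")
```

The price of the constant-time lookup is memory: the UTF-32 buffer uses four bytes
for every character, including plain ASCII.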

> UCS-4 is more commonly
> known as UTF-32. That's what charsets(7) calls it, for example,
> and UTF-32 is what I've usually seen. It's sometimes easier than
> UTF-8, but it's sometimes easier to just use the locale charset
> and standard multibyte techniques.

Agree.

> UCS-2 is a bad idea, since it can't handle surrogate characters.
> They may be minor, but decent Unicode support includes handling
> them. UTF-16 is arguably no easier than UTF-8, since you have to
> handle characters made up of more than one 16-bit unit and you have
> to change all the ASCII comparisons (c == ' ') to UTF-16.
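
The point about characters spanning more than one 16-bit unit can be sketched as
follows. This is only an illustration, not code from either poster: the function name
utf16_code_points is hypothetical, and the input is assumed to be a list of integers
holding 16-bit code units. A pure UCS-2 reader would treat each unit as a complete
character and so mishandle the surrogate pair.

```python
def utf16_code_points(units):
    """Decode a list of 16-bit code units into a list of Unicode code points."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: it must be followed by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                raise ValueError("unpaired high surrogate at index %d" % i)
            low = units[i + 1]
            # Combine the two 10-bit halves and add the 0x10000 offset.
            out.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError("unpaired low surrogate at index %d" % i)
        else:
            # Ordinary BMP character: one unit is one code point.
            out.append(u)
            i += 1
    return out
```

For example, the three units 0x0041, 0xD83D, 0xDE00 decode to only two code points,
U+0041 and U+1F600, so the number of units no longer equals the number of characters.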