
Re: Submitted papers

To: general at arabeyes dot org
Subject: Re: Submitted papers
From: "Chahine M. Hamila" <mch at chaham dot com>
Date: Mon, 20 Aug 2001 00:03:03 +0200

David Starner wrote:

> On Sun, Aug 19, 2001 at 08:55:03PM +0200, Chahine M. Hamila wrote:
> > note that utf-8 as internal encoding for an application is not the most practical,
> > especially in terms of algorithmic complexity (other encodings such as UCS are
> > better for that).
>
> What do you mean by UCS, UCS-2 or UCS-4?

I am no expert there, and I wasn't aware of any problem with UCS-2. What I meant
above was either UCS-2 or UCS-4, indifferently. Both are better for internal
processing in a program, since each character takes a constant amount of memory.
UTF-8 is good for storage and data exchange, but its variable-width characters make
many basic string functions more expensive: reaching the n-th character, for
instance, means scanning from the start of the string (O(n)) instead of jumping to
a fixed offset (O(1)).
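
To make the cost difference concrete, here is a minimal sketch in Python working on raw bytes. The function names `utf8_char_at` and `ucs4_char_at` and the sample string are made up for illustration; they are not from any library. The UTF-8 version has to walk the buffer and count lead bytes, while the UCS-4 version (little-endian is assumed here) computes the offset directly.

```python
def utf8_char_at(data, index):
    """Return the index-th character of UTF-8 bytes by scanning from the start: O(n)."""
    if index < 0:
        raise IndexError(index)
    seen = -1      # number of characters started so far, minus one
    start = None   # byte offset where the wanted character begins
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:          # not a continuation byte: a new character starts
            seen += 1
            if seen == index:
                start = pos
            elif seen == index + 1:      # next character begins: the wanted one is complete
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError(index)
    return data[start:].decode("utf-8")  # wanted character was the last one


def ucs4_char_at(data, index):
    """Return the index-th character of UTF-32-LE bytes by direct offset: O(1)."""
    if index < 0 or 4 * index + 4 > len(data):
        raise IndexError(index)
    return data[4 * index:4 * index + 4].decode("utf-32-le")


# Four Arabic letters (seen, lam, alef, meem) followed by ASCII text.
text = "\u0633\u0644\u0627\u0645, world"
utf8 = text.encode("utf-8")       # 2 bytes per Arabic letter, 1 per ASCII character
ucs4 = text.encode("utf-32-le")   # always 4 bytes per character

print(len(utf8), len(ucs4))
print(utf8_char_at(utf8, 5) == ucs4_char_at(ucs4, 5))
```

The price of constant-time indexing is memory: the same text takes 4 bytes per character in UCS-4 regardless of script.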

> UCS-4 is more commonly
> known as UTF-32. That's what charsets(7) calls it, for example,
> and UTF-32 is what I've usually seen. It's sometimes easier than
> UTF-8, but it's sometimes easier to just use the locale charset
> and standard multibyte techniques.

Agree.

> UCS-2 is a bad idea, since it can't handle surrogate characters.
> They may be minor, but decent Unicode support includes handling
> them. UTF-16 is arguably no easier than UTF-8, since you have to
> handle characters made up of more than one 16-bit unit and you have
> to change all the ASCII comparisons (c == ' ') to UTF-16.
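
The surrogate issue described above can be sketched in a few lines of Python. This is only an illustration: `utf16_units` and `count_code_points` are hypothetical helper names, and the sample character (U+10330, a Gothic letter outside the Basic Multilingual Plane) is chosen arbitrarily. The sketch shows that one character can occupy two 16-bit units, so counting units is not the same as counting characters, and that an ASCII comparison has to be made against a 16-bit value.

```python
def utf16_units(text):
    """Split a string into its UTF-16 code units (16-bit integers)."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def count_code_points(units):
    """Count characters, treating a high+low surrogate pair as a single character."""
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        i += 2 if (is_high and has_low) else 1
        count += 1
    return count


sample = "a\U00010330 "          # 'a', one character outside the BMP, then a space
units = utf16_units(sample)

print([hex(u) for u in units])   # the middle character shows up as two units
print(len(units), count_code_points(units))
print(units[-1] == 0x0020)       # the space is compared as a 16-bit value, not a byte
```

A program that treats its strings as UCS-2 would see four "characters" here instead of three.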
