
Re: Submitted papers

To: general at arabeyes dot org
Subject: Re: Submitted papers
From: "Chahine M. Hamila" <mch at chaham dot com>
Date: Mon, 20 Aug 2001 16:22:18 +0200

Behdad Esfahbod wrote:

> On Mon, 20 Aug 2001, Chahine M. Hamila wrote:
>
> but UCS-2 does not encode characters above BMP!

Okay, then consider it about UCS-4 only.

> > UTF-8 is good for storage or data exchange, but it
> > multiplies complexity of many basic string functions by n.
>
> Which ones? any examples?

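The most basic example is access to the nth character: O(1) with UCS-4
(leaving UCS-2 aside, since it cannot encode characters above the BMP),
and O(n) with UTF-8, because UTF-8 is variable-width and you have to scan
from the start of the string to find where the nth character begins.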
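
To make the difference concrete, here is a minimal sketch in Python, working
on raw bytes. The function names are made up for illustration, and it assumes
well-formed UTF-8 and little-endian UCS-4/UTF-32 input without a BOM:

```python
def nth_char_utf8(data: bytes, n: int) -> str:
    """Return the n-th code point (0-based) of UTF-8 data.

    O(n): every byte before the target has to be inspected, because
    code points occupy a variable number of bytes.
    """
    if n < 0:
        raise IndexError(n)
    count = -1      # index of the code point currently being scanned
    start = None    # byte offset where the n-th code point begins
    for i, b in enumerate(data):
        # Continuation bytes look like 10xxxxxx; anything else starts
        # a new code point.
        if (b & 0xC0) != 0x80:
            count += 1
            if count == n:
                start = i
            elif count == n + 1:
                return data[start:i].decode("utf-8")
    if start is None:
        raise IndexError(n)
    # The n-th code point is the last one in the buffer.
    return data[start:].decode("utf-8")


def nth_char_ucs4(data: bytes, n: int) -> str:
    """Return the n-th code point (0-based) of UTF-32-LE data.

    O(1): every code point is exactly 4 bytes, so the offset is 4 * n.
    """
    if n < 0 or 4 * n + 4 > len(data):
        raise IndexError(n)
    return data[4 * n:4 * n + 4].decode("utf-32-le")
```

Both functions return the same character for the same index; only the amount
of work needed to find it differs.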

> I use UTF-32 internally and UTF-8 for transfers too,

Glad we agree on something ;) (kidding :))

> but just convert
> from UTF-8 to UTF-32 that I really need, example: in my implementation
> of the filter that performs bidi algorithm on console, first I check
> for any non-ascii character in UTF-8 input, and if I find one, then
> convert it to UTF-32, apply bidi, and convert the output to UTF-8
> again...

Indeed, when the UTF-8 input is purely ASCII, the problem I mentioned
doesn't exist. That said, whether that initial scan makes sense depends on
what program you're writing. But this is outside the scope of this list...
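
For reference, the scan-then-convert approach quoted above could look roughly
like the sketch below. This is only my reading of the description, not the
actual filter: all names are hypothetical, and apply_bidi is a placeholder
that returns its input unchanged where a real implementation of the bidi
algorithm would go.

```python
def is_pure_ascii(data: bytes) -> bool:
    """True if no byte has its high bit set, i.e. the UTF-8 data is plain ASCII."""
    return all(b < 0x80 for b in data)


def apply_bidi(code_points: list) -> list:
    """Placeholder (assumption): a real filter would reorder the code points here."""
    return code_points


def filter_line(data: bytes) -> bytes:
    """Pass ASCII through untouched; otherwise convert, process, convert back."""
    if is_pure_ascii(data):
        # Fast path: no conversion is needed at all.
        return data
    # Slow path: decode UTF-8 into a list of fixed-width code points
    # (the UTF-32 view), process it, then re-encode as UTF-8.
    code_points = [ord(ch) for ch in data.decode("utf-8")]
    reordered = apply_bidi(code_points)
    return "".join(chr(cp) for cp in reordered).encode("utf-8")
```

Note that the ASCII check is itself a linear pass over the input, so it only
pays off when a good share of the input really is pure ASCII.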

Salam
