
Re: Submitted papers



Behdad Esfahbod wrote:

> On Mon, 20 Aug 2001, Chahine M. Hamila wrote:
>
> but UCS-2 does not encode characters above BMP!

Okay, then consider it about UCS-4 only.

> > UTF-8 is good for storage or data exchange, but it
> > multiplies complexity of many basic string functions by n.
>
> Which ones? any examples?

The most basic example is access to the nth character: O(1) with UCS-4
(leaving UCS-2 aside, since it cannot encode characters above the BMP), and
O(n) with UTF-8, where you have to walk the string from the start.
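
To make the difference concrete, here is a minimal sketch in Python. The
names utf8_nth, to_ucs4 and ucs4_nth are made up for illustration, the UTF-8
input is assumed to be well-formed, and a plain array of integers stands in
for a UCS-4 buffer.

```python
import array


def utf8_nth(data: bytes, n: int) -> str:
    """Return the n-th character (0-based) of well-formed UTF-8 data.

    Characters have variable width, so we must walk from the start: O(n).
    """
    if n < 0:
        raise IndexError(n)
    seen = 0
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte tells us how many bytes this character occupies.
        if lead < 0x80:
            length = 1
        elif lead >> 5 == 0b110:
            length = 2
        elif lead >> 4 == 0b1110:
            length = 3
        else:
            length = 4
        if seen == n:
            return data[i:i + length].decode("utf-8")
        seen += 1
        i += length
    raise IndexError(n)


def to_ucs4(text: str) -> array.array:
    """Store one code point per fixed-width unit (assumes 'I' is 32 bits)."""
    return array.array("I", (ord(c) for c in text))


def ucs4_nth(units: array.array, n: int) -> str:
    """Fixed-width units allow direct indexing: O(1)."""
    return chr(units[n])
```

With the string "aé€z", utf8_nth has to step over a 1-byte, a 2-byte and a
3-byte character before it reaches "z", while ucs4_nth goes straight to
index 3.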

> I use UTF-32 internally and UTF-8 for transfers too,

Glad we agree on something ;) (kidding :))

> but just convert
> from UTF-8 to UTF-32 that I really need, example: in my implementation
> of the filter that performs bidi algorithm on console, first I check
> for any non-ascii character in UTF-8 input, and if I find one, then
> convert it to UTF-32, apply bidi, and convert the output to UTF-8
> again...

When the UTF-8 input is purely ASCII, the problem I mentioned indeed doesn't
exist. That said, whether the scan you do pays off depends on what program
you're writing. But this is outside the scope of this list...
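
For reference, the scan-then-convert approach described in the quoted text
could be sketched roughly as below. This is not the actual filter:
filter_line and the transform argument are hypothetical names, and the
reversal used in the usage lines is only a placeholder, not the bidi
algorithm.

```python
from typing import Callable, List


def filter_line(data: bytes,
                transform: Callable[[List[int]], List[int]]) -> bytes:
    """Apply `transform` to UTF-8 input, skipping conversion for pure ASCII.

    `transform` is a hypothetical stand-in for the real processing step
    (for example a bidi reordering) and works on a list of code points.
    """
    # Fast path: one linear scan; pure ASCII passes through untouched.
    if data.isascii():
        return data
    # Slow path: UTF-8 -> fixed-width code points -> transform -> UTF-8.
    code_points = [ord(c) for c in data.decode("utf-8")]
    result = transform(code_points)
    return "".join(chr(cp) for cp in result).encode("utf-8")


def reverse_placeholder(code_points: List[int]) -> List[int]:
    """Placeholder transform only; it is NOT the bidi algorithm."""
    return code_points[::-1]


ascii_out = filter_line(b"hello", reverse_placeholder)
mixed_out = filter_line("abc \u05d0".encode("utf-8"), reverse_placeholder)
```

The ASCII line comes back unchanged without any conversion; only the line
containing a Hebrew letter takes the decode, transform, encode round trip.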

Salam