Re: Submitted papers
- To: general at arabeyes dot org
- Subject: Re: Submitted papers
- From: "Chahine M. Hamila" <mch at chaham dot com>
- Date: Mon, 20 Aug 2001 16:22:18 +0200
Behdad Esfahbod wrote:
> On Mon, 20 Aug 2001, Chahine M. Hamila wrote:
>
> but UCS-2 does not encode characters above the BMP!
Okay, then consider my point as applying to UCS-4 only.
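To illustrate Behdad's point: UCS-2 is a fixed 16-bit encoding, so it can only represent code points U+0000 to U+FFFF (the BMP). UTF-16 reaches higher code points by splitting each one into a surrogate pair of two 16-bit units, which means one unit no longer equals one character. The sketch below is a hypothetical helper written for illustration, not code from this thread.

```python
from typing import List


def code_units_utf16(cp: int) -> List[int]:
    """Encode one Unicode scalar value as UTF-16 code units (hypothetical helper)."""
    if cp <= 0xFFFF:
        # Inside the BMP: a single 16-bit unit, identical to UCS-2.
        return [cp]
    # Above the BMP: subtract 0x10000, leaving a 20-bit value,
    # then split it into a high and a low surrogate (10 bits each).
    v = cp - 0x10000
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
```

For example, U+1F600 needs two units (0xD83D, 0xDE00), which UCS-2 has no way to express.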
> > UTF-8 is good for storage or data exchange, but it
> > multiplies the complexity of many basic string functions by n.
>
> Which ones? Any examples?
The most basic example is access to the nth character: O(1) with UCS-4
(leaving UCS-2 aside, since it cannot encode characters above the BMP),
and O(n) with UTF-8.
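A minimal sketch of why indexing differs, assuming "character" means Unicode code point and that both buffers hold valid, BOM-less data. In UTF-8 a code point takes 1 to 4 bytes, so finding the nth one means walking every preceding byte; in UTF-32 (UCS-4) every code point is exactly 4 bytes, so its offset is simply 4n. The function names are hypothetical.

```python
def utf8_nth_codepoint(data: bytes, n: int) -> str:
    """Return the nth code point (0-based) of valid UTF-8 `data`. O(n) scan."""
    count = -1
    for i, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts a code point.
        if (byte & 0xC0) != 0x80:
            count += 1
            if count == n:
                # Extend over this code point's continuation bytes.
                end = i + 1
                while end < len(data) and (data[end] & 0xC0) == 0x80:
                    end += 1
                return data[i:end].decode("utf-8")
    raise IndexError("code point index out of range")


def utf32_nth_codepoint(data: bytes, n: int) -> str:
    """Return the nth code point of UTF-32-LE `data`. O(1) offset computation."""
    if not 0 <= n < len(data) // 4:
        raise IndexError("code point index out of range")
    # Every code point occupies exactly 4 bytes, so its position is known directly.
    return data[4 * n : 4 * n + 4].decode("utf-32-le")
```

Usage: with `text = "aé€😀b"`, both `utf8_nth_codepoint(text.encode("utf-8"), 3)` and `utf32_nth_codepoint(text.encode("utf-32-le"), 3)` return the emoji, but only the UTF-32 version avoids touching the bytes that precede it.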
> I use UTF-32 internally and UTF-8 for transfers too,
Glad we agree on something ;) (kidding :))
> but I only convert
> from UTF-8 to UTF-32 when I really need to. For example, in my
> implementation of the filter that performs the bidi algorithm on the
> console, I first check the UTF-8 input for any non-ASCII character, and
> only if I find one do I convert it to UTF-32, apply bidi, and convert the
> output back to UTF-8...
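The filter Behdad describes can be sketched roughly as follows. This is an illustration, not his implementation: `filter_utf8` is a hypothetical name, and `transform` stands in for the bidi step, which is assumed to leave pure-ASCII text unchanged. The check works because bytes below 0x80 never appear inside a multi-byte UTF-8 sequence.

```python
from typing import Callable, List


def filter_utf8(data: bytes, transform: Callable[[List[int]], List[int]]) -> bytes:
    """Apply `transform` to the code points of UTF-8 `data`, skipping pure ASCII."""
    # Fast path: if every byte is < 0x80 the input is plain ASCII,
    # so (by assumption) the transform would be a no-op; pass it through.
    if all(b < 0x80 for b in data):
        return data
    # Slow path: decode to a fixed-width array of code points (UTF-32 style),
    # run the transform, then re-encode the result as UTF-8.
    codepoints = [ord(c) for c in data.decode("utf-8")]
    result = transform(codepoints)
    return "".join(chr(cp) for cp in result).encode("utf-8")
```

In a quick test, a toy transform such as `lambda cps: cps[::-1]` (line reversal, not real bidi) leaves `b"hello"` untouched via the fast path but turns `"aé"` into `"éa"`.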
If the UTF-8 input is purely ASCII, the problem I mentioned indeed
doesn't exist. That said, whether that scan is worthwhile depends on what
program you're writing. But this is beyond the scope of this list...
Salam