
Re: Internal number storage



On Fri, 25 Jan 2002, Nadim Shaikli wrote:

> Is this a stated "opinion" or a fact?  In other words, what does the
> world at large expect from an Arabic-encoded document?

Localized numerals (0660..0669, in your case). W3C standards mention that
specifically. There is a section in the Unicode Bidi algorithm saying that
applications "may" convert European numerals (0030..0039) to localized
numerals intelligently, but it does not specify any standard way of doing so.
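
The mechanical part of such a conversion is simple, since both digit ranges are contiguous and in the same order; what the Bidi text leaves open is deciding *which* digits to convert. A minimal Python 3 sketch of the mechanical part only (the function names are made up for illustration):

```python
# European digits (0030..0039) and Arabic-Indic digits (0660..0669) are both
# contiguous and in the same order, so a fixed offset maps one onto the other.
TO_ARABIC_INDIC = {0x30 + i: 0x0660 + i for i in range(10)}
TO_EUROPEAN = {0x0660 + i: 0x30 + i for i in range(10)}

def to_arabic_indic(text: str) -> str:
    """Replace every European digit with the matching Arabic-Indic digit."""
    return text.translate(TO_ARABIC_INDIC)

def to_european(text: str) -> str:
    """Replace every Arabic-Indic digit with the matching European digit."""
    return text.translate(TO_EUROPEAN)

print(ascii(to_arabic_indic("25 Jan 2002")))  # '\u0662\u0665 Jan \u0662\u0660\u0660\u0662'
print(to_european("\u0662\u0660\u0660\u0662"))  # 2002
```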

Using European numerals is the Microsoft way, because their software lacks
the mechanisms to compute the numeric values of localized numerals
everywhere.
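
For what it's worth, the numeric value of each localized digit is recorded in the Unicode Character Database, so the mechanism is not hard to provide. As an illustration only (Python 3 here says nothing about how Microsoft's software works), both Arabic-Indic digits and the Extended Arabic-Indic digits used for Persian (06F0..06F9) parse directly:

```python
import unicodedata

# Python's int() accepts any Unicode decimal digits (general category Nd),
# so localized digit strings parse without a conversion step.
arabic_indic = "\u0661\u0669\u0668\u0664"  # Arabic-Indic 1984
persian = "\u06f1\u06f3\u06f8\u06f0"       # Extended Arabic-Indic 1380

print(int(arabic_indic))  # 1984
print(int(persian))       # 1380

# The per-character value comes from the Unicode Character Database.
for ch in "\u0660\u0665\u0669":
    print(hex(ord(ch)), unicodedata.decimal(ch), unicodedata.name(ch))
```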

> I can see two issues here:
> 
>  1. A person writes a document in which both "hindi" and "non-hindi"
>     numbers are used; how is the distinction made (if all numbers are
>     stored as ASCII)?  If it's based on context, is this procedure
>     formalized somehow (to get consistency among the various
>     applications)?

You're right, and that's a major problem. It happens a lot here in
Iran, where both kinds of numbers are used in documents. MS apps usually
have a menu somewhere that lets the user select from "always European",
"always Arabic-Indic", and "automatic". The automatic method tries to be
as intelligent as possible, but does a very poor job of it.
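
A rough sketch of what such a menu might do at display time, assuming digits are stored as 0030..0039. The mode names are hypothetical, and the "automatic" rule (shape a digit by the nearest preceding strong character) is an assumed heuristic for illustration, not what MS apps actually implement:

```python
import unicodedata

# Hypothetical mode names mirroring the three menu choices.
MODES = ("european", "arabic-indic", "automatic")

def display_digits(text: str, mode: str) -> str:
    """Return the display form of text whose digits are stored as 0030..0039."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    out = []
    last_strong = None  # bidi class of the most recent strong character seen
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("L", "R", "AL"):
            last_strong = bidi
        is_ascii_digit = "0" <= ch <= "9"
        # Assumed heuristic for "automatic": digits after Arabic letters
        # (bidi class AL) get Arabic-Indic shapes, all others stay European.
        if is_ascii_digit and (
            mode == "arabic-indic"
            or (mode == "automatic" and last_strong == "AL")
        ):
            ch = chr(0x0660 + ord(ch) - 0x30)
        out.append(ch)
    return "".join(out)

sample = "page 12, \u0635\u0641\u062d\u0629 12"
print(ascii(display_digits(sample, "european")))   # 'page 12, \u0635\u0641\u062d\u0629 12'
print(ascii(display_digits(sample, "automatic")))  # 'page 12, \u0635\u0641\u062d\u0629 \u0661\u0662'
```

With this rule, a European number written inside Arabic text would come out as Arabic-Indic, which is the kind of mistake an "automatic" mode makes.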

And no, it is not specified anywhere.

>  2. I'm very inclined to think the answer to #1 above is that
>     it's up in the air.  The reason being, ISO-8859-6 includes the glyphs
>     for those "hindi" numbers and considers them proper "Arabic" numbers
>     (they are NOT in form-B for so-called "shaping").  This would lead
>     one to believe those encodings ought to be used instead of ASCII
>     (excuse the conjecture on my part :-)

If you want my advice, forget about 8859-6 if you want consistency. It has
only one set of digits (0x30..0x39, shared with ASCII), so there is no
guarantee that you will get Arabic-Indic or European numbers (whichever
you want) in a different environment.
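
A small Python 3 check of this, assuming Python's bundled iso8859_6 codec (which follows the Unicode mapping table for ISO-8859-6): the digit bytes decode to plain 0030..0039, and Arabic-Indic digits cannot be encoded at all, so the shape you see is up to the renderer. The helper name is made up:

```python
def encodable(text: str, codec: str) -> bool:
    """True if every character of text has a byte in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# ISO-8859-6 has one set of digits, at bytes 0x30..0x39 (shared with ASCII).
decoded = bytes([0x31, 0x39, 0x38, 0x34]).decode("iso8859_6")
print(decoded == "1984")  # True

# Arabic-Indic digits (0660..0669) have no byte of their own in ISO-8859-6.
print(encodable("1984", "iso8859_6"))                      # True
print(encodable("\u0661\u0669\u0668\u0664", "iso8859_6"))  # False
```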

BTW, I could not fully follow your sentence.

> What the user sees is NOT a problem (except in the case where both
> number systems are intermixed in a document), so let's not get into
> preferences and what people expect to see.  We are strictly talking
> about what ought to be stored on disk.

Store European numbers if the user wants them, and Arabic-Indic numbers
otherwise. And if you are using a charset that has unified the two, you can
either:

1) Convert them intelligently (however you define that).

2) Convert them to 0030..0039 (which keeps you out of a can of
worms, but makes the users complain).
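
Both options can be sketched for text coming from ISO-8859-6, assuming Python 3's bundled iso8859_6 codec. The function names are made up, and the rule in option 1 (convert a line's digits when Arabic letters outnumber Latin letters on that line) is only one hypothetical way to define "intelligently":

```python
import unicodedata

ARABIC_INDIC = {0x30 + i: 0x0660 + i for i in range(10)}

def import_option2(raw: bytes) -> str:
    """Option 2: keep every digit as 0030..0039."""
    return raw.decode("iso8859_6")

def import_option1(raw: bytes) -> str:
    """Option 1 (hypothetical rule): convert a line's digits to Arabic-Indic
    when Arabic letters (bidi AL) outnumber Latin letters (bidi L) on it."""
    lines = []
    for line in raw.decode("iso8859_6").split("\n"):
        arabic = sum(1 for ch in line if unicodedata.bidirectional(ch) == "AL")
        latin = sum(1 for ch in line if unicodedata.bidirectional(ch) == "L")
        lines.append(line.translate(ARABIC_INDIC) if arabic > latin else line)
    return "\n".join(lines)

sample = "page 7\n\u0635\u0641\u062d\u0629 12".encode("iso8859_6")
print(ascii(import_option2(sample)))  # 'page 7\n\u0635\u0641\u062d\u0629 12'
print(ascii(import_option1(sample)))  # 'page 7\n\u0635\u0641\u062d\u0629 \u0661\u0662'
```

Option 2 keeps exactly what the bytes say but loses the user's intent; option 1 guesses, and any line that mixes both kinds of numbers defeats it.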

> What does the unicode (or any other) standard say about which encodings
> should be stored (if they don't care and don't state anything, who does) ?

W3C cares, and Unicode cares. Both recommend storing the form that the 
user sees on the screen.

roozbeh