
Re: C++ Unicode for Arabic



On Wednesday 24 November 2004 09:37, Nadir Durrani wrote:
> >>Alsallam Alukum;
> >>
> >>How about output? Does it work with <fstream.h> or not? Please
> >> advise.
>
> Oh yes, it does. Here is the code that writes the Arabic letter alif to a
> file:
>
>
> fstream obj ("file.txt" , ios::out);
>
>
> int unicode =0x6;
>
>
> int unicode1=0x27;  /* unicode and unicode1 combine to form alif, because an
> integer is 2 bytes in Borland C; with VC you can declare them together
> */
>
>
> obj<<(char)0xFF;
>
>
> obj<<(char)0xFE;
>
>
> obj<<(char)unicode1;
>
>
> obj<<(char)unicode;
>
>
> The bytes FF FE mark a Unicode (UTF-16) file stored in little-endian format;
> after them you can store alif (U+0627) as the bytes 27 06, and other
> characters similarly.
>
>

Please don't add the Byte Order Mark (BOM). It causes a lot of problems on
POSIX systems (e.g. Perl scripts that no longer work), and it makes the output
file not a standard UTF-8 file anymore (this has been forced by Microsoft,
which is why Notepad can't make valid UTF-8 files).
FFFE and FFFF must not occur in a UTF-8 file; if found, they should be
interpreted as a malformed sequence, not as a Unicode file identifier.
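
As a rough sketch of the BOM-free alternative, the following writes alif
(U+0627) as plain UTF-8 with nothing in front of it. The helper names
encode_utf8_bmp and write_alef_utf8 are made up for this illustration, and
the encoder assumes a code point in the Basic Multilingual Plane that is not
a surrogate; it is not a complete UTF-8 encoder.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: encode one BMP code point (U+0000..U+FFFF) as UTF-8.
// Assumption: the caller never passes a surrogate (U+D800..U+DFFF).
std::string encode_utf8_bmp(unsigned int cp) {
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: write alif to `path` as UTF-8, with no BOM.
// Binary mode keeps the bytes exactly as encoded on every platform.
bool write_alef_utf8(const char* path) {
    std::ofstream out(path, std::ios::out | std::ios::binary);
    if (!out) {
        return false;
    }
    out << encode_utf8_bmp(0x0627);  // alif becomes the two bytes D8 A7
    return static_cast<bool>(out);
}
```

Calling write_alef_utf8("file.txt") should leave a two-byte file that POSIX
tools can read as ordinary UTF-8 text.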


See:
http://www.cl.cam.ac.uk/~mgk25/ucs/ISO-10646-UTF-8.html

and:
http://www.cl.cam.ac.uk/~mgk25/unicode.html

-- 
Mohammed Yousif
Egypt