
Re: X Keyboard for Kurdish with Arabic script



Thanks Oibane,

your answers were very useful for me.

As there is no easy solution for the keyboard, I will focus on the Unicode problem first.

The other problem, which codepoint should be used, should also be
clarified, but please wait for word from the experts. There are quite a
few on this mailing list. I am not one of them, so I will just
review the facts.

A clarification is clearly needed. The reason why I care about this at all is that I am also working on a spellchecker. For a spellchecker it is essential to know how the letters are encoded.
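
To see how a given word is actually encoded, something like the following may help. It is only a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is made up for illustration and is not part of any spellchecker.

```python
import unicodedata

def describe(text):
    """Return a (codepoint, Unicode name) pair for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

# The two encodings discussed in this thread, listed side by side.
for label, sample in (("AE", "\u06d5"), ("HEH + ZWNJ", "\u0647\u200c")):
    print(label, describe(sample))
```

Running this over a word list shows which of the two encodings a text really uses, which is the information a spellchecker dictionary has to agree with.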

Background:
U+06D5 Arabic letter AE has been there from the very beginning, in
Unicode version 1. As of version 3.2, its joining group is defined as
"teh marbuta", and it was corrected to be
right-joining. (It is strange that only a few fonts support it,
if it is such a veteran.)

The reason is that it is not used in the "major" languages like Arabic and Farsi, but in "smaller" languages like Uigur. For Kurdish the U+0647 workaround exists, so one can argue that it is not needed for Kurdish either. I have to check back why it is mentioned for Kurdish, since I couldn't find a hint in the Unicode documents.

What makes the story complicated is U+200C Zero-width non-joiner
(ZWNJ). It first appeared in version 4.0, so it is rather new.
(Note that U+200D Zero-width joiner, ZWJ, is from 3.2.0. They were
not classmates.) The policy is not consistent between U+06D5
and U+0647+U+200C.

I'd rather do completely without the ZWNJ. It just complicates the issue, as you say.

It is absolutely a problem to have visual twins. If they
are not distinguishable in appearance, confusion will happen.

Definitely. Unfortunately there are several variants that will look exactly the same, not just two. Another letter, "k", is also affected by this.

At the least, a policy needs to be established. Just sitting back
does not solve the problem.
a. If U+06D5 is to stay, then its equivalence with U+0647+U+200C has to be defined.
It had better be annotated as "Kurdish" to reduce possible confusion,
although the annotation does not seem to be mandatory.
b. It can be deprecated.
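
Until such a policy exists, a spellchecker could apply option a) locally by folding one encoding into the other before lookup. The sketch below assumes that U+0647 followed by U+200C should always be treated as U+06D5. That equivalence is purely an assumption of this example, not something Unicode defines, and `fold_ae` is a hypothetical helper name.

```python
HEH_ZWNJ = "\u0647\u200c"  # ARABIC LETTER HEH + ZERO WIDTH NON-JOINER
AE = "\u06d5"              # ARABIC LETTER AE

def fold_ae(text):
    """Rewrite every HEH+ZWNJ pair as AE (assumed equivalence, see above).

    A HEH that is not followed by ZWNJ is left untouched.
    """
    return text.replace(HEH_ZWNJ, AE)

# Both spellings of the same ending compare equal once folded.
print(fold_ae("\u0628\u0647\u200c") == fold_ae("\u0628\u06d5"))
```

Folding in this direction keeps the dictionary in a single encoding. The reverse direction would work just as well, as long as the dictionary and the input are folded the same way.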

I am not sure what U+06D5 in fact IS. I know neither Uigur nor Kazakh. But a) looks better than b) to me :)

Where can we apply for that?

Regards,
Erdal