Thomas Milo wrote:
Meor's laudable effort has helped me return to my original position: encode graphemes, not glyphs. Keeping the tanween graphemically intact will improve searchability. So I have recently revised my position on tanween to the following formula, which I hope this community will endorse:
tanween = <vowel> <vowel> + [optional] <modifier>
<vowel> = fatha / dhamma / kasra
<modifier> = tamweem / sequentializer
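As an illustration only, the formula can be read as a tiny grammar over a sequence of mark names. The sketch below treats the grapheme names as plain string labels (no encoding for tamweem or the sequentializer is assumed), and it assumes both vowels must be the same grapheme, as in fathatan / dhammatan / kasratan. The function name `parse_tanween` is hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Grapheme names from the formula, used as plain labels (hypothetical encoding).
VOWELS = {"fatha", "dhamma", "kasra"}
MODIFIERS = {"tamweem", "sequentializer"}

def parse_tanween(marks: Sequence[str]) -> Optional[Tuple[str, Optional[str]]]:
    """Return (vowel, modifier) if `marks` matches <vowel> <vowel> [<modifier>], else None.

    Assumption: the two vowels must be identical graphemes.
    """
    if len(marks) not in (2, 3):
        return None
    first, second = marks[0], marks[1]
    if first not in VOWELS or first != second:
        return None
    if len(marks) == 3:
        # The optional third element must be one of the defined modifiers.
        if marks[2] not in MODIFIERS:
            return None
        return first, marks[2]
    return first, None
```

Because the modifier is optional, a plain tanween and a tanween carrying tamweem or the sequentializer share the same leading vowel pair, which is what makes a search on the vowel alone find both.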
For backward compatibility,
<vowel> <vowel> = fathatan / dhammatan / kasratan
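A minimal sketch of the backward-compatibility rule, assuming text is stored as Unicode. The legacy characters and the single vowels are real code points (U+064B to U+0650). Unicode normalization does not perform this mapping, so it is written out by hand here; the function names `to_legacy` and `to_graphemic` are hypothetical, and modifiers are left out because they have no existing code points.

```python
# Single vowels (Unicode: ARABIC FATHA, ARABIC DAMMA, ARABIC KASRA).
FATHA, DHAMMA, KASRA = "\u064E", "\u064F", "\u0650"
# Legacy precomposed tanween (ARABIC FATHATAN, DAMMATAN, KASRATAN).
FATHATAN, DHAMMATAN, KASRATAN = "\u064B", "\u064C", "\u064D"

# <vowel> <vowel> -> legacy tanween character, per the compatibility rule.
TO_LEGACY = {
    FATHA + FATHA: FATHATAN,
    DHAMMA + DHAMMA: DHAMMATAN,
    KASRA + KASRA: KASRATAN,
}
# Reverse mapping: legacy tanween -> doubled vowel.
TO_GRAPHEMIC = {legacy: pair for pair, legacy in TO_LEGACY.items()}

def to_legacy(text: str) -> str:
    """Replace each doubled vowel with the corresponding legacy tanween character."""
    for pair, legacy in TO_LEGACY.items():
        text = text.replace(pair, legacy)
    return text

def to_graphemic(text: str) -> str:
    """Expand each legacy tanween character into its doubled vowel."""
    return "".join(TO_GRAPHEMIC.get(ch, ch) for ch in text)
```

After `to_graphemic`, a search for a single dhamma also matches a word ending in dhammatan, which is the searchability gain the proposal describes.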
For example, using latin-1:
TANWEEN = ñ
TANWEEN IDGHAM = Ñ
TAMWEEM = %
Examples (x = kha, ç = sheen, ² = shadda):
kitaabuñ xuçubuÑ m²usan²ada#uÑ min% ba at d
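The transliteration key can be applied mechanically to locate the marks in each example word. This sketch uses only the symbols defined above (ñ, Ñ, %, x, ç, ²); "#" is not defined in the key, so it is left alone. The function name `special_marks` is hypothetical, and the module-level check simply confirms that each key character is a single Latin-1 byte.

```python
from typing import List, Tuple

# Transliteration key from the post; every character is in Latin-1.
KEY = {
    "ñ": "tanween",
    "Ñ": "tanween idgham",
    "%": "tamweem",
    "x": "kha",
    "ç": "sheen",
    "²": "shadda",
}

# Sanity check: each key character encodes to exactly one Latin-1 byte.
assert all(len(ch.encode("latin-1")) == 1 for ch in KEY)

def special_marks(word: str) -> List[Tuple[int, str]]:
    """List (position, name) for every character of `word` that the key defines."""
    return [(i, KEY[ch]) for i, ch in enumerate(word) if ch in KEY]
```

For instance, `special_marks("xuçubuÑ")` reports kha at position 0, sheen at 2 and tanween idgham at 6.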