On Mon, Nov 11, 2002 at 12:11:03AM -0800, sara mraish wrote:
> Salam,
>
> My question is how can I normalize teh marbuta to heh? Also, how
> can I normalize the alif-maksura to yeh? That way I can use it for
> information retrieval when searching for a string. Arabic words must be
> normalized before the text is ready for indexing, keyword searches, or text
> manipulation.

Salam,

You simply replace the characters ;)

If you give more details, perhaps someone can help you, but as it stands I
cannot understand exactly what you are asking (and I doubt others can).

Assuming that you have a text file you want to normalize, you would need to:

- Remove punctuation
- Remove diacritics
- Remove non-Arabic letters
- Replace any ALEF with a HAMZA or MADDA with a plain ALEF (U+0627)
- Replace any YEH followed by a standalone HAMZA with a YEH with HAMZA above (U+0626)
- Replace any ALEF_MAKSURA with a YEH (U+064A)
- Replace any TEH_MARBUTA with a HEH (U+0647)

How you do that depends, of course, on whatever language you choose. But
without more _details_, there is only so much I can help with.

P.S. When replying to an email from a digest, please remove the unrelated
mail from the reply. This would save a lot of people bandwidth when
downloading their mail.

later

-- 
 ---------------------------------------------------------
| Mohammed Elzubeir       | Visit us at:                |
|                         | http://www.arabeyes.org/    |
| Arabeyes Project        | Homepage:                   |
| Unix the 'right' way    | http://fakkir.net/~elzubeir/|
 ---------------------------------------------------------
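The normalization steps listed in the reply map fairly directly onto code. What follows is a minimal Python sketch under stated assumptions, not the author's method. The function name `normalize_arabic` is hypothetical. "Arabic letters" is taken to mean the basic range U+0621..U+064A, and "diacritics" to mean the combining marks U+064B..U+065F plus superscript alef U+0670. The NFC step, the removal of tatweel (U+0640), and replacing dropped characters with a space are additional choices of this sketch.

```python
import re
import unicodedata

# Combining marks: harakat, tanween, shadda, sukun and related marks
# (U+064B..U+065F), plus superscript alef (U+0670). This range is an assumption.
_DIACRITICS = re.compile("[\u064B-\u065F\u0670]")

# Tatweel/kashida (U+0640) is a stretching character, not a letter.
# Dropping it is an extra assumption of this sketch.
_TATWEEL = "\u0640"

# Anything outside the basic Arabic letters (U+0621..U+064A) and whitespace
# is treated as punctuation, a digit, or a non-Arabic letter.
_NON_ARABIC = re.compile("[^\u0621-\u064A\\s]")

# ALEF WITH MADDA ABOVE, ALEF WITH HAMZA ABOVE, ALEF WITH HAMZA BELOW.
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")


def normalize_arabic(text: str) -> str:
    """Normalize Arabic text for indexing/search (illustrative sketch)."""
    # Compose decomposed sequences first (e.g. YEH + combining HAMZA ABOVE
    # becomes U+0626), so stripping combining marks below does not lose them.
    text = unicodedata.normalize("NFC", text)

    # Remove diacritics before the non-Arabic filter: the marks lie outside
    # U+0621..U+064A and would otherwise be turned into spaces inside words.
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")

    # Remove punctuation and non-Arabic characters. Replacing them with a
    # space keeps two Arabic words joined only by a comma apart.
    text = _NON_ARABIC.sub(" ", text)

    # Letter normalizations, in the order listed in the email.
    text = _ALEF_VARIANTS.sub("\u0627", text)       # -> plain ALEF
    text = text.replace("\u064A\u0621", "\u0626")  # YEH + HAMZA -> YEH WITH HAMZA ABOVE
    text = text.replace("\u0649", "\u064A")        # ALEF MAKSURA -> YEH
    text = text.replace("\u0629", "\u0647")        # TEH MARBUTA -> HEH

    # Collapse runs of whitespace left behind by the removals.
    return " ".join(text.split())
```

For example, `normalize_arabic("مَدْرَسَةٌ")` should return `"مدرسه"`, and `normalize_arabic("إلى")` should return `"الي"`. Because the index and the search query must agree, the same function would be applied to both the documents and the queries. Note that diacritics are stripped before the non-Arabic filter on purpose: doing it in the order the list gives would split vowelled words apart.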