On Mon, Nov 11, 2002 at 12:11:03AM -0800, sara mraish wrote:
> Salam,
>
> My question is how can I normalize teh marbuta to replace the heh? Also, how
> can I normalize the alif-maksura to yeh? So that I can use it in the
> information retrieval when searching for a string. Arabic words must be
> normalized before the text is ready for indexing, keyword searches, or text
> manipulation.
Salam,
You simply replace the characters ;) If you give more details, perhaps
someone can help further, but as it stands I cannot tell exactly what you
are asking (and I doubt others can).
Assuming that you have a text file you want to normalize, you would need to:
- Remove punctuation
- Remove diacritics
- Remove non-Arabic letters
- Replace any ALEF carrying a HAMZA or MADDA (U+0623, U+0625, U+0622) with a
plain ALEF (U+0627)
- Replace any YEH (U+064A) followed by a standalone HAMZA (U+0621) with a YEH
with a HAMZA on top (U+0626)
- Replace any ALEF_MAKSURA (U+0649) with a YEH (U+064A)
- Replace any TEH_MARBUTA (U+0629) with a HEH (U+0647)
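As a rough illustration only, here is how the steps above might look in Python.
The function name normalize_arabic is made up for this example, and a few
choices are my own assumptions rather than part of the list: "diacritics" is
taken to mean U+064B through U+0652 plus the tatweel (U+0640), "Arabic
letters" is taken to mean U+0621 through U+064A, and removed characters are
replaced by a space so that neighbouring words do not run together.

```python
import re

# Assumption: "diacritics" means the tashkeel marks U+064B..U+0652,
# plus the tatweel (U+0640), which carries no meaning for searching.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")

# Assumption: "Arabic letters" means U+0621..U+064A. Everything else
# (punctuation, digits, Latin text) is dropped, except whitespace.
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")

# ALEF WITH MADDA ABOVE, ALEF WITH HAMZA ABOVE, ALEF WITH HAMZA BELOW.
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")


def normalize_arabic(text: str) -> str:
    """Hypothetical helper: apply the normalization steps in list order."""
    text = DIACRITICS.sub("", text)           # remove diacritics
    text = NON_ARABIC.sub(" ", text)          # remove punctuation / non-Arabic
    text = ALEF_VARIANTS.sub("\u0627", text)  # any hamza/madda ALEF -> ALEF
    text = text.replace("\u064A\u0621", "\u0626")  # YEH + HAMZA -> YEH w/ HAMZA
    text = text.replace("\u0649", "\u064A")   # ALEF MAKSURA -> YEH
    text = text.replace("\u0629", "\u0647")   # TEH MARBUTA -> HEH
    return " ".join(text.split())             # collapse leftover whitespace
```

The same function would have to be applied to both the indexed text and the
search string, otherwise the two will not match. The YEH + HAMZA step runs
before the ALEF MAKSURA step so that the replacements follow the order of the
list.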
How you do that depends on whatever language you choose. Without more
_details_, though, there is only so much I can help with.
P.S. When replying to an email from a digest, please remove the unrelated
mail from the reply. This would save a lot of people bandwidth
downloading their mail.
later
--
-------------------------------------------------------
| Mohammed Elzubeir | Visit us at: |
| | http://www.arabeyes.org/ |
| Arabeyes Project | Homepage: |
| Unix the 'right' way | http://fakkir.net/~elzubeir/|
-------------------------------------------------------