
Re: Normalization



On Mon, Nov 11, 2002 at 01:20:11AM -0800, sara mraish wrote:
> Thanks a lot,
> 
> So, to answer my question: you are saying we can normalize an alef_maksura
> to a yeh and a teh_marbuta to a heh?

Yes. This is not something I came up with; it is what all the papers I have
seen on the subject say. If you like, I can privately email you a couple of
papers that you may find useful.
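
To make the mapping concrete, here is a minimal sketch in Python of the two
substitutions being discussed. The function name normalize and the choice to
replace every occurrence (rather than only word-final letters) are my own
assumptions for illustration; the papers may differ in the details.

```python
# Unicode code points for the four letters under discussion.
ALEF_MAKSURA = "\u0649"  # ARABIC LETTER ALEF MAKSURA
YEH = "\u064A"           # ARABIC LETTER YEH
TEH_MARBUTA = "\u0629"   # ARABIC LETTER TEH MARBUTA
HEH = "\u0647"           # ARABIC LETTER HEH

# Translation table: alef maksura -> yeh, teh marbuta -> heh.
_NORMALIZATION_TABLE = str.maketrans({ALEF_MAKSURA: YEH, TEH_MARBUTA: HEH})


def normalize(text: str) -> str:
    """Return text with the two substitutions applied (hypothetical helper).

    Assumption: every occurrence is replaced, not only word-final ones.
    """
    return text.translate(_NORMALIZATION_TABLE)


# A word typed with teh marbuta and the same word typed with heh
# end up identical once both are normalized.
with_dots = "\u0645\u062F\u0631\u0633\u0629"     # ends in teh marbuta
without_dots = "\u0645\u062F\u0631\u0633\u0647"  # ends in heh
print(normalize(with_dots) == normalize(without_dots))
```

The idea is to apply the same function to both the indexed text and the
user's query, so that either spelling finds the same documents.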

> Online, I see teh marbuta written as a teh marbuta; I don't see the dots
> removed. It is a heh only if the word really ends with a heh, especially
> when the word is masculine. I am just having a hard time convincing myself
> that we can remove the dots from the teh marbuta, because if the word is
> feminine then it should have the two dots on top of the heh to mark it as
> feminine. So you are saying that we can remove the dots from the teh
> marbuta and make it a heh for indexing and for use in IR applications for
> the Arabic language?
> 

Of course, but you have to consider at least two things:
  1. You don't want to limit your search too much.
  2. Users are unlikely to follow such a rule consistently (as our
     handwriting shows: when we write on paper, we often omit the dots).
     
Then again, you make that choice yourself ;)
-- 
-------------------------------------------------------
| Mohammed Elzubeir    | Visit us at:                 |
|                      |  http://www.arabeyes.org/    |
| Arabeyes Project     | Homepage:                    |
| Unix the 'right' way |  http://fakkir.net/~elzubeir/|
-------------------------------------------------------
---
Was I helpful? Let others know:
http://svcs.affero.net/rm.php?r=elzubeir
