
Re: TashkeelHandler: A QT C++ Class



Hello Abdalla,

Is it possible to use TashkeelHandler's new "Removing all diacritical marks from an Arabic string" feature to add an option in QT for displaying Quran text without diacritical marks? I know some people who prefer to read the Quran that way. Options that let the user select the level of diacritical marks to display would be neat too, for example: "Don't display tajweed marks", "Don't display tajweed marks and vowel marks", and "Don't display any marks".
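
As a rough illustration of the levels listed above, here is a minimal Qt-free C++ sketch. It is not part of TashkeelHandler: the MarkLevel enum, the function names, and the Unicode ranges chosen for each group are all assumptions made for the example. In particular, it treats the combining Quranic annotation signs in U+06D6..U+06ED as the "tajweed marks".

```cpp
#include <cassert>
#include <string>

// Hypothetical display levels, named after the options suggested above.
enum class MarkLevel { ShowAll, NoTajweed, NoTajweedNoVowels, NoMarks };

// Assumption: vowel marks are U+064B (fathatan) through U+0652 (sukun),
// the same eight characters the TashkeelHandler pattern lists.
bool isVowelMark(char32_t c) { return c >= 0x064B && c <= 0x0652; }

// Assumption: "tajweed marks" means the combining Quranic annotation signs.
bool isTajweedMark(char32_t c) {
    return (c >= 0x06D6 && c <= 0x06DC) || (c >= 0x06DF && c <= 0x06E4) ||
           (c >= 0x06E7 && c <= 0x06E8) || (c >= 0x06EA && c <= 0x06ED);
}

// Assumption: remaining combining marks (U+0653..U+065F, superscript alef U+0670).
bool isOtherMark(char32_t c) { return (c >= 0x0653 && c <= 0x065F) || c == 0x0670; }

// Return a copy of text with the marks selected by level removed.
std::u32string stripMarks(const std::u32string& text, MarkLevel level) {
    std::u32string out;
    out.reserve(text.size());
    for (char32_t c : text) {
        bool drop = false;
        switch (level) {
            case MarkLevel::ShowAll:
                break;
            case MarkLevel::NoTajweed:
                drop = isTajweedMark(c);
                break;
            case MarkLevel::NoTajweedNoVowels:
                drop = isTajweedMark(c) || isVowelMark(c);
                break;
            case MarkLevel::NoMarks:
                drop = isTajweedMark(c) || isVowelMark(c) || isOtherMark(c);
                break;
        }
        if (!drop) out.push_back(c);
    }
    assert(out.size() <= text.size());  // stripping never adds characters
    return out;
}
```

For instance, stripMarks(text, MarkLevel::NoTajweedNoVowels) would keep the letters and drop both groups of marks.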

Regards,
Mete

---------- Original Message ----------------------------------
From: Abdalla Alothman <abdalla at pheye dot net>
Reply-To: abdalla at pheye dot net,Development Discussions <developer at arabeyes dot org>
Date:  Wed, 31 Aug 2005 22:12:44 +0300

>Asalamu alaikum.
>
>I am sending a small QT-based C++ class to handle diacritical marks.
>What's demonstrated is the following:
>
>1. Removing all diacritical marks from an Arabic string.
>2. Searching diacritically marked text with regular expressions.
>3. Proof that QT is not just a GUI library.
>
>In #2, we follow a simple algorithm:
>
>* break the text into its characters.
>
>* add the regular expression, rule1, after every character.
>
>* join the new characters into a new string.
>
>With #2, there is no need to strip the diacritical marks from marked text
>before searching it. That removes the need to keep a stripped duplicate of
>the content just for searching while the marked text serves only as the
>visible content.
>
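
The three steps above can be sketched without Qt as plain string building. This is only an illustration under assumptions: buildMarkTolerantPattern and kRule1 are hypothetical names, the rule is written as a code-point range rather than the literal character list the class uses, and the result is just a pattern string that would still have to be handed to a Unicode-aware regex engine such as QRegExp.

```cpp
#include <cassert>
#include <string>

// Assumed stand-in for rule1: up to two marks from U+064B (fathatan)
// through U+0652 (sukun).
const std::u32string kRule1 = U"[\u064B-\u0652]{0,2}";

// Build a pattern that matches word whether or not its letters carry marks.
std::u32string buildMarkTolerantPattern(const std::u32string& word,
                                        const std::u32string& rule = kRule1) {
    assert(!rule.empty() && "the mark rule must not be empty");
    std::u32string pattern;
    for (char32_t c : word) {      // step 1: break the text into characters
        if (c >= 0x064B && c <= 0x0652) {
            continue;              // marks typed in the query itself are ignored
        }
        pattern.push_back(c);
        pattern += rule;           // step 2: add the rule after the character
    }
    return pattern;                // step 3: the pieces are already joined
}
```

The sketch does not escape regex metacharacters in the input, so it is only suitable for plain words.
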
>A driver is included to test the class. To use it, your console must support
>bidirectional text. If you are using a recent version of KDE, you can instruct
>Konsole (the KDE console application) to use bidi in the settings dialog.
>
>The driver performs the following:
>
>1. After a search string is entered, a new TashkeelHandler is instantiated.
>2. The TashkeelHandler instance constructs the new regex.
>3. A search is made to see if the regex is in the current line.
>4. If a match is found, the line containing the match is displayed and then
>   saved into the file: searchresults.txt.
>5. The instance then removes the marks from the match and saves it in the
>   same file.
>6. The content searched is the ayat of surat alqamar. That file is attached;
>   don't forget to save it in the same directory as the sources.
>
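
The driver's flow (search each line, report the aya number, then keep both the marked and the stripped form) can be sketched without Qt or file I/O. Note the swap: instead of the constructed regex, this sketch strips the marks from both the query and the line and compares the results. That is simpler to show, but it is the duplicate-content approach the class is meant to avoid. Match, stripTashkeel, and searchLines are hypothetical names, and the lines are passed in memory rather than read from 054-alqamar-utf.txt.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One search hit: the aya number plus the line with and without marks.
struct Match {
    int aya;
    std::u32string marked;
    std::u32string stripped;
};

// Assumption: tashkeel means U+064B (fathatan) through U+0652 (sukun).
bool isTashkeel(char32_t c) { return c >= 0x064B && c <= 0x0652; }

// Return a copy of s with all tashkeel removed.
std::u32string stripTashkeel(const std::u32string& s) {
    std::u32string out;
    for (char32_t c : s) {
        if (!isTashkeel(c)) out.push_back(c);
    }
    return out;
}

// Scan lines in order and collect every line that contains query,
// ignoring tashkeel on both sides. Numbering starts at firstAyaNumber.
std::vector<Match> searchLines(const std::vector<std::u32string>& lines,
                               const std::u32string& query,
                               int firstAyaNumber) {
    const std::u32string needle = stripTashkeel(query);
    assert(!needle.empty() && "query must contain at least one letter");
    std::vector<Match> results;
    int aya = firstAyaNumber;
    for (const std::u32string& line : lines) {
        const std::u32string bare = stripTashkeel(line);
        if (bare.find(needle) != std::u32string::npos) {
            results.push_back({aya, line, bare});
        }
        ++aya;
    }
    return results;
}
```

Passing -1 as firstAyaNumber mirrors the driver's counter, which skips the title and the Basmala.
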
>SAMPLE RUN (You can enter more than one word like: kathabat thamuud):
>
>~/Projects/arabic/toys/tashkeelhandler #-> tdriver
>
>Enter Search String: دسر
>Found match in aya: 13
>وحملناه على ذات ألواح ودسر
>~/Projects/arabic/toys/tashkeelhandler #->
>
>
>You can use KWord to view the differences between both lines.
>
>To compile:
>
>g++ tashkeelhandler.cc tdriver.cc -o tdriver -lqt
>
>Of course your QT library shouldn't be outdated.
>
>Sorry if this message is too long.
>
>Salam,
>Abdalla Alothman
>
>/////////////////////////////////////////////////////////////////////////
>// FILE: tashkeelhandler.h
>// Interface for class TashkeelHandler
>// 1. Strip Tashkeel from a string
>// 2. Construct Regular Expressions to match diacritically marked content
>// By Abdalla Alothman - abdalla at pheye dot net - June 2003
>// Updated January 12 2004
>// Updated April 07 2004
>/////////////////////////////////////////////////////////////////////////
>
>#ifndef TASHKEELHANDLER_H
>#define TASHKEELHANDLER_H
>
>#include <qstring.h>
>#include <qstringlist.h>
>#include <qregexp.h>
>#include <iostream>
>
>namespace trule1
>{
>  class TashkeelHandler
>  {
>    public:
>      TashkeelHandler();
>      ~TashkeelHandler() { delete rule1; }
>      void removeTashkeel(QString&, QString&);
>      QRegExp constructRegex(QString&);
>      void findInString(QString&, QRegExp&);
>
>    private:
>      QString singleWordConstructor(QString&);
>      QString multipleWordConstructor(QString&);
>
>      QString tashkeelstr;
>      QRegExp *rule1;
>      //QRegExp rule1(QString::fromUtf8("([ًٌٍَُِّْ]*)"));
>  };
>}
>#endif
>
>/////////////////////////////////////////////////////////////////////////
>// FILE: tashkeelhandler.cc
>// Implementation for class TashkeelHandler
>// 1. constructRegex: Build a regular expression
>//   [a] PRIVATE: singleWordConstructor: Builds regex for single words
>//   [b] PRIVATE: multipleWordConstructor: Builds regex for multiple
>//       words
>// 2. removeTashkeel: Removes tashkeel from a string.
>//
>// By Abdalla Alothman - abdalla at pheye dot net - June 2003
>/////////////////////////////////////////////////////////////////////////
>#include "tashkeelhandler.h"
>namespace trule1
>{
>  TashkeelHandler::TashkeelHandler()
>  {
>    tashkeelstr = QString::fromUtf8("([ًٌٍَُِّْ]){0,2}");
>    rule1 = new QRegExp(tashkeelstr);
>  }
>
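>  // Remove every match of the pattern r1 from in. An empty r1 means
>  // "use the default tashkeel pattern".
>  void TashkeelHandler::removeTashkeel(QString &in, QString &r1)
>  {
>    // Use the pattern as a QString directly. Passing a Unicode pattern
>    // through fromUtf8() forces a lossy Latin-1 conversion of the marks.
>    const QString pattern = r1.isEmpty() ? tashkeelstr : r1;
>    QRegExp r2(pattern);
>    in.remove(r2);
>  }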
>
>  QRegExp TashkeelHandler::constructRegex(QString &in)
>  {
>
>    if( in.contains( QRegExp("(\\s)") ) )
>    {
>      in = multipleWordConstructor(in);
>    }
>
>    else
>    {
>      in = singleWordConstructor(in);
>    }
>    return in;
>  }
>
>  QString TashkeelHandler::multipleWordConstructor(QString &in)
>  {
>    // First: Check if input string contains any tashkeel
>    if( in.contains( *rule1 ) )
>    {
>      QString d1 = rule1->pattern();
>      removeTashkeel(in, d1);
>    }
>
>    QStringList inList = QStringList::split(" ", QString::fromUtf8(in));
>    QStringList outList;
>    QString inTemp("");
>
>    for(QStringList::iterator i = inList.begin(); i != inList.end(); ++i)
>    {
>      inTemp = *i;
>
>      QStringList tList = QStringList::split("", inTemp);
>      inTemp = tList.join(tashkeelstr) + tashkeelstr;
>      outList << inTemp;
>    }
>    inTemp = outList.join(" ");
>    inTemp.prepend("(^|\\s)");
>    inTemp.prepend(tashkeelstr);
>    inTemp.append("($|\\s)");
>    return inTemp;
>  }
>
>  QString TashkeelHandler::singleWordConstructor(QString& in)
>  {
>    QStringList inList = QStringList::split("", QString::fromUtf8(in));
>    in = inList.join(tashkeelstr);
>    return in;
>  }
>}
>
>///////////////////////////////////////////////////////////////////
>// FILE: tdriver.cc
>// Test drives class TashkeelHandler
>// By Abdalla Alothman - abdalla at pheye dot net - August 31, 2005
>//
>// compile with: g++ tashkeelhandler.cc tdriver.cc -o tdriver -lqt
>///////////////////////////////////////////////////////////////////
>
>#include "tashkeelhandler.h"
>#include <qfile.h>
>#include <iostream>
>#include <string>
>#include <memory>
>#include <qtextstream.h>
>// The class is declared in namespace trule1, so bring the namespace into
>// scope instead of qualifying each instance.
>using namespace trule1;
>using namespace std;
>
>int main()
>{
>  cout << "Enter Search String: ";
>  string input;
>  getline(cin, input);
>  QString i2(input);
>  QString line("");
>  QFile f1("054-alqamar-utf.txt"); // THIS FILE IS AN ATTACHMENT
>  QFile f2("searchresults.txt");
>  QTextStream tstream2(&f2);
>  tstream2.setEncoding(QTextStream::UnicodeUTF8);
>  QRegExp r;
>  if(f1.open(IO_ReadOnly) && f2.open(IO_WriteOnly))
>  {
>    auto_ptr<TashkeelHandler> h1(new TashkeelHandler);
>    // this will construct a regex
>    r = h1->constructRegex(i2);
>    QTextStream tstream(&f1);
>    int i = -1; // to show aya number, but don't count the title and the Basmala
>
>    while( !tstream.eof() )
>    {
>      line = tstream.readLine();
>
>      // search the input line using the constructed regex
>      if(line.contains(r))
>      {
>        cout << "Found match in aya: "
>             << i
>             << endl
>             << line.utf8()
>             << endl;
>        tstream2 << "With Tashkeel: " << endl << line << endl;
>        QString a("");
>        // Remove all tashkeel.
>        // "a" is an empty tashkeel string, so every mark is removed.
>        // To remove only the dhamma, set a to the dhamma and pass it in;
>        // nothing will be removed except the dhamma.
>        // Do not abuse!!
>        h1->removeTashkeel(line, a);
>        tstream2 << "Tashkeel removed: "
>                 << endl
>                 << line
>                 << endl;
>      }
>      i++;
>    }
>  }
>  return 0;
>}
>
>

--
Mete Kural
Touchtone Corporation
714-755-2810
--