
Re: Extractiing data with Perl



Assalamo Alaykom Wa rahmato ALLAH wa barakatoh
Brother Ossama
First, I think I can help you with the Perl script. Please tell me what the input looks like and how you would like the output to be.
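Without seeing countries.txt, one likely cause of the repeated lines is the chomp() call inside the m//g loop: modifying the string being matched resets the match position, so the loop starts over on that line. The following is only a minimal sketch of that idea. The sample lines and the @lines and @out names are assumptions made for illustration; the regular expression is the one from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines standing in for countries.txt (not from the thread).
my @lines = (
    "<option value=\"KW\">Kuwait\n",
    "<option value=\"EG\">Egypt<option value=\"SA\">Saudi Arabia\n",
);

my @out;
for my $line (@lines) {
    # Strip the newline once, before matching starts. Changing the string
    # while an m//g loop is running resets the match position.
    chomp $line;

    # Collect every <option ...>text pair on the line and close the tag.
    while ($line =~ m/(<option[\w\s="]*>[\w\s\d&#;]+)/gi) {
        push @out, "$1</option>";
    }
}

print "$_\n" for @out;
```

In the original script the same change would mean moving chomp() up, so that it runs right after the line is read and before the inner while loop.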
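About your questions:
1) About \G: it is a zero-width assertion that matches at the position where the previous m//g match on the same string left off (the position that pos() reports). It does not stand for closing characters such as ")", and there is no lowercase \g that pairs with it for opening characters. It is normally used together with the /g modifier, often as /gc, to keep scanning a string from the point where the last match stopped.
Anyone with a different opinion is welcome to correct me.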
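As an illustration of \G, here is a small sketch of a scanner that walks a string piece by piece. The input string, the token labels and the @tokens name are made up for this example. Each pattern starts with \G, so it can only match exactly where the previous match stopped, and /c keeps the position when an alternative fails.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input, chosen only to show how \G behaves.
my $input = "width=100 height=25";
my @tokens;

while (1) {
    if    ($input =~ /\G\s+/gc)             { next }                       # skip whitespace
    elsif ($input =~ /\G(\d+)/gc)           { push @tokens, "NUM:$1" }     # digits
    elsif ($input =~ /\G([A-Za-z_]\w*)/gc)  { push @tokens, "WORD:$1" }    # identifier
    elsif ($input =~ /\G=/gc)               { push @tokens, "EQ" }         # equals sign
    else                                    { last }                       # nothing matches here
}

print join(" ", @tokens), "\n";
```

Without \G each pattern would be free to skip ahead and match later in the string, so pieces could be picked up out of order.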


2) The chop function removes the last character of the string, whatever it is (a digit, \n, or any other character), and returns that character. The chomp function removes only a trailing newline (more precisely, whatever $/ is set to) and returns the number of characters it removed.
ex:
if you have 12345 followed by \n and call each function again and again:
chop      chomp
12345     12345     # both of them removed the \n
1234      12345
123       12345
12        12345
1         12345
(empty)   12345
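The same comparison as a short runnable sketch, which also shows the return values. The variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $chopped = "12345\n";
my $chomped = "12345\n";

# First call: both remove the trailing newline.
my $removed_char  = chop  $chopped;   # chop returns the character it removed
my $removed_count = chomp $chomped;   # chomp returns how many characters it removed

# Second call: chop keeps eating characters, chomp leaves the string alone.
my $second_char  = chop  $chopped;
my $second_count = chomp $chomped;

print "chop : '$chopped' (second call removed '$second_char')\n";
print "chomp: '$chomped' (second call removed $second_count characters)\n";
```

This is why chomp is the safer choice for cleaning up lines read from a file: calling it on a string that has no trailing newline changes nothing.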




Sincerely
Haitham Kaddoura


From: "Ossama Khayaat" <ossamak at nht dot com dot kw>
Reply-To: developer at arabeyes dot org
To: <developer at arabeyes dot org>
Subject: Extractiing data with Perl
Date: Tue, 29 Apr 2003 09:49:27 +0300


Salam,
I'm using ActiveState Perl 5.8 on my Win2K AS machine.
I have a file that has a list of countries as an option list which I
want to extract, and put each option in a line and add a closing
</option> tag.
After trying for hours I got this script, which worked fine except that
it repeats any line that has an ending \n (newline).
---- Begin Perl script ----
#!/usr/bin/perl
while (<STDIN>) {
  while (m/(<option[\w\s=\"]*>[\w\s\d&#;]+)/gi){
    chomp();
    print "$1</option>\n";
  }
}
---- End Perl script ----
I'm running the script as:
#perl extract.pl < countries.txt

Can anyone please help, and also explain some things:
* How can I use \G (explained as: matches where the previous m//g left
off)?
* What is the difference between using chop() and chomp()?

I read through the manual but just couldn't figure it out.

Thanks in advance,
Ossama Khayat
<< countries.txt >>

