
Extracting data with Perl



Salam,
I'm using ActiveState's Perl 5.8 on my Win2K AS machine.
I have a file containing a list of countries as HTML option elements. I
want to extract them, putting each option on its own line and adding a
closing </option> tag.
After trying for hours I came up with this script. It works fine except
that it repeats the option on any line that ends with \n (a newline).
---- Begin Perl script ----
#!/usr/bin/perl
while (<STDIN>) {
  while (m/(<option[\w\s=\"]*>[\w\s\d&#;]+)/gi){
    chomp();
    print "$1</option>\n";
  }
}
---- End Perl script ----
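A likely explanation for the repeats: perlop notes that modifying the target string resets the m//g search position. The chomp() inside the inner loop changes $_ on the first pass, so pos() goes back to the start and the first option on every newline-terminated line is matched a second time. After that, chomp has nothing left to remove, so the loop moves on. On single-option lines the first capture also ends with the newline, because the character class contains \s. The sketch below chomps once per line before matching and captures the name with [^<]+, which also keeps the hyphens, commas and periods (Guinea-Bissau, St. Helena) that [\w\s\d&#;]+ cuts off. It assumes a country name always runs up to the next `<` or the end of the line. The helper name `extract_options` and the sample lines are illustrative, not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns "<option ...>Name</option>" for each option on one line.
sub extract_options {
    my ($line) = @_;    # work on a copy, not on $_
    chomp $line;        # strip the newline once, BEFORE any m//g matching
    my @out;
    # $line is not modified inside this loop, so pos() advances normally.
    # [^<]+ takes everything up to the next tag, including '-', ',' and '.'.
    while ($line =~ m/(<option\b[^>]*>[^<]+)/gi) {
        push @out, "$1</option>";
    }
    return @out;
}

# Illustrative sample lines standing in for countries.txt.
my @sample = (
    qq{<option value="AF" id="AF">Afghanistan\n},
    qq{<option value="GW" id="GW">Guinea-Bissau<option value="GY" id="GY">Guyana\n},
);
print "$_\n" for map { extract_options($_) } @sample;
```

To use it as a filter again (perl extract.pl < countries.txt), replace the @sample loop with `while (my $line = <STDIN>) { print "$_\n" for extract_options($line) }`.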
I'm running the script as:
#perl extract.pl < countries.txt
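Since the whole file is already redirected into STDIN, an alternative is to read it in one go by undefining $/ (the input record separator) with `local $/`, then match across line breaks. In the sample data as posted, some tags are split over two lines (the Laos and Poland entries), which no line-by-line loop can match; working on the whole text avoids that. This is a minimal sketch, assuming the file fits comfortably in memory; `extract_all` is a hypothetical helper name and the input string is illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: extract every option from the whole input at once.
sub extract_all {
    my ($text) = @_;
    # In list context m//g returns every capture. [^>]* and [^<]+ both
    # match newlines, so a tag broken across two lines is still found.
    my @names = $text =~ m/(<option\b[^>]*>[^<]+)/gi;
    for (@names) {
        s/\s+\z//;    # drop the line break(s) before the next tag
        s/\s+/ /g;    # join a split tag or name back onto one line
    }
    return map { "$_</option>" } @names;
}

# Reading the redirected file in one go would be:
#   my $text = do { local $/; <STDIN> };
# Here an illustrative string with a tag split over two lines is used instead.
my $text = qq{<option value="LA" \nid="LA">Laos\n<option value="LV" id="LV">Latvia</select>\n};
print "$_\n" for extract_all($text);
```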

Can anyone please help? I'd also appreciate an explanation of a couple of things:
* How can I use \G (documented as matching where the previous m//g left
off)?
* What is the difference between chop() and chomp()?
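On the \G question: with m//g in scalar context, Perl stores the offset where the last match ended in pos(), and \G anchors the next match at exactly that offset instead of letting it scan ahead. Adding /c keeps pos() unchanged when a match fails, so several \G patterns can be tried in turn at the same spot, which is the usual way to write a small tokenizer. Here is a sketch over one illustrative line from the country list; the TAG/TEXT labels are arbitrary.

```perl
use strict;
use warnings;

# One illustrative line from the country list.
my $s = q{<option value="AF" id="AF">Afghanistan<option value="AL" id="AL">Albania};
my @tokens;
pos($s) = 0;    # start scanning at offset 0
while (pos($s) < length($s)) {
    if ($s =~ /\G(<[^>]*>)/gc) {          # a tag starting exactly at pos()
        push @tokens, "TAG $1";
    }
    elsif ($s =~ /\G([^<]+)/gc) {         # plain text up to the next tag
        push @tokens, "TEXT $1";
    }
    else {                                # e.g. a stray '<' with no '>'
        die "cannot tokenize at offset " . pos($s) . "\n";
    }
}
print "$_\n" for @tokens;
```

Without /c, the failed tag match on "Afghanistan" would reset pos() to the start of the string, and the next \G pattern would be anchored at offset 0 again.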

I read through the manual but just couldn't figure it out.
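On chop() versus chomp(): chop() removes the last character of a string, whatever it is, and returns that character. chomp() removes a trailing newline only if one is present and returns the number of characters removed. Strictly, it removes a trailing copy of $/, which is "\n" by default. That makes chomp() the safe choice for input lines, since the last line of a file may not end with a newline. A short comparison:

```perl
use strict;
use warnings;

my $c1 = "Albania\n";
my $n1 = chomp($c1);    # removes "\n", returns 1 (characters removed)
my $c2 = "Albania";
my $n2 = chomp($c2);    # nothing to remove, returns 0
my $p1 = "Albania\n";
my $x1 = chop($p1);     # removes "\n", returns "\n"
my $p2 = "Albania";
my $x2 = chop($p2);     # removes the final "a", returns "a"

printf "chomp: [%s] removed %d\n", $c1, $n1;    # chomp: [Albania] removed 1
printf "chomp: [%s] removed %d\n", $c2, $n2;    # chomp: [Albania] removed 0
printf "chop:  [%s] removed [%s]\n", $p2, $x2;  # chop:  [Albani] removed [a]
```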

Thanks in advance,
Ossama Khayat
<option value="AF" id="AF">Afghanistan
<option value="AL" id="AL">Albania
<option value="DZ" id="DZ">Algeria
<option value="AS" id="AS">American Samoa
<option value="AD" id="AD">Andorra
<option value="AO" id="AO">Angola
<option value="AI" id="AI">Anguilla
<option value="AQ" id="AQ">Antarctica
<option value="AG" id="AG">Antigua and Barbuda
<option value="AR" id="AR">Argentina
<option value="AM" id="AM">Armenia
<option value="AW" id="AW">Aruba
<option value="AU" id="AU">Australia
<option value="AT" id="AT">Austria<option value="AZ" id="AZ">Azerbaijan<option value="BS" id="BS">Bahamas<option value="BH" id="BH">Bahrain<option value="BD" id="BD">Bangladesh<option value="BB" id="BB">Barbados<option value="BY" id="BY">Belarus<option value="BE" id="BE">Belgium<option value="BZ" id="BZ">Belize<option value="BJ" id="BJ">Benin<option value="BM" id="BM">Bermuda<option value="BT" id="BT">Bhutan<option value="BO" id="BO">Bolivia<option value="BA" id="BA">Bosnia and Herzegovina<option value="BW" id="BW">Botswana<option value="BV" id="BV">Bouvet Island<option value="BR" id="BR">Brazil<option value="IO" id="IO">British Indian Ocean Territory<option value="BN" id="BN">Brunei<option value="BG" id="BG">Bulgaria
<option value="BF" id="BF">Burkina Faso<option value="BI" id="BI">Burundi<option value="KH" id="KH">Cambodia<option value="CM" id="CM">Cameroon<option value="CA" id="CA">Canada<option value="CV" id="CV">Cape Verde<option value="KY" id="KY">Cayman Islands<option value="CF" id="CF">Central African Republic<option value="TD" id="TD">Chad<option value="CL" id="CL">Chile<option value="CN" id="CN">China<option value="CX" id="CX">Christmas Island<option value="CC" id="CC">Cocos &#40;Keeling&#41; Islands<option value="CO" id="CO">Colombia<option value="KM" id="KM">Comoros<option value="CG" id="CG">Congo<option value="CK" id="CK">Cook Islands<option value="CR" id="CR">Costa Rica<option value="CI" id="CI">Côte d&#39;Ivoire<option value="HR" id="HR">Croatia &#40;Hrvatska&#41;<option value="CU" id="CU">Cuba<option value="CY" id="CY">Cyprus<option value="CZ" id="CZ">Czech Republic<option value="CD" id="CD">Congo &#40;DRC&#41;<option value="DK" id="DK">Denmark<option value="DJ" id="DJ">Djibouti<option value="DM" id="DM">Dominica
<option value="DO" id="DO">Dominican Republic<option value="TP" id="TP">East Timor<option value="EC" id="EC">Ecuador<option value="EG" id="EG">Egypt<option value="SV" id="SV">El Salvador<option value="GQ" id="GQ">Equatorial Guinea<option value="ER" id="ER">Eritrea<option value="EE" id="EE">Estonia<option value="ET" id="ET">Ethiopia<option value="FK" id="FK">Falkland Islands &#40;Islas Malvinas&#41;<option value="FO" id="FO">Faroe Islands<option value="FJ" id="FJ">Fiji Islands<option value="FI" id="FI">Finland<option value="FR" id="FR">France<option value="GF" id="GF">French Guiana<option value="PF" id="PF">French Polynesia<option value="TF" id="TF">French Southern and Antarctic Lands<option value="GA" id="GA">Gabon<option value="GM" id="GM">Gambia<option value="GE" id="GE">Georgia<option value="DE" id="DE">Germany<option value="GH" id="GH">Ghana<option value="GI" id="GI">Gibraltar<option value="GR" id="GR">Greece<option value="GL" id="GL">Greenland<option value="GD" id="GD">Grenada<option value="GP" id="GP">Guadeloupe<option value="GU" id="GU">Guam<option value="GT" id="GT">Guatemala<option value="GN" id="GN">Guinea<option value="GW" id="GW">Guinea-Bissau<option value="GY" id="GY">Guyana<option value="HT" id="HT">Haiti<option value="HM" id="HM">Heard Island and McDonald Islands<option value="HN" id="HN">Honduras<option value="HK" id="HK">Hong Kong SAR<option value="HU" id="HU">Hungary<option value="IS" id="IS">Iceland<option value="IN" id="IN">India<option value="ID" id="ID">Indonesia<option value="IR" id="IR">Iran<option value="IQ" id="IQ">Iraq<option value="IE" id="IE">Ireland<option value="IL" id="IL">Israel<option value="IT" id="IT">Italy<option value="JM" id="JM">Jamaica<option value="JP" id="JP">Japan<option value="JO" id="JO">Jordan<option value="KZ" id="KZ">Kazakhstan<option value="KE" id="KE">Kenya<option value="KI" id="KI">Kiribati<option value="KR" id="KR">Korea<option value="KW" id="KW">Kuwait<option value="KG" id="KG">Kyrgyzstan<option value="LA" 
id="LA">Laos<option value="LV" id="LV">Latvia<option value="LB" id="LB">Lebanon<option value="LS" id="LS">Lesotho<option value="LR" id="LR">Liberia<option value="LY" id="LY">Libya<option value="LI" id="LI">Liechtenstein<option value="LT" id="LT">Lithuania<option value="LU" id="LU">Luxembourg<option value="MO" id="MO">Macao SAR<option value="MK" id="MK">Macedonia, Former Yugoslav Republic of<option value="MG" id="MG">Madagascar<option value="MW" id="MW">Malawi<option value="MY" id="MY">Malaysia<option value="MV" id="MV">Maldives<option value="ML" id="ML">Mali<option value="MT" id="MT">Malta<option value="MH" id="MH">Marshall Islands<option value="MQ" id="MQ">Martinique<option value="MR" id="MR">Mauritania<option value="MU" id="MU">Mauritius<option value="YT" id="YT">Mayotte<option value="MX" id="MX">Mexico<option value="FM" id="FM">Micronesia<option value="MD" id="MD">Moldova<option value="MC" id="MC">Monaco<option value="MN" id="MN">Mongolia<option value="MS" id="MS">Montserrat<option value="MA" id="MA">Morocco<option value="MZ" id="MZ">Mozambique<option value="MM" id="MM">Myanmar<option value="NA" id="NA">Namibia<option value="NR" id="NR">Nauru<option value="NP" id="NP">Nepal<option value="NL" id="NL">Netherlands<option value="AN" id="AN">Netherlands Antilles<option value="NC" id="NC">New Caledonia<option value="NZ" id="NZ">New Zealand<option value="NI" id="NI">Nicaragua<option value="NE" id="NE">Niger<option value="NG" id="NG">Nigeria<option value="NU" id="NU">Niue<option value="NF" id="NF">Norfolk Island<option value="KP" id="KP">North Korea<option value="MP" id="MP">Northern Mariana Islands<option value="NO" id="NO">Norway<option value="OM" id="OM">Oman<option value="PK" id="PK">Pakistan<option value="PW" id="PW">Palau<option value="PA" id="PA">Panama<option value="PG" id="PG">Papua New Guinea<option value="PY" id="PY">Paraguay<option value="PE" id="PE">Peru<option value="PH" id="PH">Philippines<option value="PN" id="PN">Pitcairn Islands<option value="PL" 
id="PL">Poland<option value="PT" id="PT">Portugal<option value="PR" id="PR">Puerto Rico<option value="QA" id="QA">Qatar<option value="RE" id="RE">Reunion<option value="RO" id="RO">Romania<option value="RU" id="RU">Russia<option value="RW" id="RW">Rwanda<option value="WS" id="WS">Samoa<option value="SM" id="SM">San Marino<option value="ST" id="ST">São Tomé and Príncipe<option value="SA" id="SA">Saudi Arabia<option value="SN" id="SN">Senegal<option value="YU" id="YU">Serbia and Montenegro<option value="SC" id="SC">Seychelles<option value="SL" id="SL">Sierra Leone<option value="SG" id="SG">Singapore<option value="SK" id="SK">Slovakia<option value="SI" id="SI">Slovenia<option value="SB" id="SB">Solomon Islands<option value="SO" id="SO">Somalia<option value="ZA" id="ZA">South Africa<option value="GS" id="GS">South Georgia and the South Sandwich Islands<option value="ES" id="ES">Spain<option value="LK" id="LK">Sri Lanka<option value="SH" id="SH">St. Helena<option value="KN" id="KN">St. Kitts and Nevis<option value="LC" id="LC">St. Lucia<option value="PM" id="PM">St. Pierre and Miquelon<option value="VC" id="VC">St. 
Vincent and the Grenadines<option value="SD" id="SD">Sudan<option value="SR" id="SR">Suriname<option value="SJ" id="SJ">Svalbard and Jan Mayen<option value="SZ" id="SZ">Swaziland<option value="SE" id="SE">Sweden<option value="CH" id="CH">Switzerland<option value="SY" id="SY">Syria<option value="TW" id="TW">Taiwan<option value="TJ" id="TJ">Tajikistan<option value="TZ" id="TZ">Tanzania<option value="TH" id="TH">Thailand<option value="TG" id="TG">Togo<option value="TK" id="TK">Tokelau<option value="TO" id="TO">Tonga<option value="TT" id="TT">Trinidad and Tobago<option value="TN" id="TN">Tunisia<option value="TR" id="TR">Turkey<option value="TM" id="TM">Turkmenistan<option value="TC" id="TC">Turks and Caicos Islands<option value="TV" id="TV">Tuvalu<option value="UG" id="UG">Uganda<option value="UA" id="UA">Ukraine<option value="AE" id="AE">United Arab Emirates<option value="UK" id="UK">United Kingdom<option value="US" id="US" selected>United States<option value="UM" id="UM">United States Minor Outlying Islands<option value="UY" id="UY">Uruguay<option value="UZ" id="UZ">Uzbekistan<option value="VU" id="VU">Vanuatu<option value="VA" id="VA">Vatican City<option value="VE" id="VE">Venezuela<option value="VN" id="VN">Viet Nam<option value="VG" id="VG">Virgin Islands &#40;British&#41;<option value="VI" id="VI">Virgin Islands<option value="WF" id="WF">Wallis and Futuna<option value="YE" id="YE">Yemen<option value="ZM" id="ZM">Zambia<option value="ZW" id="ZW">Zimbabwe</select>