

extracting price quantity pairs from HTML table

#1
12-10-03, 06:09 PM
dewed
New Member
Join Date: Dec 2003
Posts: 4

First, yes, I have permission to screen-scrape this particular website, but they can't give me a text dump of the database (it's over 1 GB), so...

I need to extract some data from a simple HTML table, like this...

<TABLE>
<TR>
<TD>Quantity</TD>
<TD>144</TD>
<TD>288</TD>
</TR><TR>
<TD>Price</TD>
<TD>6.25</TD>
<TD>5.75</TD>
</TR></TABLE>

I've removed the bgcolor and font tags for brevity. Luckily, each table cell is on its own line, as above.

The main problem is that there may be anywhere from 1 to 8 quantity/price pairs. I've got the whole page split into an array of lines, and I can snag the first pair like this...

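// explode() replaces split(), which later PHP versions removed
$line = explode("\n", $myLine);
$count = count($line);
$x = 0; // start at the first line
while ($x < $count) {
    // the line right after each label holds the first value of that row
    if (preg_match("/Quantity/", $line[$x])) { $q1 = $line[$x + 1]; }
    if (preg_match("/Price/", $line[$x])) { $p1 = $line[$x + 1]; }
    $x++;
}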

Any ideas on how to get the rest?

#2
12-11-03, 05:18 AM
fyrestrtr
Wannabe Coder
Join Date: Nov 2003
Posts: 191

Try these steps:

1. Get the index of the Quantity line
2. for ($x = index of quantity line; $x != (next line that doesn't match Quantity) && $x!=(next line that matches Price); $x++)
3. In that for loop, grab all the quantities (stick them in an array)
4. Repeat the same for the Price line. (stick these in an array also)

Now you'll have two arrays that have the prices and quantity numbers. If you could have gotten this information as XML, it would have made this step real easy.
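
The steps above can be sketched in PHP roughly as follows. This is only an illustration under assumptions: each cell sits on its own line as in the sample table, the helper name collectRowValues() is made up for this example, and strip_tags() is used to drop the <TD> markup before comparing.

```php
<?php
// Hypothetical helper: collect every cell value that follows a label cell
// ("Quantity" or "Price") until that table row ends.
function collectRowValues(array $lines, string $label): array
{
    $values = [];
    $inRow = false;
    foreach ($lines as $line) {
        $text = trim(strip_tags($line));
        if (!$inRow) {
            // Step 1: find the line holding the label.
            if ($text === $label) {
                $inRow = true;
            }
            continue;
        }
        // Step 2: stop as soon as the row closes.
        if (stripos($line, '</TR>') !== false) {
            break;
        }
        // Step 3: stick each non-empty cell into the array.
        if ($text !== '') {
            $values[] = $text;
        }
    }
    return $values;
}

// The sample table from the first post, one cell per line.
$myLine = "<TABLE>\n<TR>\n<TD>Quantity</TD>\n<TD>144</TD>\n<TD>288</TD>\n"
        . "</TR><TR>\n<TD>Price</TD>\n<TD>6.25</TD>\n<TD>5.75</TD>\n</TR></TABLE>";
$lines = explode("\n", $myLine);

$quantities = collectRowValues($lines, 'Quantity');
$prices     = collectRowValues($lines, 'Price'); // Step 4: same routine for prices

print_r($quantities);
print_r($prices);
```

Because the loop runs until the row closes, it should not matter whether the page has 1 pair or 8.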

#3
12-11-03, 11:38 AM
dewed

Quote:
Originally Posted by fyrestrtr
Try these sets of steps

1. Get the index of the Quantity line.
Not exactly sure what that means... but if it means "get the total number of pairs", then:
$pcount = preg_match_all( '/CCCCFF/', $myLine, $dummy );
The TDs with the quantities have that as a background color, and it isn't used elsewhere, so $pcount does indeed hold the correct number of pairs.
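
As a quick check of that counting idea: preg_match_all() returns the number of full pattern matches, so it can count the cells that carry a given bgcolor. The markup below is an assumed reconstruction (the sample table in the first post had its bgcolor attributes removed), so treat it as a sketch only.

```php
<?php
// Assumed markup: only the quantity value cells carry bgcolor="#CCCCFF".
$myLine = "<TR>\n<TD>Quantity</TD>\n"
        . "<TD bgcolor=\"#CCCCFF\">144</TD>\n"
        . "<TD bgcolor=\"#CCCCFF\">288</TD>\n</TR>";

// The third argument receives the matches; only the return value is used here.
$pcount = preg_match_all('/CCCCFF/', $myLine, $dummy);

echo $pcount, "\n";
```

If the label cell also carried that color, the count would come out one higher than the number of pairs, so it is worth checking against a real page.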

Quote:
Originally Posted by fyrestrtr
2. for ($x = index of quantity line; $x != (next line that doesn't match Quantity) && $x!=(next line that matches Price); $x++)
I thought I could use variable variables for something like this:

if (preg_match("/Quantity/", "$line[$x]")){
$p="1";
while ($p < $pcount){
if (preg_match("[0-9]+", "$line[$x]")){$q{"$p"}=$line[$x+$p]; $p++;}
}
}
I was hoping to generate variables to store the quantities ($q1, $q2, etc.) and then use the same routine to generate $p1, $p2, ... But I apparently don't understand variable variables.
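
For reference, a variable variable is written ${"q$p"}: the string inside the braces becomes the variable name. The form $q{"$p"}, as far as I know, was read as an offset into $q rather than as a new variable name. Here is a minimal sketch with made-up sample values, next to an array version that is usually easier to loop over:

```php
<?php
// Sample values, made up for illustration.
$cells = ['144', '288', '432'];

// Variable variables: creates $q1, $q2, $q3.
$p = 1;
foreach ($cells as $cell) {
    ${"q$p"} = $cell;
    $p++;
}
echo $q1, ' ', $q2, ' ', $q3, "\n";

// The same data in one array, keyed 1..n to mirror the numbering above.
$q = [];
foreach ($cells as $i => $cell) {
    $q[$i + 1] = $cell;
}
echo $q[2], "\n";
```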

Quote:
Originally Posted by fyrestrtr
3. In that for loop, grab all the quantities (stick them in an array)
4. Repeat the same for the Price line. (stick these in an array also)

Now you'll have two arrays that have the prices and quantity numbers. If you could have gotten this information as XML, it would have made this step real easy.

#4
12-11-03, 02:31 PM
dewed
got it!

Yay for me!! I figured it out using the variable variable idea above. My main problem was that $p++ was inside the if statement, resulting in an endless loop... Here's the corrected version that works:

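if (preg_match("/Quantity/", $line[$x])) {
    $p = 1;
    while ($p < $pcount) {
        // note: this tests the label line $line[$x], not the value cell $line[$x + $p]
        if (preg_match("/[0-9]+/", $line[$x])) {
            ${"q$p"} = $line[$x + $p]; // variable variable: creates $q1, $q2, ...
        }
        $p++; // outside the if, so the loop always advances
    }
}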
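
Once the quantities and prices have been collected, they can be paired up by position. A small sketch, assuming both lists were already extracted into arrays; the values are the ones from the sample table, and the names $quantities, $prices, and $pairs are illustrative only.

```php
<?php
// Values from the sample table in the first post.
$quantities = ['144', '288'];
$prices     = ['6.25', '5.75'];

// Pair by position; stop at the shorter list in case a cell was missed.
$pairs = [];
$n = min(count($quantities), count($prices));
for ($i = 0; $i < $n; $i++) {
    $pairs[] = [
        'quantity' => (int) $quantities[$i],
        'price'    => (float) $prices[$i],
    ];
}

foreach ($pairs as $pair) {
    printf("%d units at %.2f each\n", $pair['quantity'], $pair['price']);
}
```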
All times are GMT -5.