

How can I search for strings in HTML code?

#1
02-20-08, 03:08 AM
INTEL
Newbie Coder
Join Date: Mar 2005
Posts: 60

Hello and thank you for reading this.

I'm using the PHP Snoopy class to fetch the HTML of a web page.

Say I have:

HTML Code:
<html>
<head>
<title>Hello</title>
</head>
<body>
My Name: Alex
</body>
</html>
If that's the data I have, how can I extract "Hello" and then "Alex"? I looked at split and explode, but I don't think they would quite work.
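For what it's worth, explode() can handle this particular case when the text on both sides of the value is known and each marker appears once. A minimal sketch, assuming the sample page is already in a string (`$html` is a name chosen for illustration) and that both markers are present:

```php
<?php
// Sample page from the question, hard-coded for illustration.
$html = "<html>\n<head>\n<title>Hello</title>\n</head>\n<body>\nMy Name: Alex\n</body>\n</html>";

// explode() with a limit of 2 splits only at the first occurrence of the marker.
// Assumes "<title>" is present; otherwise $afterStart[1] would not exist.
$afterStart = explode('<title>', $html, 2);
// Cut the remainder at "</title>"; piece [0] is the text in between.
$title = explode('</title>', $afterStart[1], 2)[0];

// Same two-step split for the name, trimming the surrounding whitespace.
$afterLabel = explode('My Name:', $html, 2);
$name = trim(explode('</body>', $afterLabel[1], 2)[0]);

echo $title, "\n"; // Hello
echo $name, "\n";  // Alex
```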

Any feedback is welcome.

Thank you.

#2
02-20-08, 03:11 AM
Nico
Community Leader
Join Date: Sep 2005
Location: Spain
Posts: 8,075
Can the content vary? If so, in which way?

#3
02-20-08, 03:17 AM
INTEL
Newbie Coder
Join Date: Mar 2005
Posts: 60
Hi Nico,

Thanks for your speedy response. Yes, the content will vary slightly, mostly in the values I want to extract.

It would be ideal if I could say something like:

START AFTER <title>
STOP BEFORE </title>

and have the value in between ("Hello" in the case above) returned to me, so that it would still work if the rest of the page changes slightly (say there is a dynamic banner).
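The "start after / stop before" idea maps directly onto strpos() and substr(). A minimal sketch; `extract_between()` is a hypothetical helper name, not a PHP or Snoopy function, and the sample string is the page from the first post:

```php
<?php
// Hypothetical helper (not part of PHP or Snoopy): return the text found
// after the first occurrence of $start and before the next occurrence of $end,
// or null if either marker is missing.
function extract_between(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null; // start marker not found
    }
    $from += strlen($start); // skip past the start marker itself

    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return null; // end marker not found after the start marker
    }

    return substr($haystack, $from, $to - $from);
}

$html = "<html>\n<head>\n<title>Hello</title>\n</head>\n<body>\nMy Name: Alex\n</body>\n</html>";

echo extract_between($html, '<title>', '</title>'), "\n";       // Hello
echo trim(extract_between($html, 'My Name:', '</body>')), "\n"; // Alex
```

strpos() is case-sensitive; stripos() is the case-insensitive counterpart if the page might use `<TITLE>`. Because only the two markers matter, changes elsewhere on the page (such as a rotating banner) do not affect the result.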

Thank you.

#4
02-20-08, 03:23 AM
Jay6390
Code Master
Join Date: Apr 2007
Location: United Kingdom
Posts: 1,330
I think what you are after is a regex for that, something like:
PHP Code:

// $pagedata holds the page's HTML
$pattern = '/<title>([^<]+?)<\/title>/';
preg_match($pattern, $pagedata, $matches);
echo $matches[1]; // the text between <title> and </title>
Jay
Last edited by Jay6390; 02-20-08 at 03:26 AM. Reason: Corrected final line
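Jay's pattern covers the title; the same approach extends to the second value in the question. A minimal sketch, assuming the HTML is in `$pagedata` as in the snippet above and that the name always follows the literal label "My Name:" on a single line:

```php
<?php
// $pagedata holds the fetched HTML (hard-coded here with the sample page).
$pagedata = "<html>\n<head>\n<title>Hello</title>\n</head>\n<body>\nMy Name: Alex\n</body>\n</html>";

$title = null;
$name = null;

// Jay's pattern: everything between <title> and </title>.
if (preg_match('/<title>([^<]+?)<\/title>/', $pagedata, $matches)) {
    $title = $matches[1];
}

// Same idea for the name: skip "My Name:" plus any whitespace, then capture
// up to the end of the line or the next tag.
if (preg_match('/My Name:\s*([^<\r\n]+)/', $pagedata, $matches)) {
    $name = trim($matches[1]);
}

echo $title, "\n"; // Hello
echo $name, "\n";  // Alex
```

Checking the return value of preg_match() before reading `$matches[1]` avoids an undefined-index notice when the page does not contain the pattern.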

#5
02-20-08, 03:25 AM
Nico
Community Leader
Join Date: Sep 2005
Location: Spain
Posts: 8,075
EDIT: Jay beat me to it.

Try this:

PHP Code:


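$content = $snoopy->results; // the raw HTML content of the page

// "s": "." also matches newlines; "i": case-insensitive
if (preg_match('~<title>(.*?)</title>~si', $content, $title))
{
    echo "Title: {$title[1]}";
}

// <body[^>]*> also allows attributes on the body tag
if (preg_match('~<body[^>]*>(.*?)</body>~si', $content, $body))
{
    echo "Body: {$body[1]}";
}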
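To see what the `s` and `i` modifiers in Nico's patterns buy you, here is a small comparison on made-up markup (upper-case tags, a title broken across two lines, an attribute on `<body>`); the sample string is invented for illustration:

```php
<?php
// Invented markup: upper-case tags, title split over two lines, body attribute.
$content = "<HTML><HEAD><TITLE>Hello\nWorld</TITLE></HEAD>\n<BODY class=\"main\">\nMy Name: Alex\n</BODY></HTML>";

// Without modifiers, "." does not match "\n" and tag names must match case.
$plain = preg_match('~<title>(.*?)</title>~', $content, $m1);     // 0: no match

// "s" lets "." match newlines; "i" makes the match case-insensitive.
$flagged = preg_match('~<title>(.*?)</title>~si', $content, $m2); // 1: match

echo $plain, ' ', $flagged, "\n"; // prints: 0 1

// <body[^>]*> also accepts attributes such as class="main".
if (preg_match('~<body[^>]*>(.*?)</body>~si', $content, $b)) {
    echo trim($b[1]), "\n"; // My Name: Alex
}
```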


#6
02-20-08, 10:07 PM
INTEL
Newbie Coder
Join Date: Mar 2005
Posts: 60
Thank you both.

I tried the code and neither returned any values.

Please see:
http://projectserver10.info/parser/crawler/

PHP Code:

<?php

include("Snoopy.class.php");

$snoopy = new Snoopy;

// need a proxy?:
//$snoopy->proxy_host = "my.proxy.host";
//$snoopy->proxy_port = "8080";

// set browser and referer:
$snoopy->agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";
$snoopy->referer = "http://www.jonasjohn.de/";

// set some cookies:
$snoopy->cookies["SessionID"] = '238472834723489';
$snoopy->cookies["favoriteColor"] = "blue";

// set a raw header:
$snoopy->rawheaders["Pragma"] = "no-cache";

// set some internal variables:
$snoopy->maxredirs = 2;
$snoopy->offsiteok = false;
$snoopy->expandlinks = false;

// set username and password (optional)
//$snoopy->user = "joe";
//$snoopy->pass = "bloe";

// fetch the text of the website www.google.com:
if ($snoopy->fetch("http://www.Google.com")) {
    // other methods: fetch, fetchform, fetchlinks, submittext and submitlinks

    // response code:
    print "response code: " . $snoopy->response_code . "<br/>\n";

    // print the headers (foreach replaces while(list() = each()); each() was removed in PHP 8):
    print "<b>Headers:</b><br/>";
    foreach ($snoopy->headers as $key => $val) {
        print $key . ": " . $val . "<br/>\n";
    }

    print "<br/>\n";

    // print the texts of the website:
    // note: htmlspecialchars() escapes the markup ("<title>" becomes
    // "&lt;title&gt;"), and the patterns below search this escaped $content
    $content = htmlspecialchars($snoopy->results);

    echo "<pre> $content </pre>\n";

    echo "--- CRAWLED INFO Method 1 --";

    $pattern = '/<title>([^<]+?)<\/title>/';
    preg_match($pattern, $content, $matches);
    echo $matches[1];

    echo "<br><br>--- CRAWLED INFO Method 2 --";

    if (preg_match('~<title>(.*?)</title>~si', $content, $title)) {
        echo "Title: {$title[1]}";
    }
} else {
    print "error while fetching document: " . $snoopy->error . "\n";
}

?>

#7
02-20-08, 11:30 PM
INTEL
Newbie Coder
Join Date: Mar 2005
Posts: 60
Figured it out. I had to put quotes (" ") around my variable. Thank you very much to both of you!
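Putting quotes around a string variable does not change its contents, so if a similar snippet still returns nothing, one thing worth checking is the input: in the code from post #6 the patterns search `$content`, which is the output of htmlspecialchars(). That function turns "<title>" into "&lt;title&gt;", so neither pattern can match it. A minimal sketch of matching on the raw HTML and escaping only for display; `$raw` is a hard-coded stand-in for `$snoopy->results` so the example runs without a network request:

```php
<?php
// $raw stands in for $snoopy->results (the unescaped HTML Snoopy fetched).
$raw = "<html>\n<head>\n<title>Hello</title>\n</head>\n<body>\nMy Name: Alex\n</body>\n</html>";

// Escaped copy, suitable only for displaying the source inside <pre>.
$escaped = htmlspecialchars($raw);

// Searching the escaped copy finds nothing: "<title>" is now "&lt;title&gt;".
$foundInEscaped = preg_match('~<title>(.*?)</title>~si', $escaped, $m);

// Searching the raw HTML works.
$title = null;
if (preg_match('~<title>(.*?)</title>~si', $raw, $m)) {
    $title = $m[1];
}

echo "<pre>" . $escaped . "</pre>\n";                       // show the source safely
echo "Title: " . htmlspecialchars((string) $title) . "\n";  // Title: Hello
```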

All times are GMT -5.