

How to make a list of links from HTML file?

#1
04-17-05, 07:54 AM
hordubal (New Member)

I need to parse an HTML file, find all the links (the HTML tag <a href=...>) and make a list of all the links in that file.

I believe this is a task for a regular expression and the function "preg_match_all", but I am not able to create an appropriate regular expression.

Can anyone help?

Petr
#2
04-17-05, 08:52 AM
End User (Level II Curmudgeon)
Try:

preg_match_all("|<a>(.+)<\/a>|si", $data, $matches);

This will extract all the text between the "<a>" and "</a>" tags.....
#3
04-17-05, 11:02 AM
moronovich (Junior Code Guru)
nice try but:
Code:
(*_*) php -r '$string = "<a href=\"link1.html\"> test </a> <a href=\"link2.html\"> test2 </a>"; preg_match_all ( "|<a>(.+)<\/a>|si", $string, $matches ); var_dump ( $matches );'
array(2) {
  [0]=>
  array(0) {
  }
  [1]=>
  array(0) {
  }
}
and
Code:
(*_*) php -r '$string = " <a> test </a> <a> test2 </a> "; preg_match_all ( "|<a>(.+)<\/a>|si", $string, $matches ); var_dump ( $matches );'
array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(28) "<a> test </a> <a> test2 </a>"
  }
  [1]=>
  array(1) {
    [0]=>
    string(21) " test </a> <a> test2 "
  }
}
A simple regexp for a limited test:
PHP Code:

$pattern = '/<a(.*?)href=(["\']?)([^\s\'">]+?)(?(2)\2)(\s+)?>(.*?)<\/a>/si';
It works for:
PHP Code:

$string = '<a href=link1.html>test</a><a href="link2.html">test2</a>';

preg_match_all($pattern,$string,$matches);
print_r($matches); 
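To read the result: with the pattern above, the href value should land in capture group 3 and the link text in group 5 (groups 1, 2 and 4 hold any leading attributes, the quote character and trailing whitespace). A minimal sketch of listing just those two; the PREG_SET_ORDER flag and the loop are additions for illustration, not part of the original test:

```php
<?php
// Same pattern as above, written out with "=" and ";".
$pattern = '/<a(.*?)href=(["\']?)([^\s\'">]+?)(?(2)\2)(\s+)?>(.*?)<\/a>/si';
$string  = '<a href=link1.html>test</a><a href="link2.html">test2</a>';

// PREG_SET_ORDER groups the captures per link instead of per group.
preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[3] is the href value, $m[5] is the text between <a ...> and </a>.
    echo $m[3], ' => ', $m[5], "\n";
}
// link1.html => test
// link2.html => test2
```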
but fails for:
PHP Code:

$string = '<a href=link1.html onclick="return test();">test</a><a href="link2.html">test2</a>';

preg_match_all($pattern,$string,$matches);
print_r($matches); 
Solution:
1. Get each <a whatever="...">whatever</a> element.
2. Delete any on...="..." event attributes (onclick and so on).
3. Collapse each run of whitespace ('\s+') into a single space.
4. Get the link with the pattern above.
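The four steps could be sketched roughly like this. The function name extract_links and the two helper patterns (one to grab each anchor, one to strip quoted on... attributes) are assumptions made for illustration; they are not tested against real-world HTML and will trip over a ">" inside an attribute value.

```php
<?php
// Hypothetical helper, not from the original post: list href values
// by following the four steps above.
function extract_links(string $html): array
{
    // The pattern from this post.
    $pattern = '/<a(.*?)href=(["\']?)([^\s\'">]+?)(?(2)\2)(\s+)?>(.*?)<\/a>/si';

    // Step 1: grab each complete <a ...>...</a> element (lazy, one per match).
    preg_match_all('/<a\b[^>]*>.*?<\/a>/si', $html, $anchors);

    $links = [];
    foreach ($anchors[0] as $anchor) {
        // Step 2: drop quoted on...="..." event-handler attributes.
        $anchor = preg_replace('/\s+on\w+\s*=\s*(["\']).*?\1/si', '', $anchor);
        // Step 3: collapse runs of whitespace into a single space.
        $anchor = preg_replace('/\s+/', ' ', $anchor);
        // Step 4: pull the href value (group 3) out with the pattern.
        if (preg_match($pattern, $anchor, $m)) {
            $links[] = $m[3];
        }
    }
    return $links;
}

$string = '<a href=link1.html onclick="return test();">test</a><a href="link2.html">test2</a>';
print_r(extract_links($string));
```

With the onclick case from above, this should list link1.html and link2.html.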

good luck..
#4
04-17-05, 11:05 AM
NeverMind (Community VIP)
Quote:
Originally Posted by End User
Try:

preg_match_all("|<a>(.+)<\/a>|si", $data, $matches);

This will extract all the text between the "<a>" and "</a>" tags.....
Since we have <a href=...>, not <a>, your regexp won't work.
Try this:
PHP Code:

preg_match_all('~<a href=(.+)>(.+)<\/a>~is', $data, $matches);
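One caveat with the pattern above: (.+) is greedy, so when $data holds more than one link, the match can run from the first <a href= to the last </a> and come back as a single result. A lazy variant, as a sketch only; the sample $data and the second pattern are assumptions for illustration, and it only handles href as the first attribute:

```php
<?php
// Sample input invented for this sketch: one quoted and one unquoted href.
$data = '<a href="link1.html">test</a> <a href=link2.html>test2</a>';

// Greedy: both links collapse into one match.
preg_match_all('~<a href=(.+)>(.+)</a>~is', $data, $greedy);

// Lazy text group, optional quotes around the value: one match per link.
preg_match_all('~<a\s+href=["\']?([^"\'\s>]+)["\']?[^>]*>(.*?)</a>~is', $data, $lazy);

echo count($greedy[0]), "\n"; // number of matches from the greedy pattern
print_r($lazy[1]);            // href values from the lazy pattern
```

Here the greedy pattern should report 1 match, while $lazy[1] holds link1.html and link2.html.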

All times are GMT -5.