11-18-09, 11:17 PM
job0107
Quote:
Originally Posted by vinpkl
Hi job0107,

Sorry if I was not able to make the problem clear.

There are more than 2000 static HTML article pages from which I need to extract data.

This is the sample code that I need to extract information from and add to the database.

I need the title, author name, article intro, and article body as separate variables.

I hope everything is clear this time.

vineet

Code:
<p class="title">article title</p>
<p class="authname">authorname</p>
<div class="articleintro">
<p>
article description come here article description come here article description come here article description come here article description come here article description come here article description come here article description come here 
</p>
</div>
<hr/>
<div class="articlebody">
<p>
article description come here article description come here article description come here article description come here article description come here article description come here article description come here article description come here 
</p>
<p>
article description come here article description come here article description come here article description come here article description come here article description come here article description come here article description come here 

</p>
<p>
article description come here article description come here article description come here article description come here article description come here article description come here article description come here article description come here 
</p>
<p>
article description come here article description come here article description come here article description come here article description come here article description come here article description come here article description come here 
</p>
</div>
Provided ALL your pages are laid out exactly as you have described above,
and provided you have nothing against using a little JavaScript to assist,
getting the contents becomes rather easy.

Instead of using file_get_contents(), I decided it would be easier to include your page in an invisible div,
then have a JavaScript function capture the contents of the included document and store the results in a form.

Once the results are in the form, the form can be submitted manually or automatically for processing.

This example requires you to submit the form manually.
The values of the form's hidden input elements are then displayed.
You could just as easily send the values to a database.
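For the automatic option, here is a minimal sketch. The helper name fillAndSubmit is made up for illustration and is not part of the example in this post. Two assumptions to flag: the form would need to post to a separate script rather than back to the same page, otherwise the onload handler would run again after every reload and resubmit in a loop; and a scripted submit() does not send the submit button's value, so the receiving PHP could not rely on $_POST["go"] being set.

```javascript
// Hypothetical helper, not from the original post.
// form:   an object with an `elements` lookup and a `submit()` method
//         (a real <form> element satisfies this)
// values: plain object mapping field names to the strings to send
function fillAndSubmit(form, values) {
  var names = Object.keys(values);
  for (var i = 0; i < names.length; i++) {
    var field = form.elements[names[i]];
    // Skip names that have no matching field in the form.
    if (field) { field.value = values[names[i]]; }
  }
  // Submit once, after every available field has been filled.
  form.submit();
}
```

In the page it would be called at the end of the onload function, for example fillAndSubmit(document.forms[0], {title: "...", authname: "..."}), with the real extracted strings in place of the placeholders.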
PHP Code:

<html>
<head>
<style>
.intro{color:#0a0;font-size:20px;font-weight:bold;}
span{color:#00f;font-size:20px;font-weight:bold;}
</style>
<script>
// Copy the four article parts from the included page into the hidden form fields.
function getDocumentParts() {
 var temp = new Array();
 // The title and author name are <p> elements, identified by class.
 var elms = document.getElementsByTagName("p");
 for(var i = 0; i < elms.length; i++)
 {
  if(elms[i].className == "title"){temp[0] = elms[i].innerHTML;}
  if(elms[i].className == "authname"){temp[1] = elms[i].innerHTML;}
 }
 // The intro and body are <div> elements, identified by class.
 elms = document.getElementsByTagName("div");
 for(i = 0; i < elms.length; i++)
 {
  if(elms[i].className == "articleintro"){temp[2] = elms[i].innerHTML;}
  if(elms[i].className == "articlebody"){temp[3] = elms[i].innerHTML;}
 }
 // Fall back to "empty" when a part was not found.
 document.getElementById("title").value = temp[0] ? temp[0] : "empty";
 document.getElementById("authname").value = temp[1] ? temp[1] : "empty";
 document.getElementById("articleintro").value = temp[2] ? temp[2] : "empty";
 document.getElementById("articlebody").value = temp[3] ? temp[3] : "empty";
}
</script>
</head>
<body onload="getDocumentParts()">
<div style="position:absolute;visibility:hidden;">
<?php
// Static article page to pull into the hidden div.
$fileName = "article_eng.html";
include $fileName;
?>
</div>
<form action="#" method="POST">
<input type="hidden" id="title" name="title">
<input type="hidden" id="authname" name="authname">
<input type="hidden" id="articleintro" name="articleintro">
<input type="hidden" id="articlebody" name="articlebody">
<input type="submit" name="go" value="Show Contents">
</form>
<?php
if(!empty($_POST["go"]))
{
 // Escape each submitted value before printing it back into the page.
 echo "<span class='intro'>Providing all your pages are laid out exactly the same, then this program will work every time.</span><br />
       <span class='intro'>Here I am just displaying the contents of the variables. They could just as easily be sent to a database.</span>
       <p><span>Title: </span>".htmlspecialchars($_POST["title"])."</p>
       <p><span>Author Name: </span>".htmlspecialchars($_POST["authname"])."</p>
       <p><span>Article Intro: </span>".htmlspecialchars($_POST["articleintro"])."</p>
       <p><span>Article Body: </span>".htmlspecialchars($_POST["articlebody"])."</p>";
}
?>
</body>
</html>
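The class-matching loops in getDocumentParts() could also be folded into one reusable function. This is only a sketch under my own assumptions: the name pickByClass and its parameters are invented for illustration, and unlike the loops above it keeps the first element that matches each class rather than the last.

```javascript
// Hypothetical helper, not part of the original post.
// elements: array-like of objects exposing className and innerHTML
// wanted:   list of class names to look for (exact className match)
// fallback: string to use when a class was not found or was empty
function pickByClass(elements, wanted, fallback) {
  var found = Object.create(null); // no inherited keys to collide with
  for (var i = 0; i < elements.length; i++) {
    var name = elements[i].className;
    // Record only the first element seen for each wanted class.
    if (wanted.indexOf(name) !== -1 && !(name in found)) {
      found[name] = elements[i].innerHTML;
    }
  }
  var result = {};
  for (var j = 0; j < wanted.length; j++) {
    var key = wanted[j];
    result[key] = found[key] ? found[key] : fallback;
  }
  return result;
}
```

In the page it could be called as pickByClass(document.getElementsByTagName("*"), ["title", "authname", "articleintro", "articlebody"], "empty"), and each returned value copied into the hidden input with the matching id.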
__________________
Jerry Broughton

Last edited by job0107; 11-18-09 at 11:51 PM.