Quote:
Originally Posted by vinpkl
hi job0107
Sorry if I was not able to make the problem clear.
There are more than 2000 static HTML article pages from which I need to extract data.
This is the sample code that I need to extract information from and add to the database.
I need the title, author name, article intro, and article body as separate variables.
I hope everything is clear this time.
vineet
Code:
<p class="title">article title</p>
<p class="authname">authorname</p>
<div class="articleintro">
<p>
article description come here article description come here article description come here article description come here article description come here article description come here article description come here article description come here
</p>
</div>
<hr/>
<div class="articlebody">
<p>
article description come here article description come here article description come here article description come here article description come here article description come here article description come here article description come here
</p>
<p>
article description come here article description come here article description come here article description come here article description come here article description come here article description come here article description come here
</p>
<p>
article description come here article description come here article description come here article description come here article description come here article description come here article description come here article description come here
</p>
<p>
article description come here article description come here article description come here article description come here article description come here article description come here article description come here article description come here
</p>
</div>
Provided ALL your pages are laid out exactly as you described above,
and provided you have nothing against using a little JavaScript to assist,
getting the contents becomes rather easy.
Instead of using file_get_contents(), I decided it would be easier to include your page in an invisible div.
A JavaScript function then captures the contents of the included document and stores the results in a form.
Once the results are in the form, the form can be manually or automatically submitted for processing.
This example requires you to manually submit the form.
After that, the values of the form's hidden input elements are displayed.
You could just as easily send the values to a database.
PHP Code:
<html>
<head>
<style>
.intro{color:#0a0;font-size:20px;font-weight:bold;}
span{color:#00f;font-size:20px;font-weight:bold;}
</style>
<script>
// Copy the four article parts from the included page into the hidden form fields.
function getDocumentParts() {
var temp = [];
var i;
// The title and author name are <p> elements, identified by class name.
var paragraphs = document.getElementsByTagName("p");
for(i = 0; i < paragraphs.length; i++)
{
if(paragraphs[i].className == "title"){temp[0] = paragraphs[i].innerHTML;}
if(paragraphs[i].className == "authname"){temp[1] = paragraphs[i].innerHTML;}
}
// The intro and body are <div> elements, identified by class name.
var divs = document.getElementsByTagName("div");
for(i = 0; i < divs.length; i++)
{
if(divs[i].className == "articleintro"){temp[2] = divs[i].innerHTML;}
if(divs[i].className == "articlebody"){temp[3] = divs[i].innerHTML;}
}
// Fall back to "empty" for any part that was not found.
document.getElementById("title").value = temp[0] ? temp[0] : "empty";
document.getElementById("authname").value = temp[1] ? temp[1] : "empty";
document.getElementById("articleintro").value = temp[2] ? temp[2] : "empty";
document.getElementById("articlebody").value = temp[3] ? temp[3] : "empty";
}
</script>
</head>
<body onload="getDocumentParts()">
<div style="position:absolute;visibility:hidden;">
<?php
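$fileName = "article_eng.html";
// readfile() outputs the static page as-is; include would also try to run any PHP-style tags (e.g. "<?xml") found in it.
readfile($fileName);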
?>
</div>
<form action="#" method="POST">
<input type="hidden" id="title" name="title">
<input type="hidden" id="authname" name="authname">
<input type="hidden" id="articleintro" name="articleintro">
<input type="hidden" id="articlebody" name="articlebody">
<input type="submit" name="go" value="Show Contents">
</form>
<?php
if(!empty($_POST["go"]))
{
echo "<span class='intro'>Providing all your pages are laid out exactly the same, then this program will work every time.</span><br />
<span class='intro'>Here I am just displaying the contents of the variables. They could just as easily be sent to a database.</span><p>
<span>Title: </span>".htmlspecialchars($_POST["title"])."<p><span>Author Name: </span>".htmlspecialchars($_POST["authname"])."<p><span>Article Intro: </span>".htmlspecialchars($_POST["articleintro"])."<p><span>Article Body: </span>".htmlspecialchars($_POST["articlebody"]);
}
?>
</body>
</html>
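If it helps, the class-matching step can also be pulled out into a small standalone function. This is only a sketch under a few assumptions: `collectParts` and `PARTS` are names I made up for illustration, it keeps the first match for each class (the script above keeps the last), and it matches a class even when the element carries extra classes. It works on anything that has `className` and `innerHTML` properties, so it can be tried outside the browser with plain objects.

```javascript
// Hypothetical helper, not part of the script above.
// The four class names come from the sample article markup.
const PARTS = ["title", "authname", "articleintro", "articlebody"];

function collectParts(elements) {
  const result = {};
  for (const el of elements) {
    // Split on whitespace so class="title big" still counts as "title".
    const classes = String(el.className || "").split(/\s+/);
    for (const name of PARTS) {
      if (classes.includes(name) && !(name in result)) {
        result[name] = String(el.innerHTML || "").trim();
      }
    }
  }
  // Same fallback as the original script: missing or blank parts become "empty".
  for (const name of PARTS) {
    if (!result[name]) {
      result[name] = "empty";
    }
  }
  return result;
}

// Quick check with plain objects standing in for DOM elements.
const demoElements = [
  { className: "title", innerHTML: "article title" },
  { className: "authname", innerHTML: "authorname" },
];
console.log(collectParts(demoElements));
```

In the page itself you would probably call it as `collectParts(document.querySelectorAll("p, div"))` and copy each value into the hidden input with the matching id. Calling `submit()` on the form afterwards should give the automatic submission mentioned earlier, though I have not tested that against your pages.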