I found this auto-linking script that works very nicely. So far I have made a few changes to it. It now automatically pulls the URL to reference an HTML file that supplies the content, and it also filters out links to the current page from a site-wide list of links. However, I have one more challenge to tackle before the script is satisfactory. The script automatically links chosen keywords, which is great, but it links every occurrence of each keyword on the page. For example, if I am trying to link the keyword "fish food" to a fish food page, every occurrence of the phrase "fish food" will be linked to that page. I would like it to link only the first occurrence of each chosen keyword. If I'm talking about fish food and there are 20 links on the page with that anchor text, it begins to look ridiculous. The first few tweaks were easy, but I just don't have a clue on this one. Take a look, here are the scripts:
Here we have the main page that pulls the content from an HTML file:
/**********************************************
This is the automatic keyword generator class
***********************************************/
//this is the actual application.
include('autolink/class.autokeyword.php');
$params['content'] = $data; //page content
//set the length of keywords you like
$params['min_word_length'] = 5; //minimum length of single words
$params['min_word_occur'] = 5; //minimum occur of single words
$params['min_2words_length'] = 3; //minimum length of words for 2 word phrases
$params['min_2words_phrase_length'] = 10; //minimum length of 2 word phrases
$params['min_2words_phrase_occur'] = 2; //minimum occur of 2 words phrase
$params['min_3words_length'] = 3; //minimum length of words for 3 word phrases
$params['min_3words_phrase_length'] = 10; //minimum length of 3 word phrases
$params['min_3words_phrase_occur'] = 2; //minimum occur of 3 words phrase
$keyword = new autokeyword($params, "iso-8859-1");
$keywords = $keyword->get_keywords();
$origin = $_SERVER["REQUEST_URI"];
/**********************************************
This is the start of the auto link keyword!
***********************************************/
// this list could be an output of a database query
// or just from a plain text file.
$linkfile ='autolink/linkedKeywords.php';
//read the file
$fh = fopen($linkfile,'r') or die("can't read ".$linkfile." file!");
$keyword_array = array();
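//read line by line; fgets returns false at end of file
while (($s = fgets($fh, 1024)) !== false) {
$s = rtrim($s);
//skip blank or malformed lines that lack a comma
if (strpos($s, ',') === false) {
continue;
}
list($word,$link) = explode(',', $s, 2);
$word = trim($word);
$link = trim($link);
//leave out links that point to the current page
if ($link != $origin){
$keyword_array[$word] = $link;
}
}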
fclose($fh) or die("can't close file ".$linkfile."!");
include('autolink/class.autolink.php');
$autolink = new autolink($keywords, $keyword_array, $data, "link", "link","i");
echo $autolink->linkKeywords();
As you can see, the params are used to filter which keywords are used. They have no impact on the number of link occurrences; they simply filter out any keyword that is shorter or occurs less often than the set minimums.
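To illustrate what an occurrence threshold does, here is a small standalone sketch. It is not the class's actual code: the function name `filterByOccurrence` and the sample text are made up for illustration. It counts words with `array_count_values()` and keeps only those that appear at least a given number of times, which narrows the keyword list but says nothing about how many times each surviving keyword gets linked.

```php
<?php
// Hypothetical standalone illustration of a minimum-occurrence filter.
// Not taken from class.autokeyword.php.
function filterByOccurrence(array $words, $minOccur)
{
    // Map each distinct word to the number of times it appears.
    $counts = array_count_values($words);
    $kept = array();
    foreach ($counts as $word => $count) {
        // Keep only words that meet the threshold.
        if ($count >= $minOccur) {
            $kept[$word] = $count;
        }
    }
    // Sort from highest count to lowest, preserving keys.
    arsort($kept);
    return $kept;
}

// Made-up sample text.
$words = explode(' ', 'fish food for fish and more fish food');
print_r(filterByOccurrence($words, 2));
```

With a threshold of 2, only "fish" and "food" survive here; the words that occur once are dropped from the keyword list.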
Next we have the class.autokeyword.php file:
PHP Code:
class autokeyword {
//declare variables
//the site contents
var $contents;
var $encoding;
//the generated keywords
var $keywords;
//minimum word length for inclusion into the single word
//metakeys
var $wordLengthMin;
var $wordOccuredMin;
//minimum word length for inclusion into the 2 word
//phrase metakeys
var $word2WordPhraseLengthMin;
var $phrase2WordLengthMinOccur;
//minimum word length for inclusion into the 3 word
//phrase metakeys
var $word3WordPhraseLengthMin;
//minimum phrase length for inclusion into the 2 word
//phrase metakeys
var $phrase2WordLengthMin;
var $phrase3WordLengthMinOccur;
//minimum phrase length for inclusion into the 3 word
//phrase metakeys
var $phrase3WordLengthMin;
//turn the site contents into an array
//then replace common html tags.
function replace_chars($content)
{
//convert all characters to lower case
$content = mb_strtolower($content);
//$content = mb_strtolower($content, "UTF-8");
$content = strip_tags($content);
//count the 2 word phrases
$y = array_count_values($y);
$occur_filtered = $this->occure_filter($y, $this->phrase2WordLengthMinOccur);
//sort the words from highest count to the lowest.
arsort($occur_filtered);
//count the 3 word phrases
$b = array_count_values($b);
//sort the words from
//highest count to the
//lowest.
$occur_filtered = $this->occure_filter($b, $this->phrase3WordLengthMinOccur);
arsort($occur_filtered);
function occure_filter($array_count_values, $min_occur)
{
$occur_filtered = array();
foreach ($array_count_values as $word => $occured) {
if ($occured >= $min_occur) {
$occur_filtered[$word] = $occured;
}
}
return $occur_filtered;
}
function implode($gule, $array)
{
$c = "";
foreach($array as $key=>$val) {
@$c .= $key.$gule;
}
return $c;
}
}
Of course there is a linkedKeywords file but that's irrelevant here.
I'm pretty sure what I need to tweak is the class.autolink script, so here is class.autolink.php:
PHP Code:
class autolink {
// initialize class
// $keywords : keywords list output from automatic keyword class
//$linksArray : list of predetermined links related to keywords
//$contents : the article contents
function autolink($keywords, $linksArray, $content, $id = NULL, $class = NULL, $type = NULL ){
$this->links = $linksArray;
//get the links keys
$this->links_keys = array_keys($this->links);
//convert the keyword list into an array
//explode() is used because split() was removed in PHP 7
$this->keywords = explode(",",$keywords);
$this->content = $content;
//CSS formatting
$this->id = $id;
$this->class = $class;
//replacement type
//CASE SENSITIVE : $type = NULL
//CASE INSENSITIVE : $type = "i";
$this->type = $type;
}
// link the keyword if it is contained
// in the $contents
function linkKeywords(){
//iterate into each keyword
foreach($this->keywords as $word){
//strip white spaces.
$word = trim($word);
//initialize $replacedKeyword
$replacedKeyword = "";
//check if keyword is found in the
//predetermined list of links
if(in_array($word, $this->links_keys)){
//if found check if the word is found in the article
if (stristr($this->content,$word)){
//convert the $keyword into a link
//which include CSS formatting $id & $class.
$replacedKeyword = '<a href="'.$this->links[$word].'" id="'.$this->id.'" class="'.$this->class.'">'.$word.'</a>';
//find whole word only
//this prevents replacement of words contained
//in compound words.
//preg_quote escapes any regex metacharacters in the keyword.
$whole_word = "/\\b(" . preg_quote(trim($word), "/") . ")\\b/".$this->type;
//replace the keywords in the article contents
//with links.
$this->content = preg_replace($whole_word, $replacedKeyword, $this->content);
}
}
}
return $this->content;
}
}
And that's it. The first script ties everything together, the second script pulls the content and identifies the keywords, and the third script creates the links and returns the content auto linked. My guess would be to place some sort of filter in the class.autolink file but beyond that I'm lost. Any ideas?
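One possible direction, sketched under assumptions rather than tested against the full script: `preg_replace()` accepts an optional fourth argument, `$limit`, that caps the number of replacements per pattern. The standalone function below (the name `linkFirstOccurrence` is hypothetical and not part of the class) links only the first whole-word match. It also escapes the keyword with `preg_quote()` and reinserts the matched text through `$1`, so the capitalization found in the content is kept.

```php
<?php
// Hypothetical helper, not part of class.autolink.php: links only the
// first whole-word occurrence of $word in $content.
function linkFirstOccurrence($content, $word, $url, $flags = 'i')
{
    // Escape regex metacharacters in the keyword, including the delimiter.
    $pattern = '/\b(' . preg_quote($word, '/') . ')\b/' . $flags;
    // $1 reinserts the text exactly as it appeared in the content.
    $replacement = '<a href="' . htmlspecialchars($url) . '">$1</a>';
    // A limit of 1 (fourth argument) stops after the first match.
    return preg_replace($pattern, $replacement, $content, 1);
}

// Made-up sample content and URL.
$text = 'Fish food is cheap. Buy fish food here.';
echo linkFirstOccurrence($text, 'fish food', '/fish-food.html');
```

In `linkKeywords()`, the equivalent change would presumably be passing `1` as the fourth argument of the existing `preg_replace()` call. One caveat this sketch does not handle: a later keyword could still match text inside a link inserted for an earlier keyword, for example "fish" inside the URL or anchor text of the "fish food" link.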
I already posted this in the script request forum, but I'm pretty sure it actually belongs here... Again, any suggestions, even a hint in the right direction, would be greatly appreciated.