Preg_replace whole word only

  #1 (permalink)  
Old 03-12-10, 01:48 PM
cesarcesar

I'm trying to make a naughty word filter. It removes bad words fine, but words that merely contain a bad word, like "assist" and "asses", get caught in the filter as well. Strangely, if the sentence is "My asses to assist me." the cleaned version reads "My asses to ***ist me." It seems to let through the first word that contains the bad word, but then censors part of the next one. Any ideas? My script is below. Thanks.

Code:
function cleanWords($value) {

	/*   strip naughty words   */
	$bad_word_file = 'standards/badwords.txt';
	$strtofile = fopen($bad_word_file, "r");
	$badwords = explode("\n", fread($strtofile, filesize($bad_word_file)));
	fclose($strtofile);
	
	for ($i = 0; $i < count($badwords); $i++) {
		$wordlist .= str_replace(chr(13),'',$badwords[$i]).'|';
	}
	$wordlist = substr($wordlist,0,-1);

	$value = preg_replace("/\b($wordlist)\b/ie", 'preg_replace("/./","*","\\1")', $value);	
	return $value;

}
  #2 (permalink)  
Old 03-12-10, 07:44 PM
wirehopper
PHP Code:

<?php

/* Read in from the file here, not in the function - you only need to read the file once */
$wordlist = 'one|two|three|four';

/* Sample data */
$words = array('stone twoot three fourth', 'under oven tree', 'cookie oneder ream');

foreach ($words as $k => $v)
        echo 'Words: '.$v.' Cleaned: '.clean($wordlist, $v).PHP_EOL;

/* No \b anchors here, so the words are also replaced inside longer words */
function clean($wordlist, $value)
{
        return preg_replace("/($wordlist)/i", '***', trim($value));
}
?>
Output:

Quote:
Words: stone twoot three fourth Cleaned: st*** ***ot *** ***th
Words: under oven tree Cleaned: under oven tree
Words: cookie oneder ream Cleaned: cookie ***der ream
  #3 (permalink)  
Old 03-13-10, 10:25 AM
Jcbones
With a lot of bad words this function can get time-consuming, but it filters whole words only and will not split words up.

Using wirehopper's example, with a little changing around:

PHP Code:

<?php
/* Read in from the file here, not in the function - you only need to read the file once */
$wordlist[] = 'one';
$wordlist[] = 'two';
$wordlist[] = 'three';
$wordlist[] = 'four';

/* Sample data */
$words = 'stone twoot three fourth under oven tree cookie oneder ream one sixone threetwo fournine four';

/* Filter one word at a time; clean() receives a single word per call */
foreach ($wordlist as $v)
      $words = clean($v, $words);

function clean($wordlist, $value)
{
        /* \b on both sides restricts the match to whole words */
        return preg_replace("/\b$wordlist\b/i", '***', trim($value));
}

echo 'Words: '.$words.PHP_EOL;
?>
Code:
Words: stone twoot *** fourth under oven tree cookie oneder ream *** sixone threetwo fournine ***
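
The loop above runs one preg_replace() per word, so the text is scanned once for every entry in the list. A minimal sketch of an alternative, assuming PHP 7 or later: join the list into a single alternation wrapped in \b so the text is scanned once. The function name cleanAll is hypothetical and not part of the posts above.

```php
<?php
// Hypothetical helper: censor whole words from $wordlist in a single pass.
function cleanAll(array $wordlist, string $value): string
{
    // An empty alternation would match at every word boundary, so bail out.
    if (count($wordlist) === 0) {
        return trim($value);
    }

    // Escape each word so regex metacharacters are matched literally.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $wordlist);

    // \b on both sides restricts matches to whole words.
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';

    return preg_replace($pattern, '***', trim($value));
}

$wordlist = array('one', 'two', 'three', 'four');
$words = 'stone twoot three fourth under oven tree cookie oneder ream one sixone threetwo fournine four';

echo 'Words: ' . cleanAll($wordlist, $words) . PHP_EOL;
```

For this sample data the result should match the looped version's output shown above. Whether a single large pattern is actually faster for a list of around 1000 words is worth measuring rather than assuming.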
  #4 (permalink)  
Old 03-13-10, 05:53 PM
cesarcesar
My list is 1000 words. I'll try your scripts out. Thanks.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiShare on FacebookShare on Stumble UponShare on Twitter
Reply With Quote
  #5 (permalink)  
Old 03-13-10, 06:39 PM
cesarcesar
I ended up finding that the word "a.s.s." was in my list. I think the dots were messing up the expression. For those interested, this is my new code. Thanks for the suggestions that got it where it is.

Code:
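$_SESSION['wordlist'] = join("|", array_map('trim', file('standards/badwords.txt')));

function cleanWords($value) {

	// $_SESSION is a superglobal, so no "global" declaration is needed.
	// Note: the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0.
	$value = preg_replace("/\b({$_SESSION['wordlist']})\b/ie", 'str_repeat("*", strlen("\\1"))', $value);
	return $value;

}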
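
The dot problem described in this post comes from "." being a regex metacharacter that matches any character, so a list entry like "a.s.s." can match unrelated text such as "assist". A minimal sketch of one way to handle it, assuming PHP 7 or later: escape each entry with preg_quote() and compute the asterisks in preg_replace_callback(). The function names buildPattern and maskWords are hypothetical, and the inline word list stands in for the contents of standards/badwords.txt. Lookarounds are used in place of \b because \b after a trailing "." would only match when a word character follows.

```php
<?php
// Hypothetical helper: build one whole-word pattern from a list of words.
function buildPattern(array $words): string
{
    // Drop blank entries, e.g. a trailing empty line in the file.
    $words = array_filter(array_map('trim', $words), 'strlen');

    // preg_quote() escapes ".", so "a.s.s." only matches literal dots.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $words);

    // (?<!\w) and (?!\w) act like \b but also work for entries that
    // begin or end with a non-word character such as ".".
    return '/(?<!\w)(?:' . implode('|', $escaped) . ')(?!\w)/i';
}

// Hypothetical helper: replace each match with asterisks of equal length.
function maskWords(string $pattern, string $value): string
{
    return preg_replace_callback($pattern, function ($m) {
        return str_repeat('*', strlen($m[0]));
    }, $value);
}

// Unescaped, each dot matches any character, so "assist" is a match.
var_dump(preg_match('/a.s.s./', 'assist'));     // int(1)
// Escaped, the dots must be literal dots, so "assist" is not a match.
var_dump(preg_match('/a\.s\.s\./', 'assist'));  // int(0)

$pattern = buildPattern(array('ass', 'a.s.s.'));

echo maskWords($pattern, 'My asses to assist me.') . PHP_EOL; // left unchanged
echo maskWords($pattern, 'What an ass.') . PHP_EOL;           // What an ***.
```

Note that strlen() counts bytes, so words containing multibyte characters would get more asterisks than letters; mb_strlen() is an option if the mbstring extension is available.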