Quote:
|
Originally Posted by zendobi
Ok, like I mentioned before, I have a couple of content sites that anyone can submit articles and tutorials to. However, I have been running into the problem of some people taking other people's articles, changing a few words here and there, and posting them as their own. So I am looking for anyone's suggestions on building a content filter. I have considered doing a line-by-line compare against all the other articles in the database that have a similar keyword density, but that could be majorly server intensive. Any ideas are welcome.
|
You should try implementing multiple levels. Don't just do line-by-line for everything. Instead, run cheap checks first so that most pairs never need a line-by-line compare. For example, you could check the lengths of two articles against each other, and only compare them further if the difference falls under a certain value. A copy with a few words changed will be nearly the same length as the original, so anything with a very different length can be skipped.
That way you do a lot of elimination before you even get to anything intensive. Try to think of some other cheap tests that can rule an existing article out as a possible source (shared keywords, for instance), and run those first. Only when none of them can rule it out should you do the deeper analysis.
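
Here is a rough sketch of that idea in Python, just to show the shape of it. The function names and all three thresholds are made up for illustration, so you would need to tune them on your own articles. It goes cheapest check first: length, then shared-word overlap, and only then a word-by-word compare using `difflib` from the standard library.

```python
import difflib


def length_close(a, b, max_diff=0.2):
    """Level 1: cheap length check (max_diff is an assumed threshold)."""
    longer = max(len(a), len(b))
    if longer == 0:
        return True
    return abs(len(a) - len(b)) / longer <= max_diff


def word_overlap(a, b):
    """Level 2: share of distinct words the two texts have in common (Jaccard)."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def is_probable_copy(new, existing, max_diff=0.2, min_overlap=0.5,
                     min_similarity=0.8):
    """Run the cheap checks first; only do the expensive compare if they pass."""
    if not length_close(new, existing, max_diff):
        return False  # lengths too different, skip the rest
    if word_overlap(new, existing) < min_overlap:
        return False  # vocabulary too different, skip the rest
    # Level 3: expensive word-by-word comparison.
    # autojunk=False stops difflib from ignoring very common words in long texts.
    matcher = difflib.SequenceMatcher(None, new.split(), existing.split(),
                                      autojunk=False)
    return matcher.ratio() >= min_similarity
```

You would call `is_probable_copy(submitted_text, stored_text)` for each stored article when something new is submitted. In practice you could store the length and word set of each article in the database so the first two levels never have to load the full text.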