I think the easiest way to do this is to build a spider that reads the files over HTTP and writes them to disk.
I use the following function for grabbing images in protected directories, but it can be applied to anything and will grab HTML as well.
PHP Code:
function http_get($url, $path, $file, $full_path = "."){
    // $url is the host name only (e.g. www.example.com); fsockopen() does not take a full URL
    $target = $full_path . '/' . $file;
    if (file_exists($target))
        return true; // no need to download if already available
    $fp = fsockopen($url, 80, $errno, $errstr, 30);
    if (!$fp) {
        return false; //"$errstr ($errno)<br />\n";
    }
    if (!file_exists($full_path))
        mkdir($full_path, 0777, true); // make new directory for downloads
    // HTTP/1.0 keeps the server from replying with chunked encoding, which this reader does not decode
    $out = "GET " . $path . $file . " HTTP/1.0\r\n";
    $out .= "Host: " . $url . "\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out); // send request (GET)
    $headers_ended = false;
    $image_content = ''; // initialise so the first .= does not raise a notice
    // Read response. The status line is not checked, so an error page is saved as-is.
    while (!feof($fp)) {
        if (!$headers_ended) {
            // no length limit, so a long header line is never split into a fake blank line
            $socket_read = fgets($fp);
            if ($socket_read === false)
                break;
            if ($socket_read === "\r\n") // blank line after headers
                $headers_ended = true;
        }
        else {
            $image_content .= fread($fp, 8192); // body: raw bytes, blank lines included
        }
    }
    fclose($fp);
    $fp = fopen($target, 'xb'); // write file
    if (!$fp)
        return false;
    fwrite($fp, $image_content);
    fclose($fp);
    chmod($target, 0777);
    return true;
}
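Then just feed http_get all the files you want to convert and it will do the conversion. $url is the host name only (no http://), $path is the directory on the server including the trailing slash, $file is the file name, and $full_path is the local directory you want to store the files in, btw.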
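If your list is made of full URLs rather than separate host/path/file pieces, something like the following might help. It is only a rough sketch: split_url_for_http_get is a made-up helper name (not part of the function above), the example.com links and the downloads directory are placeholders, and it assumes plain http on port 80 because that is all http_get connects to. Query strings are simply dropped.

```php
<?php
// Hypothetical helper (not part of the original post): split a full URL
// into the host, directory path, and file name that http_get() expects.
function split_url_for_http_get($link) {
    $parts = parse_url($link);
    if ($parts === false || empty($parts['host']) || empty($parts['path'])) {
        return false; // not an absolute URL with a path
    }
    $file = basename($parts['path']);
    if ($file === '' || substr($parts['path'], -1) === '/') {
        return false; // URL points at a directory, so there is no file name
    }
    // everything up to and including the last slash
    $path = substr($parts['path'], 0, strrpos($parts['path'], '/') + 1);
    return array($parts['host'], $path, $file);
}

// Placeholder URLs; replace with the files you actually want to fetch.
$links = array(
    'http://www.example.com/images/logo.png',
    'http://www.example.com/docs/index.html',
);

foreach ($links as $link) {
    $args = split_url_for_http_get($link);
    if ($args === false) {
        continue; // skip anything that cannot be split
    }
    list($host, $path, $file) = $args;
    // Only call http_get() if the function from the post has been loaded.
    if (function_exists('http_get')) {
        $ok = http_get($host, $path, $file, 'downloads');
        echo $link . ($ok ? " saved\n" : " failed\n");
    }
}
```

Every file ends up in the same local directory here, so two files with the same name in different remote directories would collide; the second one would be skipped as already downloaded.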