Simple method for indexing MS Word documents

Building indexers or spiders that can read binary MS Word (.doc) documents can be difficult, especially on *nix servers, which do not support PHP's COM extension. Solutions usually involve installing binaries on the server, which is often impossible or disallowed. This simple PHP snippet does a reasonably good job of extracting text from an MS Word document for use in a search index. It does not claim to be perfect, but it has proved useful on thousands of test documents.
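
The listing itself does not reproduce the snippet, so the following is only a minimal sketch of one naive pure-PHP approach, not the listed code. It assumes that text which is good enough for a search index can be recovered by dropping NUL bytes (which collapses ASCII-range UTF-16LE text into plain ASCII) and keeping runs of printable characters. The function name `extract_doc_text` and the `$minRun` parameter are hypothetical, chosen for illustration.

```php
<?php
// Hypothetical helper (an illustrative heuristic, not the listed snippet).
// Pulls index-worthy text out of the raw bytes of a binary .doc file.
function extract_doc_text(string $bytes, int $minRun = 4): string
{
    // Word frequently stores text as UTF-16LE. Removing NUL bytes turns
    // ASCII-range UTF-16LE characters into ordinary single-byte ASCII.
    $bytes = str_replace("\0", '', $bytes);

    // Keep only runs of printable ASCII (and tabs) at least $minRun long;
    // shorter runs are mostly binary noise.
    preg_match_all('/[\x20-\x7E\t]{' . $minRun . ',}/', $bytes, $matches);

    // Join the runs and collapse repeated whitespace.
    $text = implode(' ', $matches[0]);
    return trim(preg_replace('/\s+/', ' ', $text));
}
```

A call such as `extract_doc_text(file_get_contents($path))`, with `$path` pointing at a .doc file, would return a single string to feed into the indexer. This heuristic discards non-ASCII characters and can let through stray formatting strings, which is usually tolerable for search indexing but not for display.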

 
Filed in:
PHP / Tutorials & Tips / Searching
Platforms:
Linux, Microsoft Windows, Unix, Sun Solaris
Databases:
No Database 
Date Added:
Apr 12, 2006 
Last Updated:
Apr 30, 2006 

License and Pricing Information

Freeware

Price: $0.00 USD

Publisher site visits: 2,033
Average rating: 4.22
Total ratings: 9
