OK, this is purely theoretical, but here's what I just thought of:
You read the contents of an HTML file into a variable, then use strip_tags() to strip all the HTML from it. What is left is all the "language" (the visible text) you are using. You could then take that text and use regular expressions to replace every entry of that "language" in the original file with a variable.
But, as I said, this is purely theoretical; I've never tried it.
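
For what it's worth, here is a rough sketch of how that idea might look in PHP. The function name templatize_html(), the $lang array, and the str_0, str_1, ... keys are all made up for illustration. It also assumes each piece of text sits on its own line directly between two tags, with no inline tags like <b> in the middle of it, so treat it as a starting point rather than a finished solution.

```php
<?php
// Sketch only: templatize_html() and the $lang / str_N names are hypothetical.
function templatize_html(string $html): array
{
    // Step 1: strip the markup; what remains is the text, i.e. the "language".
    $text = strip_tags($html);

    // Step 2: treat every distinct non-empty line of that text as one entry.
    $entries = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line !== '' && !in_array($line, $entries, true)) {
            $entries[] = $line;
        }
    }

    // Step 3: swap each entry in the original HTML for a variable reference.
    $lang = [];
    foreach ($entries as $i => $entry) {
        $key = 'str_' . $i;
        $lang[$key] = $entry;

        // Only match the entry when it sits between ">" and "<",
        // so tag names and attribute values are left alone.
        $pattern = '/(>\s*)' . preg_quote($entry, '/') . '(\s*<)/';
        $html = preg_replace_callback(
            $pattern,
            function (array $m) use ($key): string {
                return $m[1] . "{\$lang['" . $key . "']}" . $m[2];
            },
            $html
        );
    }

    // Return the templated HTML plus the extracted language array.
    return [$html, $lang];
}
```

With this sketch, a line such as <h1>Welcome</h1> should come back as <h1>{$lang['str_0']}</h1>, and $lang['str_0'] would hold 'Welcome'. Text that is broken up by inline tags will not match the pattern and is left untouched, so real pages would likely need something smarter than a line-by-line split.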