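Regular expressions are the wrong tool for parsing XML/HTML. Use a DOM parser instead.
XPath is a query language designed specifically for navigating DOM structures; the example here uses it to select the text of every script element.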
$html = <<<_EOS_
<script>alert("Hallo Welt 1");</script>
<div>Hallo Welt</div>
<script type ="text/javascript">alert("Hallo Welt 2");</script>
<div>Hallo Welt 2</div>
<script type ="text/javascript">
alert("Hallo Welt 2");
</script>
_EOS_;
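// Parse the fragment; the doctype and <html> wrapper give the parser a complete document.
$doc = new DOMDocument();
$doc->loadHTML("<!DOCTYPE html><html>$html</html>");
// Select the text content of every <script> element, wherever it appears in the tree.
$xpath = new DOMXPath($doc);
$scripts = $xpath->query('//script/text()');
foreach ($scripts as $script) {
    var_dump($script->data);
}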
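
The same approach can be narrowed with an XPath predicate, which a regular expression would struggle to do reliably. The following is only a minimal sketch: the sample markup and the `$typed` variable are made up for illustration, and the call to `libxml_use_internal_errors()` is an optional addition that keeps parser warnings out of the output.

```php
<?php
// Hypothetical sample markup, modelled on the snippet above.
$html = <<<'HTML'
<script>alert("a");</script>
<div>Hallo Welt</div>
<script type="text/javascript">alert("b");</script>
HTML;

// Optional: collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML("<!DOCTYPE html><html>$html</html>");
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// The predicate [@type="text/javascript"] keeps only <script> elements
// that carry exactly that attribute value; the untyped script is skipped.
$typed = [];
foreach ($xpath->query('//script[@type="text/javascript"]/text()') as $node) {
    $typed[] = trim($node->data);
}

var_dump($typed);
```

Because the parser normalises the markup, the predicate should match regardless of spacing around the `=` in the source (as in `type ="text/javascript"` above), which is exactly the kind of variation that tends to break a regex.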