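Please note: this is not an attempt to provide a fix for the regex. It is only meant to show how difficult (I daresay impossible) it is to write a regex that can reliably parse HTML. Even well-formed XHTML would be very hard, and malformed HTML is simply off-limits for regular expressions.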
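Nesting on its own is enough to defeat a simple pattern. The sketch below uses hypothetical inputs that are not part of the original example. A non-greedy pattern meant to capture the contents of an `<em>` element stops at the first closing tag it finds, and that tag belongs to the inner element. A greedy pattern fails the other way as soon as there are two sibling elements:

```php
<?php
// Hypothetical input: one <em> element nested inside another.
$html = '<em>outer <em>inner</em> tail</em>';

// A naive pattern that tries to capture the contents of an <em> element.
// The lazy (.*?) stops at the FIRST "</em>", which closes the inner element.
preg_match('/<em>(.*?)<\/em>/s', $html, $lazy);
echo $lazy[1], "\n"; // outer <em>inner

// A greedy quantifier gets the nested case above right but breaks on two
// sibling elements, because (.*) runs to the LAST "</em>" in the string.
preg_match('/<em>(.*)<\/em>/s', '<em>a</em> and <em>b</em>', $greedy);
echo $greedy[1], "\n"; // a</em> and <em>b
```

Neither variant knows which `</em>` belongs to which `<em>`. Matching arbitrarily deep balanced nesting is beyond classic regular expressions.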
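I agree 100% that trying to parse HTML with regular expressions is a very bad idea. The code below uses the provided function to parse some simple HTML markup. The second attempt fails when it encounters nested HTML tags, <em>Test</em>: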
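<?php
// First attempt: a plain-text label. preg_quote() escapes regex metacharacters,
// the lookahead (?![^<]+>) tries to skip text that sits inside a tag, and
// (?<!\w) ... \b keep the label from matching inside a longer word.
$t['label'] = 'Test';
$text = '<p>Test</p>';
$find = '/(?![^<]+>)(?<!\w)(' . preg_quote($t['label']) . ')\b/s';
$text = preg_replace_callback($find, 'replaceCallback', $text);
echo "Find: $find\n";
echo 'Quote: ' . preg_quote($t['label']) . "\n";
echo "Result: $text\n";
/* Returns:
Find: /(?![^<]+>)(?<!\w)(Test)\b/s
Quote: Test
Result: <p><a class="tag" rel="tag-definition" title="Click to know more about Test" href="?tag=Test">Test</a></p>
*/

// Second attempt: the label itself contains markup. preg_quote() is called without
// its optional delimiter argument, so the '/' in "</em>" is left unescaped and closes
// the pattern early. PHP then reads the rest (em\>)\b/s) as modifiers and stops at the
// backslash with the warning below; preg_replace_callback() returns null, so Result is empty.
$t['label'] = '<em>Test</em>';
$text = '<p>Test</p>';
$find = '/(?![^<]+>)(?<!\w)(' . preg_quote($t['label']) . ')\b/s';
$text = preg_replace_callback($find, 'replaceCallback', $text);
echo "Find: $find\n";
echo 'Quote: ' . preg_quote($t['label']) . "\n";
echo "Result: $text\n";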
/* Returns:
Warning: preg_replace_callback() [function.preg-replace-callback]: Unknown modifier '\' in /test.php on line 25
Find: /(?![^<]+>)(?<!\w)(\<em\>Test\</em\>)\b/s
Quote: \<em\>Test\</em\>
Result:
*/
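// Wraps each matched label in a tag link. Declared at the end of the script;
// PHP makes unconditional top-level functions available before their definition.
// preg_replace_callback() passes an array of matches; $match[1] is the first
// capture group, i.e. the label itself.
// Note: the label is inserted into the HTML without htmlspecialchars().
function replaceCallback($match) {
    if (is_array($match)) {
        $htmlVersion = $match[1];
        $urlVersion = urlencode($htmlVersion);
        return '<a class="tag" rel="tag-definition" title="Click to know more about ' . $htmlVersion . '" href="?tag=' . $urlVersion . '">' . $htmlVersion . '</a>';
    }
    return $match;
}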
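The warning in the second attempt comes from the unescaped '/', but escaping it does not rescue the approach. The sketch below uses hypothetical inputs and a hypothetical helper, buildPattern(), that builds the same pattern shape as the answer but passes '/' as the second argument to preg_quote() so the pattern compiles. It then shows three ways the pattern still misbehaves. The trailing \b never matches right after a closing '>'. The (?![^<]+>) "not inside a tag" lookahead also skips visible text followed by a literal '>', and it matches inside an attribute value that contains a '<':

```php
<?php
// Hypothetical helper: same pattern shape as in the answer, but preg_quote() now
// receives the '/' delimiter, so a slash in the label is escaped.
function buildPattern(string $label): string
{
    return '/(?![^<]+>)(?<!\w)(' . preg_quote($label, '/') . ')\b/s';
}

$emPattern = buildPattern('<em>Test</em>');
echo $emPattern, "\n"; // /(?![^<]+>)(?<!\w)(\<em\>Test\<\/em\>)\b/s

// 1. Markup that literally contains the label still does not match: \b needs a
//    word character after the final '>', and here '>' is followed by '<'.
$emInMarkup = preg_match_all($emPattern, '<p><em>Test</em></p>');
echo "em in markup: $emInMarkup\n"; // em in markup: 0

// 2. The lookahead treats visible text followed by a literal '>' as "inside a tag".
$wordPattern = buildPattern('Test');
$inText = preg_match_all($wordPattern, '<p>Test > 5</p>');
echo "word in text: $inText\n"; // word in text: 0

// 3. A '<' inside an attribute value hides the tag, so the attribute text matches.
$inAttr = preg_match_all($wordPattern, '<img alt="Test<3">');
echo "word in attribute: $inAttr\n"; // word in attribute: 1
```

Each of these could be patched with yet another special case, which is exactly the treadmill that makes regex-based HTML parsing a bad idea.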