代码之家 › 专栏 › 技术社区 › Jeepstone

有没有办法优化在页面上查找文本项(不是regex)

matching parsing optimization regex php

Jeepstone · 技术社区 · 15 年前

在看到几个线程对在html文档中查找匹配项的regexp方法进行了修改之后,我使用了简单的html dom php解析器( http://simplehtmldom.sourceforge.net/ )我想知道我的代码是否是最优的。感觉我循环的次数太多了。有没有办法优化下面的循环?

//Get the HTML and look at the text nodes
   $html = str_get_html($buffer);
   //First we match the <body> tag as we don't want to change the <head> items
   foreach($html->find('body') as $body) {
    //Then we get the text nodes, rather than any HTML
    foreach($body->find('text') as $text) {
     //Then we match each term
     foreach ($terms as $term) {
      //Match to the terms within the text nodes
      $text->outertext = str_replace($term, '<span class="highlight">'.$term.'</span>', $text->outertext);
     }       
    }
   }

例如,在开始循环之前确定是否有匹配项会有什么不同吗?

2 回复 | 直到 15 年前

Amber 15 年前

您不需要外部foreach循环;格式良好的文档中通常只有一个body标记。相反,只要使用 $body = $html->find('body',0); .

然而,由于只有一个迭代的循环在运行时基本上等同于完全不循环,所以它可能不会对性能产生太大的影响。所以实际上,即使在原始代码中也只有2个嵌套循环,而不是3个。

Marcelo Cantos 15 年前

因为无知而说话,是吗? find 使用任意的xpath表达式?如果是,可以将两个外环折叠为一个:

foreach($html->find('body/text') as $body) {
    ...
}

推荐文章

user2058738 · approximate matching using grep

7 年前

Lacri Mosa · 检索包含pandas中另一个数据帧中的单词的数据帧中的行

7 年前

npross · Python字符串匹配,错误:位置0无需重复

7 年前

Hameer Abbasi · 获取数组中匹配元素的索引,考虑重复

7 年前

KolacheMaster · C编程:如何确定两个数字之间的精确匹配

7 年前

GIZNAJ · 无法在运行/连接的Selenium电网节点上启动“匹配配置”

8 年前

Arut · R保留矩阵行(如果在其他矩阵中可用),省略NAs

8 年前

Ben · 聚类和匹配之间有什么区别?

8 年前

Sajna · 弹性搜索中的聚合

8 年前

Mazen · 从列表中的元素创建所有可能的对,然后对它们进行均匀排序

8 年前