
How can I match nothing when there is nothing between the tags?

regex php

-3

stack · Tech Community · 8 years ago

<a href="(.*?)"><li>[\s\S]+?<img src="([^"]+)[\s\S]+?<p>([^<]+)[\s\S]+?<s([^>]+)([^<]+)<\/span>

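What should I do when there is nothing between the > and the < of the tags?

When I add pan> to the pattern I get a catastrophic backtracking error. I mean: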

<a href="(.*?)"><li>[\s\S]+?<img src="([^"]+)[\s\S]+?<p>([^<]+)[\s\S]+?<span>([^>]+)([^<]+)<\/span>
/* ---------------------------- added -----------------------------------^^^^

2 replies | latest 8 years ago

Tom Lord 8 years ago

First of all, this is how you should have asked the question:

In the following sample HTML data:
<a href="profile/xalil">
 <li>
 <img src="../users/avatar/small/thumb_default.jpg" />
 <p>xalil eshghi</p>
 <span></span>
 </li>
</a>
I want to extract the href, the img src, the p contents and the span contents.

I tried using the following regexp...

However, a regex is the wrong approach here. You could fix your pattern by replacing:

<s([^>]+)([^<]+)<\/span>

with:

<span>([^<]*)<\/span>
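To make the difference concrete, here is a minimal sketch comparing the two quantifiers. The sample strings are made up for illustration, and the `+` pattern is a simplified stand-in for the span part of the pattern in the question.

```php
<?php
// Hypothetical sample fragments: one span with text, one empty span.
$withText  = '<span>online</span>';
$emptySpan = '<span></span>';

$plus = '/<span>([^<]+)<\/span>/';  // + requires at least one character
$star = '/<span>([^<]*)<\/span>/';  // * also accepts zero characters

// preg_match() returns 1 on a match and 0 when nothing matches.
$plusMatchesEmpty = preg_match($plus, $emptySpan);       // no match
$starMatchesEmpty = preg_match($star, $emptySpan, $m);   // match, group 1 is empty
$capturedEmpty    = $m[1];

preg_match($star, $withText, $m2);
$capturedText = $m2[1];

var_dump($plusMatchesEmpty, $starMatchesEmpty, $capturedEmpty, $capturedText);
```

With `*`, an empty span still matches and the capture group simply holds an empty string, which is the "match nothing" behaviour the question asks for.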

However, this is still hard to understand, and it does not account for every possible edge case.

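For example, what if an img tag is not immediately followed by src? Because the pattern uses [\s\S]+, the regex could then jump to a completely different part of the HTML!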
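The jump can be reproduced with a small sketch. The markup below is hypothetical (two made-up users, the first with an `alt` attribute placed before `src`), and only the front part of the pattern from the question is used.

```php
<?php
// Hypothetical markup: the first <img> has an alt attribute before src,
// so the literal text `<img src="` does not occur inside the first link.
$html = '<a href="profile/alice"><li><img alt="" src="alice.jpg"><p>Alice</p></li></a>'
      . '<a href="profile/bob"><li><img src="bob.jpg"><p>Bob</p></li></a>';

// Front part of the pattern from the question.
$pattern = '/<a href="(.*?)"><li>[\s\S]+?<img src="([^"]+)/';

preg_match($pattern, $html, $m);
$href = $m[1];  // taken from the first link
$src  = $m[2];  // taken from the second link, because [\s\S]+? ran past </a>

echo $href, ' => ', $src, PHP_EOL;
```

The regex reports a match without any error, but it pairs the first user's href with the second user's image, which is exactly the kind of silent mismatch a DOM-based approach avoids.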

This problem can, and should, be solved easily with a DOM parser.

<?php
// This is just some boilerplate code for the sake of completion...
$doc = new DOMDocument();
$doc->loadHTMLFile("your_page.html");
$xpath = new DOMXpath($doc);

// Do you want to scope your results to within <ul class="users"> ?
// If not, just use: $links = $xpath->query("//a");
$links = $xpath->query("//ul[@class='users']/a");

// Guard clause: DOMXPath::query() returns false (not null) for a malformed expression
if ($links === false) { return; }

$result = array();
foreach ($links as $link) {
  $href = $link->getAttribute('href');      // PART 1 - Get the href
  $img = $xpath->query("li/img", $link)[0];
  $img_src = $img->getAttribute('src');     // PART 2 - Get the img src
  $p = $xpath->query("li/p", $link)[0];
  $p_text = $p->textContent;                // PART 3 - Get the p contents
  $span = $xpath->query("li/span", $link)[0];
  $span_text = $span->textContent;          // PART 4 - get the span contents
  $result[] = [$href, $img_src, $p_text, $span_text];
}
print_r($result);
?>
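For trying this out without a file on disk, here is a self-contained sketch of the same idea. It is a variation, not the answer's exact code: the inline markup, the second user, and the `firstText()` helper are all hypothetical, and the helper returns null when an element is missing instead of failing on a null node.

```php
<?php
// Hypothetical inline markup: the first user has an empty <span>,
// the second user has no <span> at all.
$html = '<a href="profile/xalil"><li><img src="thumb_default.jpg" />'
      . '<p>xalil eshghi</p><span></span></li></a>'
      . '<a href="profile/other"><li><img src="thumb_default.jpg" />'
      . '<p>other user</p></li></a>';

libxml_use_internal_errors(true);  // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Hypothetical helper: text of the first node matching $query below $context,
// or null when no such node exists.
function firstText(DOMXPath $xpath, string $query, DOMNode $context): ?string
{
    $nodes = $xpath->query($query, $context);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->textContent;
}

$rows = [];
foreach ($xpath->query('//a') as $link) {
    $img = $xpath->query('li/img', $link)->item(0);
    $rows[] = [
        $link->getAttribute('href'),
        $img !== null ? $img->getAttribute('src') : null,
        firstText($xpath, 'li/p', $link),
        firstText($xpath, 'li/span', $link),
    ];
}
print_r($rows);
```

An empty span comes back as an empty string and a missing span as null, so the two cases stay distinguishable without any change to the queries.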

Amadan 8 years ago

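I think this should help. (This assumes the file you want to parse is the one you put on regex101, and that the fields you want are the ones you were trying to extract with your regexp.)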

<?php
$doc = new DOMDocument();
$doc->loadHTMLFile("testfile.html");
$xpath = new DOMXpath($doc);
$links = $xpath->query("//ul[@class='users']/a");
$result = array();
if ($links !== false) {  // DOMXPath::query() returns false, not null, on failure
  foreach ($links as $link) {
    $href = $link->getAttribute('href');
    $img = $xpath->query("li/img", $link)[0];
    $img_src = $img->getAttribute('src');
    $p = $xpath->query("li/p", $link)[0];
    $p_text = $p->textContent;
    $span = $xpath->query("li/span", $link)[0];
    $span_text = $span->textContent;
    $result[] = [$href, $img_src, $p_text, $span_text];
  }
}
print_r($result);
