
PHP regex: convert span with font-weight 700 to strong tag

  •  -1
  • axelra82  ·  7 years ago

    I can't get this regex right, and I can't see what I'm missing. See the Regex101 example, or the breakdown below:

    Regex

    <span.*?font-weight:700.*?>(.*?)<\/span>
    

    I'm trying to find every instance of a span that contains font-weight:700.

    <p><span style="color:#2c2c2c;font-weight:700;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">Strong content</span></p><ul><li><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">list item</span></li><li><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">list item</span></li><li><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">list item</span></li><li><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">list item</span></li><li><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">list item</span></li><li><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">list item</span></li></ul><p><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">Content text</span></p><p><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">Content text</span></p><p><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">Content text</span></p><p><span style="font-size:10.5pt;color:#2c2c2c;font-weight:700">Should be bold</span><span 
style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">: regular text</span></p><p><span style="font-size:10.5pt;color:#2c2c2c;font-weight:700">Should be bold</span><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">: regular text </span></p><p><span style="font-size:10.5pt;color:#2c2c2c;font-weight:700">Should be bold</span><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">: regular text</span></p>
    

    Then take the content of that span and replace the whole span with

    <strong>$1</strong>
    

    The problem is that this is my result:

    <p><strong>Strong content</strong></p><ul><li><strong>Should be bold</strong><strong>Should be bold</strong><strong>Should be bold</strong><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">: regular text</span></p>
    

    It removes all the list items, and it removes the "regular text" after matches 2 and 3.

    The expected output is:

    <p><strong>Strong content</strong></p><ul><li><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">list item</span></li><li><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">list item</span></li><li><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">list item</span></li><li><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">list item</span></li><li><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">list item</span></li><li><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">list item</span></li></ul><p><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">Content text</span></p><p><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">Content text</span></p><p><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">Content text</span></p><p><strong>Should be bold</strong><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">: regular text</span></p><p><strong>Should be bold</strong><span 
style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">: regular text </span></p><p><strong>Should be bold</strong><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">: regular text</span></p>
    
    2 answers  |  latest 7 years ago
        1
  •  0
  •   user3783243    7 years ago

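    Simply swapping the elements can be done as described in this thread, Replace Tag in HTML with DOMDocument. Here is an extended version that only affects elements whose style attribute contains that declaration.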

    $html = '<p><span style="color:#2c2c2c;font-weight:700;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">Strong content</span></p><ul><li><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">list item</span></li><li><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">list item</span></li><li><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">list item</span></li><li><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">list item</span></li><li><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">list item</span></li><li><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">list item</span></li></ul><p><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">Content text</span></p><p><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">Content text</span></p><p><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">Content text</span></p><p><span style="font-size:10.5pt;color:#2c2c2c;font-weight:700">Should be bold</span><span 
style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">: regular text</span></p><p><span style="font-size:10.5pt;color:#2c2c2c;font-weight:700">Should be bold</span><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">: regular text </span></p><p><span style="font-size:10.5pt;color:#2c2c2c;font-weight:700">Should be bold</span><span style="color:#2c2c2c;font-weight:400;text-decoration:none;vertical-align:baseline;font-size:10.5pt;font-family:&quot;Arial&quot;;font-style:normal">: regular text</span></p>';
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $elements = $dom->getElementsByTagName("span");
    // Walk the live node list backwards so replacements don't shift the remaining indexes.
    for ($i = $elements->length - 1; $i >= 0; $i--) {
        // Only convert spans whose style attribute contains font-weight:700.
        if (preg_match('/font-weight:700/', $elements[$i]->getAttribute('style'))) {
            $nodePre = $elements->item($i);
            $nodeDiv = $dom->createElement("strong", $nodePre->nodeValue);
            $nodePre->parentNode->replaceChild($nodeDiv, $nodePre);
        }
    }
    echo $dom->saveHTML();
    

    https://3v4l.org/Y7Rua
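One caveat with copying `nodeValue`: it returns only the text content, so nested markup inside a bold span (an `<em>`, a link) is flattened, and a bare `&` passed as the second argument of `createElement` can trigger a warning. Below is a minimal sketch that moves the child nodes instead. The `$html` string and the variable names are illustrative assumptions, not taken from the question.

```php
<?php
// Hypothetical input: the bold span contains nested markup and an ampersand.
$html = '<p><span style="font-weight:700">Bold <em>and</em> R&amp;D</span>'
      . '<span style="font-weight:400">: regular text</span></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// iterator_to_array() takes a static snapshot of the live node list,
// so replacing nodes inside the loop does not disturb the iteration.
foreach (iterator_to_array($dom->getElementsByTagName('span')) as $span) {
    if (strpos($span->getAttribute('style'), 'font-weight:700') === false) {
        continue; // leave non-bold spans untouched
    }
    $strong = $dom->createElement('strong');
    // Move (not copy) every child so nested elements and entities survive.
    while ($span->firstChild !== null) {
        $strong->appendChild($span->firstChild);
    }
    $span->parentNode->replaceChild($strong, $span);
}

$out = $dom->saveHTML();
echo $out;
```

As in the loop above, `saveHTML()` here serializes the full document, including the doctype and `html`/`body` wrappers that `loadHTML()` adds.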

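    As an alternative to:

    if(preg_match('/font-weight:700/', $elements[$i]->getAttribute('style'))) {
    

    strpos could be used as well. I assumed the style might contain whitespace, so I went with the regex version, which is easier to loosen (for example to /font-weight:\s*700/).

    if(strpos($elements[$i]->getAttribute('style'), 'font-weight:700') !== false) {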
    

    https://3v4l.org/uqWpj
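If the inline styles may contain whitespace around the colon or between declarations, the check can be loosened. This is a rough sketch only: `hasBoldWeight` is a hypothetical helper name, and the sample style strings are assumptions modelled on the ones in the question.

```php
<?php
// Hypothetical helper: true when an inline style declares font-weight 700,
// tolerating whitespace around the property name, the colon, and the value.
// Anchoring on ^ or ; avoids matching inside another property name.
function hasBoldWeight(string $style): bool
{
    return preg_match('/(?:^|;)\s*font-weight\s*:\s*700\s*(?:;|$)/i', $style) === 1;
}

var_dump(hasBoldWeight('color:#2c2c2c;font-weight:700;text-decoration:none')); // bool(true)
var_dump(hasBoldWeight('font-size:10.5pt; font-weight : 700'));                // bool(true)
var_dump(hasBoldWeight('color:#2c2c2c;font-weight:400'));                      // bool(false)
```

Inside the DOM loop this would replace the `preg_match` or `strpos` condition, called with the span's `style` attribute.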

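    To answer why your regex cuts out more than you want: <span.*? starts matching at <span style="color:#2c2c2c;font-weight:400; and, even though it is lazy, keeps going past the end of that tag until it finds font-weight:700. It then captures the content after that later element, and everything in between is lost. This is why regex shouldn't be used for parsing HTML; it has no notion of elements.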
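The over-matching can be reproduced on a much smaller input. In this sketch the shortened `$html` is a hypothetical stand-in for the question's markup, and the `[^>]*` variant is one possible tightening rather than a robust HTML parser (it would still break on a `>` inside an attribute value or on nested spans).

```php
<?php
// Hypothetical, shortened input: one regular span followed by one bold span.
$html = '<li><span style="font-weight:400">list item</span></li>'
      . '<p><span style="font-weight:700">Should be bold</span></p>';

// Pattern from the question: ".*?" is lazy, but it may still run past ">"
// and through other tags until it reaches font-weight:700.
$acrossTags = preg_replace(
    '/<span.*?font-weight:700.*?>(.*?)<\/span>/',
    '<strong>$1</strong>',
    $html
);

// Tightened variant: [^>]* cannot leave the opening tag,
// so only the span that really carries font-weight:700 matches.
$insideTagOnly = preg_replace(
    '/<span[^>]*font-weight:700[^>]*>(.*?)<\/span>/',
    '<strong>$1</strong>',
    $html
);

echo $acrossTags, PHP_EOL;
// <li><strong>Should be bold</strong></p>
echo $insideTagOnly, PHP_EOL;
// <li><span style="font-weight:400">list item</span></li><p><strong>Should be bold</strong></p>
```

The first result shows the list item swallowed by the match, which is the same effect as in the question's output.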

        2
  •  0
  •   sln    7 years ago

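    The reason your regex doesn't work is that not every span tag contains that font-weight. That causes the .*? part of the regex to keep matching until it finds a tag that does have that font-weight.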

    This regex restricts the match to a valid tag that contains that font-weight.

    Find:

    /<span(?=\s)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\sstyle\s*=\s*(?:(['"])(?:(?!\1)[\S\s])*?font-weight:700(?:(?!\1)[\S\s])*\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>([\S\s]*?)<\/span\s*>/
    

    Replace: <strong>$2</strong>

    https://regex101.com/r/o9qcHz/1
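For reference, this is roughly how the find/replace pair could be applied in PHP. The pattern is copied verbatim from the "Find" line; the shortened `$html` sample and the variable names are assumptions for illustration. A nowdoc is used so the quotes and backslashes in the pattern need no extra escaping.

```php
<?php
// Pattern copied verbatim from the answer; nowdoc keeps \1, quotes and slashes literal.
$pattern = <<<'REGEX'
/<span(?=\s)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\sstyle\s*=\s*(?:(['"])(?:(?!\1)[\S\s])*?font-weight:700(?:(?!\1)[\S\s])*\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>([\S\s]*?)<\/span\s*>/
REGEX;

// Hypothetical, shortened version of the question's markup.
$html = '<ul><li><span style="color:#2c2c2c;font-weight:400">list item</span></li></ul>'
      . '<p><span style="font-size:10.5pt;color:#2c2c2c;font-weight:700">Should be bold</span>'
      . '<span style="color:#2c2c2c;font-weight:400">: regular text</span></p>';

// Group 1 is the quote character, so the span content is group 2.
$result = preg_replace($pattern, '<strong>$2</strong>', $html);

echo $result, PHP_EOL;
```

On this sample the list item and the ": regular text" span are left in place and only the 700 span becomes `<strong>`.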

    Regex explained:

     # Begin open Span tag
    
     < span
     (?= \s )
     (?=                    # Assertion (a pseudo atomic group)
          (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
          \s style \s* = \s* 
          (?:
               ( ['"] )               # (1), Quote
               (?:
                    (?! \1 )
                    [\S\s] 
               )*?
               font-weight:700        # font weight 700
               (?:
                    (?! \1 )
                    [\S\s] 
               )*
               \1 
          )
     )
                            # Have the correct font-weight, just match the rest of the tag
     \s+ 
     (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
    
     >                      # End span tag
    
     ( [\S\s]*? )           # (2), span content
     </span \s* >           # Close span tag