
Search for elements whose text, or the text of any descendant, contains a string

  •  0
  • sean  · Tech Community  · 5 years ago

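    The title says it all. I want to find all elements, such as Node1, that have "Hello World!" in their own text or in the text of any of their descendants. If the string appears in a descendant, I still want to get Node1, not that descendant.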

    <?xml version="1.0"?>
    <data>
        <Node1 id="Node1">Hello World! from Node1
        </Node1>
    
        <Node1 id="Node2">Nothing to see here
        </Node1>
    
        <Node1 id="Node3">
            Some text goes here
            <Node2>
                More text
                <Node3>Hello World! from Node3 </Node3>
            </Node2>
        </Node1>
    </data>
    
        1
  •  0
  •   Physicing    5 years ago

    With ElementTree, I think you can do it like this:

    import xml.etree.ElementTree as etree
    
    s = """<root>
    <element>A</element>
      <element2>C</element2>
        <element3>TEST</element3>
    <element>B</element>
      <element2>D</element2>
        <element3>Test</element3>
    </root>"""
    
    e = etree.fromstring(s)
    
    found = [element for element in e.iter() if element.text == 'Test']
    
    print(found[0])
    

    Returns:

    <Element 'element3' at 0x7f9edb7e7a98>
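
    The comparison above is an exact match on each element's own text. Since the question asks for text that contains a string, a minimal sketch of the substring variant follows. The sample XML (including the `empty` element) is made up for illustration, and the `or ''` fallback is there because `text` is `None` for elements without text.

```python
import xml.etree.ElementTree as etree

# Hypothetical sample data, not taken from the question
s = """<root>
<element>A</element>
<element3>TEST</element3>
<empty/>
<element3>a Test here</element3>
</root>"""

root = etree.fromstring(s)

# element.text is None for elements with no text, so fall back to ''
# before running the (case-sensitive) substring test
found = [el for el in root.iter() if 'Test' in (el.text or '')]
tags = [el.tag for el in found]

print(tags)  # ['element3']
```

    This still only inspects each element's own text, not the text of its descendants.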
    


        2
  •  0
  •   balderman    5 years ago

    See below:

    import xml.etree.ElementTree as ET
    
    xml = '''<?xml version="1.0"?>
    <data>
        <Node1 id="Node1">Hello World! from Node1
        </Node1>
    
        <Node1 id="Node2">Nothing to see here
        </Node1>
    
        <Node1 id="Node3">
            Some text goes here
            <Node2>
                More text
                <Node3>Hello World! from Node3 </Node3>
            </Node2>
        </Node1>
    </data>'''
    
    
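    def scan_node(node, txt, result):
        """
        Recursively scan the descendants of 'node' and collect every element
        whose own text contains 'txt'.
        :param node: element whose descendants are scanned
        :param txt: substring to look for
        :param result: list that matching elements are appended to
        :return: None (matches are accumulated in 'result')
        """
        for child in node:
            # child.text is None for elements without text, so guard before testing
            if child.text and txt in child.text:
                result.append(child)
            scan_node(child, txt, result)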
    
    
    root = ET.fromstring(xml)
    result = []
    scan_node(root, 'Hello World', result)
    print(result)
    

    Output:

    [<Element 'Node1' at 0x00723A80>, <Element 'Node3' at 0x00723C30>]
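
    Note that the second entry in this output is the nested Node3 element itself, whereas the question asks for its Node1 ancestor. One possible way to get the ancestor instead is to test the combined text of each candidate element with `itertext()`, which yields an element's own text followed by the text of all its descendants. This is only a sketch: the helper name `find_containing` is hypothetical, and it assumes the candidates are identified by tag name (`Node1` here).

```python
import xml.etree.ElementTree as ET

xml = '''<data>
    <Node1 id="Node1">Hello World! from Node1
    </Node1>
    <Node1 id="Node2">Nothing to see here
    </Node1>
    <Node1 id="Node3">
        Some text goes here
        <Node2>
            More text
            <Node3>Hello World! from Node3 </Node3>
        </Node2>
    </Node1>
</data>'''


def find_containing(root, tag, txt):
    """Return every `tag` element whose own or descendant text contains `txt`.

    Hypothetical helper for illustration; not part of ElementTree.
    """
    matches = []
    for el in root.iter(tag):
        # Join the element's text with all descendant text (and their tails)
        full_text = ''.join(el.itertext())
        if txt in full_text:
            matches.append(el)
    return matches


root = ET.fromstring(xml)
ids = [el.get('id') for el in find_containing(root, 'Node1', 'Hello World!')]

print(ids)  # ['Node1', 'Node3']
```

    Because the text is joined before testing, a string that is split across a child element boundary would also match, which may or may not be what you want.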