
Python multiline regex: ignoring n lines in a string

  •  1
  • Jegor  · Tech Community  · 7 years ago

    I am having trouble writing the correct regular expression. Maybe someone can help me?

    I have output from two network devices:

    1.

    VRF NAME1 (VRF Id = 2); default RD 9200:1; default VPNID <not set>
    Old CLI format, supports IPv4 only
    Flags: 0xC
    Interfaces:
    Gi1/1/1                 Gi1/1/4

    2.
    

    VRF NAME2 (VRF Id = 2); default RD 101:2; default VPNID <not set>
    Interfaces:
    Gi0/0/3                  Gi0/0/4                  Gi0/1/4
    

    I need to extract the interface names from both.

    I have this regex:

     rx = re.compile("""
                  VRF\s(.+?)\s\(.*RD\s(.*);.*[\n\r]
                  ^.*$[\n\r]
                  ^.*$[\n\r]
                  ^.*$[\n\r]
                  (^.*)
                  """,re.MULTILINE|re.VERBOSE)
    

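    But it only works for the first text, because it skips a fixed number of lines and the interface line I need happens to come right after them. However, many routers return output like example 2, where the lines between the VRF header and "Interfaces:" are missing.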
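    The behaviour described above can be reproduced with a short script. This is only a sketch: the names `out1` and `out2` are made up for illustration, and the pattern is written as a raw string, which should not change what it matches.

```python
import re

# The pattern from the question, written as a raw string.
rx = re.compile(r"""
    VRF\s(.+?)\s\(.*RD\s(.*);.*[\n\r]
    ^.*$[\n\r]
    ^.*$[\n\r]
    ^.*$[\n\r]
    (^.*)
    """, re.MULTILINE | re.VERBOSE)

# Device output 1: three lines sit between the VRF header and the interfaces.
out1 = """\
VRF NAME1 (VRF Id = 2); default RD 9200:1; default VPNID <not set>
Old CLI format, supports IPv4 only
Flags: 0xC
Interfaces:
Gi1/1/1                 Gi1/1/4
"""

# Device output 2: only the "Interfaces:" line sits in between.
out2 = """\
VRF NAME2 (VRF Id = 2); default RD 101:2; default VPNID <not set>
Interfaces:
Gi0/0/3                  Gi0/0/4                  Gi0/1/4
"""

m1 = rx.search(out1)
m2 = rx.search(out2)

print(m1.group(3) if m1 else None)  # the interface line of output 1
print(m2)                           # no match: output 2 has too few lines
```

    The pattern always consumes exactly three lines after the header, so it cannot match a record that has fewer lines than that.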

    3 answers  |  as of 7 years ago
        1
  •  1
  •   Marc Lambrichs user8588010    7 years ago

    Edit: the answer was corrected after more input was provided.

    There are many ways to solve this. Have a look at regex101. The regex

    (?s)VRF\s([^\s]+)\s.*?(?:RD\s([\d.]+:\d|<not\sset>));.*?Interfaces:(?:\r*\n)\s*(.*?)(?:\r*\n)
    

    captures the VRF name, the RD value, and the first line after Interfaces:.

    Explanation:

    (?s)                           # single line mode: make "." read anything,
                                   # including line breaks
    VRF                            # every record starts with VRF
    \s                             # read " "
    ([^\s]+)                       # group 1: capture NAME VRF
    \s                             # read " "
    .*?                            # lazy read anything
    (?:                            # start non-capture group
     RD\s                          # read "RD "
    (                              # group 2
      [\d.]+:\d                    # number or ip, followed by ":" and a digit
      |                            # OR
      <not\sset>                   # value "<not set>"
    )                              # group 2 end
    )                              # non-capture group end
    ;                              # read ";"
    .*?                            # lazy read anything
    Interfaces:                    # read "Interfaces:"
    (?:\r*\n)                      # read newline
    \s*                            # read spaces
    (.*?)                          # group 3: read line after "Interfaces:"
    (?:\r*\n)                      # read newline
    

    $ cat test.py
    import re
    
    pattern = r"(?s)VRF\s([^\s]+)\s.*?(?:RD\s([\d.]+:\d|<not\sset>));.*?Interfaces:(?:\r*\n)\s*(.*?)(?:\r*\n)"
    
    text = '''\
    VRF BLA1 (VRF Id = 2); default RD 9200:1; default VPNID <not set>
    Old CLI format, supports IPv4 only
    Flags: 0xC
    Interfaces:
      Gi1/1/1.451              Gi1/1/4.2019
    Address family ipv4 unicast (Table ID = 0x2):
      VRF label allocation mode: per-prefix
    Address family ipv6 unicast not active
    Address family ipv4 multicast not active
    
    VRF BLA2 (VRF Id = 1); default RD <not set>; default VPNID <not set>
    New CLI format, supports multiple address-families
    Flags: 0x1808
    Interfaces:
      Gi0
    Address family ipv4 unicast (Table ID = 0x1):
      Flags: 0x0
    Address family ipv6 unicast (Table ID = 0x1E000001):
      Flags: 0x0
    Address family ipv4 multicast not active\
    '''
    
    # Records are separated by a blank line. Split on "\n\n" rather than
    # os.linesep, which is "\r\n" on Windows and would never match this literal.
    for rec in text.split("\n\n"):
        m = re.match(pattern, rec)
        if m:
            print("%s\tRD: %s\tInterfaces: %s" % (m.group(1), m.group(2), m.group(3)))
    

    This results in:

    $ python test.py
    BLA1    RD: 9200:1  Interfaces: Gi1/1/1.451              Gi1/1/4.2019
    BLA2    RD: <not set>   Interfaces: Gi0
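    If the goal is a list of interface names per VRF, a variation on the same idea is to scan the whole output with `re.finditer` and split the captured line on whitespace. The following is a minimal sketch, not the answer's exact pattern: the helper name `parse_vrfs` is hypothetical, `^VRF` with `re.MULTILINE` is an added assumption so that indented lines such as "  VRF label allocation mode" are not mistaken for record headers, and `\d+` after the colon is an assumption to allow multi-digit RD suffixes.

```python
import re

# Verbose variant of the pattern; see the assumptions in the text above.
VRF_RE = re.compile(r"""
    ^VRF\s(\S+)\s                     # group 1: VRF name at the start of a line
    .*?RD\s([\d.]+:\d+|<not\sset>);   # group 2: RD value or the text <not set>
    .*?Interfaces:\r*\n               # skip ahead to the Interfaces: line
    \s*(.*?)\r*\n                     # group 3: the line that follows it
""", re.DOTALL | re.MULTILINE | re.VERBOSE)


def parse_vrfs(text):
    """Return a list of (name, rd, [interfaces]) tuples found in text."""
    return [
        (m.group(1), m.group(2), m.group(3).split())
        for m in VRF_RE.finditer(text)
    ]


sample = """\
VRF BLA1 (VRF Id = 2); default RD 9200:1; default VPNID <not set>
Old CLI format, supports IPv4 only
Flags: 0xC
Interfaces:
  Gi1/1/1.451              Gi1/1/4.2019
Address family ipv4 unicast (Table ID = 0x2):
  VRF label allocation mode: per-prefix

VRF BLA2 (VRF Id = 1); default RD <not set>; default VPNID <not set>
New CLI format, supports multiple address-families
Flags: 0x1808
Interfaces:
  Gi0
Address family ipv4 unicast (Table ID = 0x1):
"""

for name, rd, interfaces in parse_vrfs(sample):
    print(name, rd, interfaces)
```

    Scanning with `finditer` avoids having to split the text into records first, at the cost of relying on every record containing both an RD field and an "Interfaces:" line.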
    
        2
  •  1
  •   Community CDub    5 years ago

    You can use a positive lookbehind:

    (?<=…)

    https://regex101.com/

    The regex (?<=Interfaces:\n).+ matches the line that follows each "Interfaces:" line.

    I tested it on regex101.com, and it works with both of your examples.
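    In Python the lookbehind approach could look roughly like this. It is a sketch that assumes the two outputs from the question are joined into one string with "\n" line endings; the variable names are made up for illustration.

```python
import re

# The two device outputs from the question, joined into one string.
output = """\
VRF NAME1 (VRF Id = 2); default RD 9200:1; default VPNID <not set>
Old CLI format, supports IPv4 only
Flags: 0xC
Interfaces:
Gi1/1/1                 Gi1/1/4

VRF NAME2 (VRF Id = 2); default RD 101:2; default VPNID <not set>
Interfaces:
Gi0/0/3                  Gi0/0/4                  Gi0/1/4
"""

# "." does not match a newline by default, so ".+" stops at the end of the
# line that directly follows "Interfaces:\n".
lines = re.findall(r"(?<=Interfaces:\n).+", output)

# Each captured line holds whitespace-separated interface names.
interfaces = [line.split() for line in lines]
print(interfaces)
```

    Note that this only returns the interface lines. If the VRF name or RD is needed as well, one of the patterns with capture groups is a better fit.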

        3
  •  0
  •   koalo    7 years ago

    There are multiple options, but the one closest to your initial attempt uses optional non-capturing lines:

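    rx = re.compile(r"""
    VRF\s(.+?)\s\(.*RD\s(.*);.*[\n\r]   # header line: VRF name and RD
    (?:^.*$[\n\r])?                     # optional line (may be absent)
    (?:^.*$[\n\r])?                     # optional line (may be absent)
    Interfaces:[\n\r]
    (.*)                                # the line with the interface names
    """, re.MULTILINE | re.VERBOSE)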
    

    Because each (?:^.*$[\n\r])? is optional, the pattern matches whether or not the extra lines are present.
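    A quick way to check this against both outputs from the question is sketched below. The sample string and the variable names are assumptions made for illustration, and the pattern is written as a raw string.

```python
import re

rx = re.compile(r"""
    VRF\s(.+?)\s\(.*RD\s(.*);.*[\n\r]
    (?:^.*$[\n\r])?
    (?:^.*$[\n\r])?
    Interfaces:[\n\r]
    (.*)
    """, re.MULTILINE | re.VERBOSE)

# The two device outputs from the question, joined into one string.
output = """\
VRF NAME1 (VRF Id = 2); default RD 9200:1; default VPNID <not set>
Old CLI format, supports IPv4 only
Flags: 0xC
Interfaces:
Gi1/1/1                 Gi1/1/4

VRF NAME2 (VRF Id = 2); default RD 101:2; default VPNID <not set>
Interfaces:
Gi0/0/3                  Gi0/0/4                  Gi0/1/4
"""

# One (name, rd, [interfaces]) tuple per matched record.
records = [
    (m.group(1), m.group(2), m.group(3).split())
    for m in rx.finditer(output)
]
for record in records:
    print(record)
```

    The pattern tolerates zero, one, or two lines between the header and "Interfaces:". Records with more lines in between would need additional optional groups.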