Regex to match integers that do not end with certain digits

regex python

adi · Tech Community · 7 months ago

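I can't build a pattern that returns the integers that do not end with a given sequence of digits. The numbers can be of any length, even a single digit, but they are always integers. There may also be several numbers on the same line of text, and I want to match all of them. A number is always followed by a space, the end of the line, or the end of the text. I'm matching in Python 3.12.

For example, in the text '12345 67890 123175 9876', suppose I want all the numbers that do not end in 175.

I want the following matches: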

12345
67890
9876

I tried the following:

\d+(?!175)(\b|$), which matches 4 empty strings:

from re import findall

text = "12345 67890 123175 9876"
matches = findall(r"\d+(?!175)(\b|$)", text)
print(matches)
> ['', '', '', '']

\d+(?<!175), which still matches all 4 numbers (the third one truncated to 12317):

matches = findall(r"\d+(?<!175)", text)
> ['12345', '67890', '12317', '9876']

\d+(?:175), which matches only the number that ends in 175:

matches = findall(r"\d+(?:175)", text)
> ['123175']

4 answers | latest 7 months ago

Abhijit Sarkar 7 months ago

You can use a negative lookbehind, .*(?<!a), to ensure that a string does not end with a.

\d++(?<!175)

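Test it here.

Note that possessive quantifiers (++) were introduced in Python 3.11. Your second attempt was close, but it fails because the greedy quantifier (+) first consumes all the digits and then backtracks: for 123175 it gives back the final 5, the lookbehind then sees 317 instead of 175, and the pattern matches 12317. A possessive quantifier never gives digits back, so the number is rejected as a whole.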
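The difference between the greedy and the possessive quantifier can be seen side by side in a short sketch. The word-boundary variant \b\d+\b(?<!175) is not part of the answer above; it is added here on the assumption that a fallback for Python versions before 3.11 may be useful, and the variable names are only illustrative.

```python
import re
import sys

text = "12345 67890 123175 9876"

# Greedy: \d+ gives back the last digit of 123175, so the lookbehind
# sees "317" and the truncated number 12317 is matched.
greedy = re.findall(r"\d+(?<!175)", text)
print(greedy)  # ['12345', '67890', '12317', '9876']

# Assumed fallback for Python < 3.11: word boundaries pin both ends of
# the number, so backtracking cannot produce a shorter match.
bounded = re.findall(r"\b\d+\b(?<!175)", text)
print(bounded)  # ['12345', '67890', '9876']

# Possessive: \d++ never gives digits back (Python 3.11 or later only).
if sys.version_info >= (3, 11):
    possessive = re.findall(r"\d++(?<!175)", text)
    print(possessive)  # ['12345', '67890', '9876']
else:
    possessive = None
```

Neither working pattern contains a capture group, so findall returns the matched numbers themselves rather than empty group contents.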

SIGHUP 7 months ago

A demonstration of how a simple split and compare can be faster than using a regular expression:

import random
import re
import sys
from timeit import timeit

# 10,000 random integers joined by single spaces
s = " ".join(str(random.randint(1_000, 100_000)) for _ in range(10_000))

PATTERN_2 = re.compile(r"\d+")

def func2():
    """Use simple re pattern with list comprehension"""
    return [v for v in PATTERN_2.findall(s) if not v.endswith("175")]

def func3():
    """Simple list comprehension (no re)"""
    return [v for v in s.split() if not v.endswith("175")]

funclist = [func2, func3]

# Possessive quantifiers require Python 3.11 or later
if sys.version_info >= (3, 11):
    PATTERN_1 = re.compile(r"\d++(?<!175)")

    def func1():
        """Use re with possessive quantifier"""
        return PATTERN_1.findall(s)

    funclist.append(func1)

for func in funclist:
    duration = timeit(func, number=5_000)
    print(func.__doc__, f"{duration=:.4f}s")

# Sanity check: every approach must return the same list
for f1, f2 in zip(funclist, funclist[1:]):
    assert f1() == f2()

Output:

Use simple re pattern with list comprehension duration=5.1802s
Simple list comprehension (no re) duration=2.0577s
Use re with possessive quantifier duration=4.1744s

Platform:

Python 3.13.0
macOS 15.1.1
Apple M2

Note:

The implicit assumption here is that the source string consists only of whitespace-separated integers.
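If that assumption does not hold, for example when the numbers are mixed with other words, the split-based approach can still work by filtering the tokens first. A minimal sketch, assuming tokens are separated by whitespace and that only tokens made up entirely of ASCII digits count as integers; the function name and the sample string are hypothetical:

```python
def numbers_not_ending_with(text, suffix):
    """Return the all-digit tokens of text that do not end with suffix."""
    return [
        tok
        for tok in text.split()  # splits on spaces and newlines alike
        if tok.isascii() and tok.isdigit() and not tok.endswith(suffix)
    ]

# Hypothetical input mixing words and numbers across two lines
sample = "order 12345 qty 67890\nref 123175 abc175 9876"
result = numbers_not_ending_with(sample, "175")
print(result)  # ['12345', '67890', '9876']
```

The extra checks cost some of the speed advantage measured above, so this variant is only worth it when the input really is mixed.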

moken 7 months ago

Is it really necessary to use a regular expression?

content = '12345 67890 123175 9876'
# split() with no argument also splits on newlines and repeated spaces
for item in content.split():
    if not item.endswith('175'):
        print(item)

Output:

12345
67890
9876
