Splitting a string in a required format: is there a Pythonic way? (with or without regex)

python format string regex

lprsd · Tech Community · 16 years ago

I have a string in this format:

t='@abc @def Hello this part is text'

I want this:

l=["abc", "def"] 
s='Hello this part is text'

I did this:

a=t[:t.find(' ',t.rfind('@'))].strip()
s=t[t.find(' ',t.rfind('@')):].strip()
b=a.split('@')
l=[i.strip() for i in b][1:]

It works in most cases, but it fails when the text part contains an '@'. E.g., when:

t='@abc @def My email is red@hjk.com'

it fails. The @names are at the beginning, and there can be text after the @names, which may itself contain @.
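To make the failure concrete, here is a small sketch (written for Python 3) of what the two index lookups return for the failing input. The variable names last_at and cut are only illustrative:

```python
t = '@abc @def My email is red@hjk.com'

last_at = t.rfind('@')        # finds the '@' inside the e-mail address
cut = t.find(' ', last_at)    # no space follows it, so find() gives -1

print(t[last_at:])            # @hjk.com
print(cut)                    # -1
```

With cut equal to -1, both slices are taken relative to the last character of the string rather than the end of the tag list.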

Clearly, I could split on spaces first and look for the first word that does not start with '@'. But that doesn't seem like an elegant solution.

What is a Pythonic way of solving this?

7 replies | last reply 16 years ago

Community CDub 8 years ago

Building unashamedly on MrTopf's effort:

import re
rx = re.compile(r"((?:@\w+ +)+)(.*)")
t='@abc   @def  @xyz Hello this part is text and my email is foo@ba.r'
a,s = rx.match(t).groups()
l = re.split('[@ ]+',a)[1:-1]
print(l)
print(s)

prints:

['abc', 'def', 'xyz']
Hello this part is text and my email is foo@ba.r

Justly called to account by hasen j, let me clarify how this works:

/@\w+ +/

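matches a single tag: @ followed by at least one alphanumeric character or _, followed by at least one space character. + is greedy, so if there is more than one space, it will grab them all.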
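A quick way to check the greedy behaviour is to match the single-tag pattern against a string with several spaces after the tag. This is just an illustrative sketch; the sample string is made up:

```python
import re

m = re.match(r'@\w+ +', '@abc   @def text')
matched = m.group(0)    # the tag plus all three spaces that follow it

print(repr(matched))    # '@abc   '
```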

To match any number of these tags, we need to add a plus (one or more of something) to the tag pattern, so we need to group it with parentheses:

/(@\w+ +)+/

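which matches one or more tags and, being greedy, matches all of them. However, those parentheses now interfere with our capture groups, so we undo that by turning them into an anonymous (non-capturing) group: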

/(?:@\w+ +)+/
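The difference between the two groupings can be seen by comparing what each match object reports. A minimal sketch, assuming Python 3; the names capturing and anonymous are only for illustration:

```python
import re

t = '@abc @def Hello'

# Capturing group under a repeat: only the last repetition is remembered.
capturing = re.match(r'(@\w+ +)+', t)
# Non-capturing group: same overall match, but no group is recorded.
anonymous = re.match(r'(?:@\w+ +)+', t)

print(capturing.groups())    # ('@def ',)
print(anonymous.groups())    # ()
print(capturing.group(0) == anonymous.group(0))    # True
```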

Finally, we make that into a capture group and add another one to sweep up the rest:

/((?:@\w+ +)+)(.*)/

A last breakdown to sum up:

((?:@\w+ +)+)(.*)
 (?:@\w+ +)+
 (  @\w+ +)
    @\w+ +

Note that in reviewing this post, I've improved it: the \w didn't need to be in a set, and it now allows multiple spaces between tags. Thanks, hasen j!
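One thing the snippet above does not cover is input without any leading tag, where rx.match returns None. A possible wrapper might look like this; split_tags and TAGGED are hypothetical names, not part of the original answer:

```python
import re

TAGGED = re.compile(r"((?:@\w+ +)+)(.*)")

def split_tags(t):
    """Return (tags, text). Hypothetical helper built on the pattern above."""
    m = TAGGED.match(t)
    if m is None:                 # no leading "@tag " at all
        return [], t
    head, text = m.groups()
    # pull the names out of the matched tag prefix
    return re.findall(r"@(\w+)", head), text
```

Because the pattern requires at least one space after every tag, a string made up only of tags with no trailing space (for example '@abc') still comes back as plain text.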

Ricardo Reyes 16 years ago

t='@abc @def Hello this part is text'

words = t.split(' ')

names = []
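# consume the leading @names; stop at the first word that is not a tag,
# leaving that word in the list so it stays part of the text
while words and words[0].startswith('@'):
    names.append(words.pop(0)[1:])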

text = ' '.join(words)

print(names)
print(text)

Osama Al-Maadeed 16 years ago

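How about this:

1. Split by space.
2. For each word, check:

2.1. If the word starts with @, push it to the first list.

2.2. Otherwise, just join the remaining words with spaces.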
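The steps above could be sketched with itertools.takewhile, which stops at the first word that is not a tag. This is only one possible reading of the suggestion, and the variable names are illustrative:

```python
from itertools import takewhile

t = '@abc @def My email is red@hjk.com'
words = t.split(' ')                       # step 1

# step 2.1: collect the leading words that start with '@'
tags = [w[1:] for w in takewhile(lambda w: w.startswith('@'), words)]
# step 2.2: join everything after the tags back together
text = ' '.join(words[len(tags):])

print(tags)    # ['abc', 'def']
print(text)    # My email is red@hjk.com
```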

MrTopf 16 years ago

You might also use regular expressions:

import re
rx = re.compile(r"@([\w]+) @([\w]+) (.*)")
t='@abc @def Hello this part is text and my email is foo@ba.r'
a,b,s = rx.match(t).groups()

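But this all depends on what your data can look like, so you might need to adjust it. What it does is basically create groups via () and check what is allowed inside them.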
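To see why the pattern may need adjusting, here is a small sketch of how it behaves when the number of tags is not exactly two. The sample strings are made up:

```python
import re

rx = re.compile(r"@([\w]+) @([\w]+) (.*)")

two = rx.match('@abc @def Hello')
one = rx.match('@abc Hello')
three = rx.match('@a @b @c Hello')

print(two.groups())      # ('abc', 'def', 'Hello')
print(one)               # None: the second tag is missing
print(three.groups())    # ('a', 'b', '@c Hello'): third tag ends up in the text
```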

SilentGhost 16 years ago

 [i.strip('@') for i in t.split(' ', 2)[:2]]     # for a fixed number of @def
 a = [i.strip('@') for i in t.split(' ') if i.startswith('@')]
 s = ' '.join(i for i in t.split(' ') if not i.startswith('@'))

Jason Coon 16 years ago

[Edit: this implements what Osama suggested above]

This creates L from the @variables at the beginning of the string; once a non-@ word is found, it just grabs the rest of the string.

t = '@one @two @three some text   afterward with @ symbols@ meow@meow'

words = t.split(' ')         # split into list of words based on spaces
L = []
s = ''
for i in range(len(words)):  # go through each word
    word = words[i]
    if word[0] == '@':       # grab @'s from beginning of string
        L.append(word[1:])
        continue
    s = ' '.join(words[i:])  # put spaces back in
    break                    # you can ignore the rest of the words

You can refactor this into less code, but I'm trying to make what's going on obvious.
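For comparison, one possible shorter form of the same loop is sketched below. It is not the author's code; first_text is a made-up name, and startswith is used instead of word[0] so that empty words do not raise an IndexError:

```python
t = '@one @two @three some text   afterward with @ symbols@ meow@meow'

words = t.split(' ')
# index of the first word that is not a tag (len(words) if every word is a tag)
first_text = next((i for i, w in enumerate(words) if not w.startswith('@')),
                  len(words))
L = [w[1:] for w in words[:first_text]]
s = ' '.join(words[first_text:])    # put the spaces back in

print(L)    # ['one', 'two', 'three']
print(s)    # some text   afterward with @ symbols@ meow@meow
```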

Martin Vilcans 16 years ago

Here's just another variation that uses split() and no regexps:

t='@abc @def My email is red@hjk.com'
tags = []
words = iter(t.split())

# iterate over words until first non-tag word
for w in words:
  if not w.startswith("@"):
    # join this word and all the following
    s = " ".join([w] + list(words))
    break
  tags.append(w[1:])
else:
  s = "" # handle string with only tags

print(tags, s)

Here's a shorter but perhaps slightly cryptic version that uses a regexp to find the first space followed by a non-@ character:

import re
t = '@abc @def My email is red@hjk.com @extra bye'
m = re.search(r"\s([^@].*)$", t)
tags = [tag[1:] for tag in t[:m.start()].split()]
s = m.group(1)
print(tags, s)  # ['abc', 'def'] My email is red@hjk.com @extra bye

It doesn't work properly if there are no tags or no text. The format is underspecified. You'll need to provide more test cases to validate.
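If the missing-tags and missing-text cases matter, one way to cover them might be to match the leading run of tags directly, since a pattern that allows zero repetitions always matches. This is a sketch under the assumption that a tag is any whitespace-delimited word starting with '@'; split_leading_tags is a hypothetical name:

```python
import re

def split_leading_tags(t):
    """Return (tags, text); tolerates input with no tags or no text."""
    # leading run of "@word" tokens, each followed by whitespace or end of string
    m = re.match(r"(?:@\S+(?:\s+|$))*", t)
    tags = [w[1:] for w in m.group(0).split()]
    return tags, t[m.end():]

print(split_leading_tags('@abc @def My email is red@hjk.com @extra bye'))
print(split_leading_tags('just text'))    # ([], 'just text')
print(split_leading_tags('@abc @def'))    # (['abc', 'def'], '')
```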