
Negative lookbehind regex in R

negative-lookbehind regex r

jun.yoon77 · Tech Community · 9 years ago

I'm trying to write a regex with stringr that does a negative lookbehind in R.

Basically, I have text data that looks like this:

See item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data.

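I want to extract everything from the "Item 7" that comes after the "BlahBlahBlah." sentence through "Item 8 Financial Statements and Supplementary Data".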

So I want:

Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data.

That is, everything except the part that starts with "See item 7 Management's Discussion and Analysis".

Right now, I'm using the following pattern:

(?<!see)Item 7(.*?)Item 8

But it doesn't return what I want.

My logic was: skip any "Item 7 Management's Discussion and Analysis" that is preceded by the word "see". But it doesn't seem to work.

https://regex101.com/r/yF7aQ1/3

Is there a way to make this negative lookbehind work?

1 answer | 9 years ago

akuiper · 9 years ago

One way in base R: the pattern .*(?<!See) (item 7 .*) works with sub(). Just note the space after "See", and the letter case, which you can ignore with the ignore.case argument:

sub(".*(?<!See) (item 7 .*)", "\\1", s, ignore.case = TRUE, perl = TRUE)

# [1] "Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data."
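Why the original (?<!see) does not exclude anything: a lookbehind is zero-width and only checks the characters immediately to the left of the current position. In the sample text, the characters right before "item 7" are "See " (with a trailing space), and the case differs as well. A minimal check with base R's grepl(), on a shortened illustrative string:

```r
# Shortened illustrative input: "See" is followed by a space, then lowercase "item 7"
x <- "See item 7 Management's Discussion and Analysis."

# Asker's idea: the 3 characters before "item" are "ee ", not "see",
# so the negative lookbehind succeeds and nothing is excluded
grepl("(?<!see)item 7", x, perl = TRUE)
# [1] TRUE

# Adding the space is not enough on its own: "See " differs from "see " in case
grepl("(?<!see )item 7", x, perl = TRUE)
# [1] TRUE

# Space and case both agree, so this "item 7" is excluded
grepl("(?<!See )item 7", x, perl = TRUE)
# [1] FALSE

# Alternatively, make the whole pattern (lookbehind included) case-insensitive
grepl("(?<!see )item 7", x, perl = TRUE, ignore.case = TRUE)
# [1] FALSE
```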

Another option:

sub(".*(?=(?<!See) ?item 7)", "", s, ignore.case = TRUE, perl = TRUE)
# [1] "Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data."
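A sketch closer to the asker's original Item 7(.*?)Item 8 idea, using base R's regexpr() and regmatches() with perl = TRUE. It assumes the Item 8 heading ends at its first period; the inline (?i) flag makes both the lookbehind and the rest of the pattern case-insensitive:

```r
s <- "See item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data."

# (?i)          : case-insensitive matching for the whole PCRE pattern
# (?<!see )     : "item 7" must not be directly preceded by "see " (note the space)
# .*?           : lazy, so the match stops at the first following "item 8"
# item 8[^.]*\. : assumption: the Item 8 heading ends at its first period
pattern <- "(?i)(?<!see )item 7.*?item 8[^.]*\\."

# regexpr() finds the first match; regmatches() extracts its text
m <- regmatches(s, regexpr(pattern, s, perl = TRUE))
m
# [1] "Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data."
```

Because .*? is lazy, the match ends at the first "Item 8" after the qualifying "Item 7", instead of running to the end of the string as the greedy .* patterns do.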

With str_extract_all() from the stringr package, which has no ignore.case argument, you can use [Ii] to ignore case instead:

library(stringr)
str_extract_all(s, "(?<!See )[Ii]tem 7(.*)")
# [[1]]
# [1] "Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data."
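stringr can also match case-insensitively if the pattern is wrapped in regex() with ignore_case = TRUE, which avoids the manual [Ii] class. A minimal sketch using str_extract(), which returns a plain character vector rather than the list returned by str_extract_all():

```r
library(stringr)

s <- "See item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 7 Management's Discussion and Analysis. BlahBlahBlah. Item 8 Financial Statements and Supplementary Data."

# regex() attaches matching options to the pattern; with ignore_case = TRUE
# the lookbehind "see " also matches "See ", so the first "item 7" is skipped
out <- str_extract(s, regex("(?<!see )item 7.*", ignore_case = TRUE))
out
```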
