
Using gsub to extract only uppercase letters of a specific length [duplicate]

Joe · Tech Community · 11 months ago

I have a string from which I want to extract a country code. The code is always exactly 3 characters long and always in uppercase.

mystring
"Bloggs, Joe GBR London (1)/Bloggs, Joe London (2)" 
"Bloggs, Joe London (1)/Bloggs, Joe  GBR London (2)"  
"Bloggs, Joe London (1)/Bloggs, Joe London (2)" 
"Bloggs, Joe GBR London (1)/Bloggs, Joe GBR London (2)" 
 "Bloggs, J-S GBR London (1)/Bloggs, J-S GBR London (2)"

What I would like to get:

mystring
GBR/
/GBR
/
GBR/GBR
GBR/GBR

Blanks are fine where there is no country; I can deal with them.

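I tried a few things I found here. One attempt removed every character that is not uppercase, but that left behind letters I don't want, such as the capital letters in the names and locations. I then tried something similar, removing everything that does not start and end with an uppercase letter (which also failed because of the names):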

gsub("[^A-Z$]", "", mystring)

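What I think I need is to keep uppercase letters only where there are exactly 3 of them in a row, but I can't get the code quite right. I imagine it would look something like the following, unless someone knows a more robust approach: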

gsub("[^A-Z$]{3}", "", mystring)
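For reference, a minimal sketch of what the first attempt returns on the first example string, which shows why it is not enough. Inside a character class the `$` is a literal dollar sign rather than an end-of-string anchor, so `[^A-Z$]` simply means "anything that is not an uppercase letter or a dollar sign". The variable names `x` and `first_try` are only for illustration.

```r
# First example string from the data above
x <- "Bloggs, Joe GBR London (1)/Bloggs, Joe London (2)"

# Delete every character that is not an uppercase letter (or a literal "$").
# The initials of the names and places survive alongside the country code.
first_try <- gsub("[^A-Z$]", "", x)
first_try
# [1] "BJGBRLBJL"
```

Adding `{3}` to that pattern does not help either: `[^A-Z$]{3}` matches runs of three consecutive characters that are not uppercase, so it deletes lowercase text and punctuation in chunks of three rather than selecting three-letter uppercase codes.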

1 answer | as of 11 months ago

Gregor Thomas · 11 months ago

I like stringr::str_extract for extracting patterns from strings. It lets you simply specify the pattern you want, rather than trying to replace everything else:

mystring = c("Bloggs, Joe GBR London (1)/Bloggs, Joe London (2)", 
"Bloggs, Joe London (1)/Bloggs, Joe  GBR London (2)"  ,
"Bloggs, Joe London (1)/Bloggs, Joe London (2)" ,
"Bloggs, Joe GBR London (1)/Bloggs, Joe GBR London (2)", 
 "Bloggs, J-S GBR London (1)/Bloggs, J-S GBR London (2)" 
)

## extract first matches
stringr::str_extract(mystring, "[A-Z]{3}")
# [1] "GBR" "GBR" NA    "GBR" "GBR"

## or get all matches with `str_extract_all`
stringr::str_extract_all(mystring, "[A-Z]{3}")
# [[1]]
# [1] "GBR"
# 
# [[2]]
# [1] "GBR"
# 
# [[3]]
# character(0)
# 
# [[4]]
# [1] "GBR" "GBR"
# 
# [[5]]
# [1] "GBR" "GBR"

You can do the same in base R using substring, or regmatches with regexpr, as seen in answers here.
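As a rough illustration of the base R route, the sketch below splits each string on "/", pulls a standalone three-letter uppercase word out of each part with regmatches and regexpr, and pastes the parts back together so that the output has the "GBR/" shape asked for in the question. The helper name `extract_code` is hypothetical, and the word boundaries (`\\b`) are an added assumption so that three capitals inside a longer uppercase word would not match.

```r
mystring <- c(
  "Bloggs, Joe GBR London (1)/Bloggs, Joe London (2)",
  "Bloggs, Joe London (1)/Bloggs, Joe  GBR London (2)",
  "Bloggs, Joe London (1)/Bloggs, Joe London (2)",
  "Bloggs, Joe GBR London (1)/Bloggs, Joe GBR London (2)",
  "Bloggs, J-S GBR London (1)/Bloggs, J-S GBR London (2)"
)

# Hypothetical helper: return the first standalone 3-letter uppercase word
# in a single string, or "" when there is none.
extract_code <- function(x) {
  m <- regmatches(x, regexpr("\\b[A-Z]{3}\\b", x, perl = TRUE))
  if (length(m) == 0) "" else m
}

# Split each entry on "/", extract a code from every part, then rejoin.
halves <- strsplit(mystring, "/", fixed = TRUE)
result <- vapply(halves, function(parts) {
  paste(vapply(parts, extract_code, character(1)), collapse = "/")
}, character(1))

result
# [1] "GBR/"    "/GBR"    "/"       "GBR/GBR" "GBR/GBR"
```

Working per part keeps the position of a missing code visible, which a flat list of matches from str_extract_all would lose.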
