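I have a column of names; some include a middle name or a middle initial. I want to remove those middle names/initials from the fullname column and create a new column next to it that stores them.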
From my research, this post offers some solutions for removing middle names. How can I move those middle names/initials into a new column instead?
Below is a sample of about 40 rows of my data, produced with dput(). The full data has 43 million rows and is about 3.8 GB.
data <- structure(list(id = c(439116595, 439317458, 439373574, 439434694,
439508848, 439632143, 439778306, 439917155, 440009485, 440147556,
440207880, 441479247, 442115059, 441569787, 438192228, 438215998,
438307365, 438317476, 438389110, 438409963, 438479736, 438509859,
438634859, 438662407, 438764944, 438846700, 438884094, 438954147,
439227370, 439243020, 439248564, 439272667, 439357884, 439403127,
439446363, 439511276, 439546441, 439586141, 439804213, 439862550,
439889286), fullname = c("Shawn Chase", "Steven Hofer", "Stephen Paradise",
"Shengho Yang", "Nelson Carvalho", "RICK GILLICK", "Marie Jhoanne Morilla",
"Sanjay Kulkarni", "Sam Bunn", "Iran Murphy", "Kathryn Cutler",
"Diane Sik", "Donna Yee", "Christine Coltrain", "Maher Dakkouri",
"Ray Perl", "Abid Khalil", "Ian Crombie", "Allen Carr", "Daniel Angeline",
"Jimmy Tan", "Thierry LAMBERT", "Diene Faye", "Greg Greene",
"Laura Holsopple", "Roberta Minkus", "Bridget Chenette", "Joshua Polite",
"John Liberty", "David Smith", "Igor Baratta", "Pierre Schmitz",
"alejandra salvanes", "Malcolm K Knight", "Xiaoyan Hu", "Joe Pawl",
"Bryan Armstrong", "Christina Spezio", "Robert Gibson", "Peter Head",
"Mike Russo"), degree = c("Bachelor", "Bachelor", "MBA",
"", "", "", "", "Master", "", "MBA", "", "", "Bachelor", "Doctor",
"Bachelor", "Master", "Master", "", "", "", "", "", "", "Bachelor",
"Master", "", "Master", "", "Bachelor", "Master", "", "", "",
"", "Bachelor", "Associate", "", "Bachelor", "", "", "Associate"
)), row.names = 20:60, class = "data.frame")
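I tried the following on a small subset of the data:
str_match(data, '^(\\S+)\\s*(.*?)\\s*(\\S+)$')[,-1]
However, it produced this warning:
Warning message:
In stri_match_first_regex(string, pattern, opts_regex = opts(pattern)) : argument is not an atomic vector; coercing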
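For illustration, a minimal sketch of one possible direction. The warning is consistent with str_match() receiving the whole data frame (a list) rather than a character vector, so the sketch passes the fullname column instead. The three-row df and the column name middle are made up for this example; the regex is the one from the attempt above, and nothing here has been checked against the full 43-million-row data.

```r
library(stringr)

# Hypothetical three-row stand-in for the real data (names taken from the sample)
df <- data.frame(
  fullname = c("Shawn Chase", "Malcolm K Knight", "Marie Jhoanne Morilla"),
  stringsAsFactors = FALSE
)

# Pass the character column, not the whole data frame.
# Columns of the result: first name, middle part, last name.
parts <- str_match(df$fullname, "^(\\S+)\\s*(.*?)\\s*(\\S+)$")[, -1, drop = FALSE]

# "middle" is an assumed name for the new column
df$middle <- parts[, 2]

# Rebuild fullname without the middle part
df$fullname <- paste(parts[, 1], parts[, 3])
```

With this pattern, names without a middle part get an empty string in middle. A single-word name would be split mid-word (the last character becomes the "last name"), so such rows may need a separate check before the columns are overwritten.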