`gsub` is reasonably fast:
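# one row
x <- rep("{some field=249 apples, y= m s 33 , url=https://go.s.com?id=7, source=multiC}", 1000000) # repeat to 1 million rows for testing
reform <- function(x) {
  # inner gsub(): drop whitespace right after { or , and right before , or }
  # outer gsub(): rewrite each key=value pair as "key":"value"
  gsub('([a-zA-Z0-9_ ]+)=([^,}]+)', '"\\1":"\\2"',
       gsub('([{,])\\s*|\\s*([,}])', '\\1\\2', x))
}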
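Tracing the two passes on the single test row shows what each pattern does. The inner `gsub()` removes whitespace directly after `{` or `,` and directly before `,` or `}`. The outer one wraps each `key=value` pair in quotes. Two spaces survive the first pass. One is the space after `y=`. The other is the space after the comma in `33 , url`, because that match starts at the space before the comma and consumes only ` ,`. As a result the key ` url` and the value ` m s 33` keep a leading space. The names `row`, `step1` and `step2` are only for illustration.

```r
# The single test row from above
row <- "{some field=249 apples, y= m s 33 , url=https://go.s.com?id=7, source=multiC}"

# Pass 1: drop whitespace right after { or , and right before , or }
step1 <- gsub('([{,])\\s*|\\s*([,}])', '\\1\\2', row)

# Pass 2: rewrite each key=value pair as "key":"value"
step2 <- gsub('([a-zA-Z0-9_ ]+)=([^,}]+)', '"\\1":"\\2"', step1)

cat(step1, step2, sep = "\n")
```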
But `stringr::str_replace_all` is faster:
library(stringr)
reform_stringr <- function(x) {
  # same two patterns as reform(), run through stringr (ICU regex engine)
  x <- str_replace_all(x, '([{,])\\s*|\\s*([,}])', '\\1\\2')
  str_replace_all(x, '([a-zA-Z0-9_ ]+)=([^,}]+)', '"\\1":"\\2"')
}
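Both functions apply the same two patterns. `gsub()` defaults to the TRE engine and `str_replace_all()` uses ICU, so it is worth checking that they still return identical strings. Here is a quick check on a small sample. It assumes stringr is installed, and the sample size of 1,000 rows is arbitrary.

```r
library(stringr)

# Both versions from above, repeated so this block runs on its own
reform <- function(x) {
  gsub('([a-zA-Z0-9_ ]+)=([^,}]+)', '"\\1":"\\2"',
       gsub('([{,])\\s*|\\s*([,}])', '\\1\\2', x))
}
reform_stringr <- function(x) {
  x <- str_replace_all(x, '([{,])\\s*|\\s*([,}])', '\\1\\2')
  str_replace_all(x, '([a-zA-Z0-9_ ]+)=([^,}]+)', '"\\1":"\\2"')
}

# A small sample instead of 1 million rows
sample_rows <- rep("{some field=249 apples, y= m s 33 , url=https://go.s.com?id=7, source=multiC}", 1000)

same <- identical(reform(sample_rows), reform_stringr(sample_rows))
cat("identical output:", same, "\n")
```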
Benchmark results on 1 million rows (10 evaluations each):
Unit: seconds
           expr      min       lq     mean   median       uq      max neval cld
         reform 6.168293 6.177294 6.236553 6.211335 6.265154 6.379521    10  a
 reform_stringr 3.974893 3.990187 3.997749 3.994628 3.997775 4.040163    10   b
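The answer does not show the benchmarking call. The `neval` and `cld` columns suggest the timings came from `microbenchmark::microbenchmark()`. The sketch below is a comparable call, assuming the microbenchmark and stringr packages are installed. It uses 10,000 rows instead of 1 million so it finishes quickly, and its timings will differ from those above.

```r
library(microbenchmark)
library(stringr)

reform <- function(x) {
  gsub('([a-zA-Z0-9_ ]+)=([^,}]+)', '"\\1":"\\2"',
       gsub('([{,])\\s*|\\s*([,}])', '\\1\\2', x))
}
reform_stringr <- function(x) {
  x <- str_replace_all(x, '([{,])\\s*|\\s*([,}])', '\\1\\2')
  str_replace_all(x, '([a-zA-Z0-9_ ]+)=([^,}]+)', '"\\1":"\\2"')
}

# 10,000 rows here rather than 1 million, to keep the run short
x <- rep("{some field=249 apples, y= m s 33 , url=https://go.s.com?id=7, source=multiC}", 10000)

# Named arguments become the labels shown in the `expr` column
res <- microbenchmark(
  reform = reform(x),
  reform_stringr = reform_stringr(x),
  times = 10
)
print(res)
```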
Use it like this, joining the rows into one JSON array and parsing it with `jsonlite::fromJSON()`:
# toString() joins the objects with ", "; wrapping in [ ] makes a JSON array
y <- sprintf("[%s]", toString(reform_stringr(x))) |>
  jsonlite::fromJSON()
> head(y)
| some field | y | url | source |
|---|---|---|---|
| 249 apples | m s 33 | https://go.s.com?id=7 | multiC |
| 249 apples | m s 33 | https://go.s.com?id=7 | multiC |
| 249 apples | m s 33 | https://go.s.com?id=7 | multiC |
| 249 apples | m s 33 | https://go.s.com?id=7 | multiC |
| 249 apples | m s 33 | https://go.s.com?id=7 | multiC |
| 249 apples | m s 33 | https://go.s.com?id=7 | multiC |
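One caveat about this output. The first pattern in `reform_stringr()` does not remove the space after `y=`. It also misses the space after the comma in `33 , url`, because that match starts at the space before the comma and consumes only ` ,`. The parsed data frame therefore ends up with a ` url` column and ` m s 33` values, and the right-aligned print hides the leading spaces. The hypothetical `reform_trim()` below is one possible variant. It strips whitespace on both sides of every `{`, `,`, `}` and `=`, and it has not been benchmarked. It assumes no key or value needs leading or trailing spaces, and that no `=` inside a value (such as `?id=7`) is padded with spaces worth keeping.

```r
library(stringr)

# Hypothetical variant: trim whitespace on both sides of { , } and =
reform_trim <- function(x) {
  x <- str_replace_all(x, '\\s*([{,}=])\\s*', '\\1')
  str_replace_all(x, '([a-zA-Z0-9_ ]+)=([^,}]+)', '"\\1":"\\2"')
}

x <- rep("{some field=249 apples, y= m s 33 , url=https://go.s.com?id=7, source=multiC}", 3)

# Join the objects into one JSON array and parse it into a data frame
y <- sprintf("[%s]", toString(reform_trim(x))) |>
  jsonlite::fromJSON()
print(names(y))
```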