
How can I perform several character vector formatting steps in a single function?

  • Conner M.  · 8 years ago (edited)

    I have a simple list of column names whose format I would like to change, preferably programmatically. Here is a sample of the list:

        vars_list <- c("tBodyAcc.mean...X", "tBodyAcc.mean...Y", "tBodyAcc.mean...Z",
        "tBodyAcc.std...X", "tBodyAcc.std...Y", "tBodyAcc.std...Z", 
        "tGravityAcc.mean...X", "tGravityAcc.mean...Y", "tGravityAcc.mean...Z",
        "tGravityAcc.std...X", "tGravityAcc.std...Y", "tGravityAcc.std...Z",
        "fBodyAcc.mean...X", "fBodyAcc.mean...Y", "fBodyAcc.mean...Z", 
        "fBodyAcc.std...X", "fBodyAcc.std...Y", "fBodyAcc.std...Z",
        "fBodyAccJerk.mean...X", "fBodyAccJerk.mean...Y", "fBodyAccJerk.mean...Z",
        "fBodyAccJerk.std...X", "fBodyAccJerk.std...Y", "fBodyAccJerk.std...Z")
    

    This is the result I want:

     [3] "Time_Body_Acc_Mean_X"                "Time_Body_Acc_Mean_Y"               
     [5] "Time_Body_Acc_Mean_Z"                "Time_Body_Acc_Stddev_X"             
     [7] "Time_Body_Acc_Stddev_Y"              "Time_Body_Acc_Stddev_Z"             
     [9] "Time_Gravity_Acc_Mean_X"             "Time_Gravity_Acc_Mean_Y"            
    [11] "Time_Gravity_Acc_Mean_Z"             "Time_Gravity_Acc_Stddev_X"          
    [13] "Time_Gravity_Acc_Stddev_Y"           "Time_Gravity_Acc_Stddev_Z"
    

    ...

    [43] "Freq_Body_Acc_Mean_X"                "Freq_Body_Acc_Mean_Y"               
    [45] "Freq_Body_Acc_Mean_Z"                "Freq_Body_Acc_Stddev_X"             
    [47] "Freq_Body_Acc_Stddev_Y"              "Freq_Body_Acc_Stddev_Z"             
    [49] "Freq_Body_Acc_Jerk_Mean_X"           "Freq_Body_Acc_Jerk_Mean_Y"          
    [51] "Freq_Body_Acc_Jerk_Mean_Z"           "Freq_Body_Acc_Jerk_Stddev_X"        
    [53] "Freq_Body_Acc_Jerk_Stddev_Y"         "Freq_Body_Acc_Jerk_Stddev_Z" 
    

    I wrote a very verbose way of making the changes with regular expressions.

    vars_list <- unlist(lapply(vars_list, function(x){gsub("^t", "Time", x)}))
    vars_list <- unlist(lapply(vars_list, function(x){gsub("^f", "Freq", x)}))
    vars_list <- unlist(lapply(vars_list, function(x){gsub("std", "Stddev", x)}))
    vars_list <- unlist(lapply(vars_list, function(x){gsub("mean", "Mean", x)}))
    vars_list <- unlist(lapply(vars_list, function(x){gsub("\\.+", "", x)}))
    vars_list <- unlist(lapply(vars_list, function(x){gsub("\\.", "", x)}))
    vars_list <- unlist(lapply(vars_list, 
                               function(x){gsub("(?<=[a-z]).{0}(?=[A-Z])",
                                                "_", x, perl = TRUE)}))
    

    Is there a way to get the same result more efficiently and elegantly by combining two or more formatting steps in a single function call?

    2 Answers

  • akuiper  · 8 years ago

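    Another option is to put your patterns and replacements in two vectors, then use stringi::stri_replace_all_regex, which can perform the replacements in a vectorized way. With vectorize_all = FALSE, every pattern is applied in turn to each string: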

    # patterns[i] is replaced by replacement[i]; the pairs are applied in order
    patterns <- c('^t', '^f', 'std', 'mean', '\\.+', '(?<=[a-z])([A-Z])')
    replacement <- c('Time', 'Freq', 'Stddev', 'Mean', '', '_$1')
    
    library(stringi)
    stri_replace_all_regex(vars_list, patterns, replacement, vectorize_all = FALSE)
    # [1] "Time_Body_Acc_Mean_X"      "Time_Body_Acc_Mean_Y"     
    # [3] "Time_Body_Acc_Mean_Z"      "Time_Body_Acc_Stddev_X"   
    # [5] "Time_Body_Acc_Stddev_Y"    "Time_Body_Acc_Stddev_Z"   
    # [7] "Time_Gravity_Acc_Mean_X"   "Time_Gravity_Acc_Mean_Y"  
    # [9] "Time_Gravity_Acc_Mean_Z"   "Time_Gravity_Acc_Stddev_X"
    #[11] "Time_Gravity_Acc_Stddev_Y" "Time_Gravity_Acc_Stddev_Z"
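
For readers who would rather avoid the stringi dependency, the same idea of paired pattern and replacement vectors can be sketched in base R. The helper name replace_all is hypothetical (it is not part of the answer above or of any package), and vars_sample is a small illustrative subset of vars_list. Two details differ from the stringi version: base R's gsub writes backreferences as \\1 rather than $1, and the lookbehind requires perl = TRUE. Because gsub is already vectorized over its input, no lapply/unlist wrapper is needed.

```r
# Hypothetical helper: apply each pattern/replacement pair in turn
# to the whole character vector, using only base R.
replace_all <- function(x, patterns, replacements) {
  Reduce(
    function(acc, i) gsub(patterns[i], replacements[i], acc, perl = TRUE),
    seq_along(patterns),
    x  # initial value: the untouched input vector
  )
}

# Illustrative subset of vars_list from the question
vars_sample <- c("tBodyAcc.mean...X", "tGravityAcc.std...Y", "fBodyAccJerk.std...Z")

patterns     <- c("^t", "^f", "std", "mean", "\\.+", "(?<=[a-z])([A-Z])")
replacements <- c("Time", "Freq", "Stddev", "Mean", "", "_\\1")

renamed <- replace_all(vars_sample, patterns, replacements)
print(renamed)
# [1] "Time_Body_Acc_Mean_X"        "Time_Gravity_Acc_Stddev_Y"
# [3] "Freq_Body_Acc_Jerk_Stddev_Z"
```

The pairs are applied in order, so the dot removal has to come before the underscore insertion: the lookbehind only fires when an uppercase letter directly follows a lowercase one.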
    
  • Maurits Evers  · 8 years ago

    How about this, using base R's sub?

    sub("t(\\w+)(Acc)\\.(\\w+)\\.+([XYZ])", "Time_\\1_\\2_\\3_\\4", vars_list)
    #[1] "Time_Body_Acc_mean_X"    "Time_Body_Acc_mean_Y"
    #[3] "Time_Body_Acc_mean_Z"    "Time_Body_Acc_std_X"
    #[5] "Time_Body_Acc_std_Y"     "Time_Body_Acc_std_Z"
    #[7] "Time_Gravity_Acc_mean_X" "Time_Gravity_Acc_mean_Y"
    #[9] "Time_Gravity_Acc_mean_Z" "Time_Gravity_Acc_std_X"
    #[11] "Time_Gravity_Acc_std_Y"  "Time_Gravity_Acc_std_Z"
    

    Changing mean to Mean and std to StdDev requires two additional subs; the same applies to t to Time and f to Freq.
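
The extra calls mentioned above could look like the following sketch. It is a variant built on stated assumptions rather than the answerer's exact code: the pattern is anchored and extended to capture the t/f prefix, vars_sample is an illustrative subset of vars_list, and Stddev follows the spelling in the question's desired output. Names containing Jerk (for example fBodyAccJerk.mean...X) do not match this pattern and pass through unchanged, so they would need an extended pattern.

```r
# Illustrative subset of vars_list from the question
vars_sample <- c("tBodyAcc.mean...X", "tGravityAcc.std...Z", "fBodyAcc.mean...Y")

# Step 1: split each name into prefix, signal, "Acc", statistic and axis,
# joined by underscores (assumed extension of the pattern in the answer)
out <- sub("^([tf])(\\w+)(Acc)\\.(\\w+)\\.+([XYZ])$",
           "\\1_\\2_\\3_\\4_\\5", vars_sample)

# Step 2: expand the domain prefix
out <- sub("^t_", "Time_", out)
out <- sub("^f_", "Freq_", out)

# Step 3: capitalise the statistic
out <- sub("_mean_", "_Mean_", out, fixed = TRUE)
out <- sub("_std_", "_Stddev_", out, fixed = TRUE)

print(out)
# [1] "Time_Body_Acc_Mean_X"      "Time_Gravity_Acc_Stddev_Z"
# [3] "Freq_Body_Acc_Mean_Y"
```

This takes five calls in total, which is why the paired-vector approach may be easier to maintain once the list of substitutions grows.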