
Data cleaning in R

  • Mona Jalal  · 10 years ago

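    I have a CSV file, and I want to extract only the timestamps of the lines that contain the word "toward", along with the fruit name in each of those lines. How can I do this in R (or, if there is a faster way, what is it)?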

    1438293900729698553,robot is in motion toward [strawberry]
    1438293900730571638,Found a plan for avocado in 1.36400008202 seconds
    1438293900731434815,current probability is greater than EXECUTION_THRESHOLD
    1438293900731554567,ready to execute am original plan of len = 33
    1438293900731586463,len of sub plan 1 = 24
    1438293900731633713,len of sub plan 2 = 9
    1438293900732910799,put in an execution request; now updating the dict
    1438293900732949576,current_prediciton_item = avocado
    1438293900733070339,current_item_probability = 0.880086981207
    1438293901677787230,current probability is greater than PLANNING_THRESHOLD
    1438293901681590725,robot is in motion toward [avocado]
    1438293902689233770,we have received verbal request [avocado]
    1438293902689314002,we already have a plan for the verbal request
    1438293902689377800,debug
    1438293902690529516,put in the final motion request
    1438293902691076051,Found a plan for avocado in 1.95595788956 seconds
    1438293902691084147,current predicted item != motion target; calc a new plan
    1438293902691110642,current probability is greater than EXECUTION_THRESHOLD
    1438293902691885974,have existing requests
    1438293904496769068,robot is in motion toward [avocado]
    1438293907737142498,ready to pick up the item
    

    Ideally, I would like the output to look like this:

    1438293900729698553, strawberry
    1438293901681590725, avocado
    1438293904496769068, avocado
    
    1 answer
  •   Rich Scriven    10 years ago

    Try this, where filename is the name of the file.

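    # Keep only the lines that contain the literal string "toward".
    g <- grep("toward", readLines(filename), fixed = TRUE, value = TRUE)
    # Delete the text after the comma up to and including "[", and the closing "]".
    gsub("((?<=,).*\\[)|\\]", "", g, perl = TRUE)
    # [1] "1438293900729698553,strawberry" "1438293901681590725,avocado"   
    # [3] "1438293904496769068,avocado"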
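
If the timestamp and the fruit are needed as separate columns, or in the exact "timestamp, fruit" layout shown in the question, one possible variation is to capture both pieces with regexec() and regmatches(). This is only a sketch and not part of the original answer: the names lines, pattern, result and formatted are made up for illustration, and the sample vector stands in for readLines(filename). It also assumes every line of interest has the form "<digits>,... toward [name]".

```r
# Sample lines copied from the question; in practice these would come from
# readLines(filename), where filename is the path to the CSV file.
lines <- c(
  "1438293900729698553,robot is in motion toward [strawberry]",
  "1438293900730571638,Found a plan for avocado in 1.36400008202 seconds",
  "1438293901681590725,robot is in motion toward [avocado]",
  "1438293902689233770,we have received verbal request [avocado]",
  "1438293904496769068,robot is in motion toward [avocado]"
)

# Group 1: the digits before the first comma (the timestamp).
# Group 2: the bracketed name that follows the word "toward".
pattern <- "^([0-9]+),.*toward \\[([^]]+)\\]$"

# regmatches() yields character(0) for lines that do not match, and a
# length-3 vector (full match, group 1, group 2) for lines that do.
m <- regmatches(lines, regexec(pattern, lines))
m <- m[lengths(m) == 3]

# The timestamps stay as character strings: 19-digit integers exceed the
# precision of R's double type, so converting them to numeric would
# silently change the values.
result <- data.frame(
  timestamp = vapply(m, function(x) x[2], character(1)),
  fruit     = vapply(m, function(x) x[3], character(1)),
  stringsAsFactors = FALSE
)

# Rebuild the "timestamp, fruit" layout requested in the question.
formatted <- paste(result$timestamp, result$fruit, sep = ", ")
print(formatted)
```

Keeping the timestamps as text is the main design choice here. Reading the file with read.csv() and default settings would parse that column as numeric, which cannot represent values of this size exactly.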