I think this is one of the Coursera text mining assignments. You can solve it with regular expressions and extraction. Input file: dates.txt
Output (index, date):

9 1971-04-10
84 1971-05-18
2 1971-07-08
53 1971-07-11
28 1971-09-12
474 1972-01-01
153 1972-01-13
13 1972-01-26
129 1972-05-06
98 1972-05-13
111 1972-06-10
225 1972-06-15
31 1972-07-20
171 1972-10-04
191 1972-11-30
486 1973-01-01
335 1973-02-01
415 1973-02-01
36 1973-02-14
405 1973-03-01
323 1973-03-01
422 1973-04-01
375 1973-06-01
380 1973-07-01
345 1973-10-01
57 1973-12-01
481 1974-01-01
436 1974-02-01
104 1974-02-24
299 1974-03-01
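If you only want to return the indices, take the index of the date-sorted result instead of the dates themselves.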
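As a rough illustration of returning only the indices, here is a standard-library sketch. The `parsed` mapping and the helper name `indices_in_date_order` are hypothetical; the five sample entries are copied from the output listed above. This is not the answer's original pandas code, which is not shown here.

```python
from datetime import date

# Hypothetical sample: original row index -> parsed date.
# The five entries are taken from the output listed above.
parsed = {
    2: date(1971, 7, 8),
    9: date(1971, 4, 10),
    28: date(1971, 9, 12),
    53: date(1971, 7, 11),
    84: date(1971, 5, 18),
}


def indices_in_date_order(dates_by_index):
    """Return the original indices ordered by their dates, oldest first.

    sorted() is stable, so rows with the same date keep their input order.
    """
    ordered = sorted(dates_by_index.items(), key=lambda item: item[1])
    return [index for index, _ in ordered]


print(indices_in_date_order(parsed))
```

With a pandas Series of dates, the same idea would be to sort the values and read off the `.index` of the result.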
Breakdown of the first regular expression (`?:` marks a non-capturing group):

- `((?:\d{,2}\s)?` opens the capturing group, then matches an optional day: up to two digits followed by one whitespace character. `?` applies to the preceding token or group, so this whole group occurs at most once.
- `(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*` matches a month abbreviation followed by any number of lowercase letters (`*`), so both `Jan` and `December` match.
- `(?:-|\.|\s|,)` matches exactly one separator: hyphen, period, whitespace, or comma.
- `\s?` matches optional whitespace (here `?` applies only to `\s`, the preceding token).
- `\d{,2}[a-z]*` matches up to two digits followed by any number of lowercase letters, for example `1st`, `13th`, or `22nd`.
- `(?:-|,|\s)?` matches an optional hyphen, comma, or whitespace; the trailing `?` means it may be absent.
- `\s?` matches optional whitespace (at most one character).
- `\d{2,4})` matches the year as two to four digits and closes the capturing group.
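To see the pattern in action, the sketch below transcribes it into Python's `re` module in verbose mode and runs it on a few made-up sample strings. The names `MONTH_DATE` and `find_date` are hypothetical, `{,2}` is written as `{0,2}` for explicitness, and the `.strip()` call is an addition: the optional day group can match a lone leading space when no day digits precede the month.

```python
import re

# Verbose-mode transcription of the pattern described above.
MONTH_DATE = re.compile(
    r"""
    (
        (?:\d{0,2}\s)?      # optional day: up to two digits plus one whitespace
        (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*  # month name
        (?:-|\.|\s|,)       # one separator: hyphen, period, whitespace, or comma
        \s?                 # optional whitespace
        \d{0,2}[a-z]*       # up to two digits plus optional suffix (1st, 22nd)
        (?:-|,|\s)?         # optional separator
        \s?                 # optional whitespace
        \d{2,4}             # year: two to four digits
    )
    """,
    re.VERBOSE,
)


def find_date(text):
    """Return the first month-name date found in text, or None."""
    match = MONTH_DATE.search(text)
    # Strip because the optional day group may capture a leading space.
    return match.group(1).strip() if match else None


# Made-up sample strings, for illustration only.
for sample in ["Seen on 04 Mar 1987 for review", "March 20, 2009", "no date here"]:
    print(repr(sample), "->", find_date(sample))
```

The pattern is case-sensitive and only covers dates written with a month name; purely numeric forms such as `04/20/2009` would need a separate expression.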