
Finding set elements in lines read from a file

  •  0
  • MAK  · Tech Community  · 7 years ago

    I have a text file delimited by | : file1.txt

    ID|Name|Date
    1|A|2017-12-19   
    2|B|2017-12-20
    3|C|2017-12-21
    

    and the following sets (<type 'set'>):

    id_set = set(['1','2'])
    date_set = set(['2017-12-19', '2017-12-20'])
    

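    I only want to find the records in the file whose elements match the sets, and write those records from file1.txt to output.txt.

    Expected output: output.txt should contain the following data: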

    ID|Name|Date
    1|A|2017-12-19   
    2|B|2017-12-20
    
    2 answers  |  as of 7 years ago
        1
  •  3
  •   RoadRunner    7 years ago

    id_set = {'1','2'}
    date_set = {'2017-12-19', '2017-12-20'}
    
    # open files for reading and writing
    with open('file1.txt') as in_file, open('output.txt', 'w') as out_file:
    
        # write headers
        out_file.write(next(in_file))
    
        # go over lines in file
        for line in in_file:
    
            # extract the ID and date (record_id avoids shadowing the built-in id)
            record_id, _, date = line.rstrip().split('|')
    
            # keep lines whose ID or date is in the sets
            if record_id in id_set or date in date_set:
                out_file.write(line)
    

    ID|Name|Date
    1|A|2017-12-19
    2|B|2017-12-20
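
A variation on the loop above, as a minimal sketch rather than a definitive implementation: it uses the standard `csv` module with `|` as the delimiter, strips stray whitespace from each field (the sample data has trailing spaces after the first date), and skips blank or malformed lines instead of failing on them. The function name `filter_rows` and its parameters are hypothetical, chosen only for illustration, and it assumes every data row has exactly three fields.

```python
import csv


def filter_rows(in_path, out_path, id_set, date_set):
    """Copy the header plus every row whose ID or Date is in the given sets.

    Returns the number of data rows written. Hypothetical helper; assumes
    three '|'-separated fields per row (ID, Name, Date).
    """
    kept = 0
    with open(in_path, newline='') as in_file, \
         open(out_path, 'w', newline='') as out_file:
        reader = csv.reader(in_file, delimiter='|')
        writer = csv.writer(out_file, delimiter='|', lineterminator='\n')

        # Copy the header row; an empty input file produces an empty output.
        header = next(reader, None)
        if header is None:
            return 0
        writer.writerow(header)

        for row in reader:
            # Skip blank or malformed lines rather than raising on unpacking.
            if len(row) != 3:
                continue
            # Remove surrounding whitespace so '2017-12-19   ' still matches.
            row = [field.strip() for field in row]
            record_id, _, date = row
            if record_id in id_set or date in date_set:
                writer.writerow(row)
                kept += 1
    return kept
```

With the data from the question, `filter_rows('file1.txt', 'output.txt', {'1', '2'}, {'2017-12-19', '2017-12-20'})` would write the header and the first two records. Note that the kept rows are written with whitespace stripped, which differs slightly from writing the original line unchanged.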
    
        2
  •  2
  •   jpp    7 years ago

    import pandas as pd
    from io import StringIO
    
    mystr = StringIO("""ID|Name|Date
    1|A|2017-12-19
    2|B|2017-12-20
    3|C|2017-12-21""")
    
    # replace mystr with 'file1.txt'
    df = pd.read_csv(mystr, sep='|')
    
    # criteria
    id_set = {'1', '2'}
    date_set = {'2017-12-19', '2017-12-20'}
    
    # apply criteria
    df2 = df[df['ID'].astype(str).isin(id_set) | df['Date'].isin(date_set)]
    
    print(df2)
    
    #   ID Name        Date
    # 0  1    A  2017-12-19
    # 1  2    B  2017-12-20
    
    # export to a pipe-delimited file; index=False omits the row-index column
    df2.to_csv('file1_out.txt', sep='|', index=False)