Saving *.csv without spaces and double quotes (2)

  • user366312  · 2 years ago

    I want to save CSV data without double quotes.

    import os

    # Write the data sets to CSV files
    training_set.to_csv(os.path.join(output_dir, target_dir, 'training_set.csv'), index=False, header=False)
    validation_set.to_csv(os.path.join(output_dir, target_dir, 'validation_set.csv'), index=False, header=False)
    test_set.to_csv(os.path.join(output_dir, target_dir, 'test_set.csv'), index=False, header=False)
    

    The code above saves each row wrapped in double quotes:

    ... ... ...
    "76,GLU,H,5.406,5.079,6.304,0,0,0,1,1,0"
    "172,THR,H,6.651,8.414,9.157,0,0,0,0,0,0"
    "238,GLU,C,5.764,9.526,11.865,0,0,0,0,2,0"
    "133,LYS,C,7.412,9.808,11.162,0,0,0,0,1,0"
    "247,ASP,C,5.351,6.6,9.927,0,0,0,2,4,0"
    "133,GLU,H,5.498,5.134,6.529,0,0,0,0,1,0"
    "111,GLN,C,6.674,9.082,9.925,0,0,0,0,1,0"
    "374,SER,C,6.642,8.332,11.536,0,0,0,0,1,0"
    "192,SER,C,6.346,8.69,12.18,0,1,2,4,9,0"
    "182,LEU,H,5.453,7.862,8.894,0,0,0,0,4,0"
    ... ... ...
    

    If I write the following, the call fails and asks for an escape character:

    import csv

    # Write the data sets to CSV files
    training_set.to_csv(os.path.join(output_dir, target_dir, 'training_set.csv'), index=False, header=False, quoting=csv.QUOTE_NONE)
    validation_set.to_csv(os.path.join(output_dir, target_dir, 'validation_set.csv'), index=False, header=False, quoting=csv.QUOTE_NONE)
    test_set.to_csv(os.path.join(output_dir, target_dir, 'test_set.csv'), index=False, header=False, quoting=csv.QUOTE_NONE)
    

    Next I wrote the following, which raised an error:

    # Write the data sets to CSV files
    training_set.to_csv(os.path.join(output_dir, target_dir, 'training_set.csv'), index=False, header=False, quoting=csv.QUOTE_NONE, escapechar='')
    validation_set.to_csv(os.path.join(output_dir, target_dir, 'validation_set.csv'), index=False, header=False, quoting=csv.QUOTE_NONE, escapechar='')
    test_set.to_csv(os.path.join(output_dir, target_dir, 'test_set.csv'), index=False, header=False, quoting=csv.QUOTE_NONE, escapechar='')
    
    Traceback (most recent call last):
      File "data_split_segregate.py", line 87, in <module>
        training_set.to_csv(os.path.join(output_dir, target_dir, 'training_set.csv'), index=False, header=False, quoting=csv.QUOTE_NONE, escapechar='')
      File "/home/my_username/heca_v2/env/lib/python3.7/site-packages/pandas/core/generic.py", line 3482, in to_csv
        storage_options=storage_options,
      File "/home/my_username/heca_v2/env/lib/python3.7/site-packages/pandas/io/formats/format.py", line 1105, in to_csv
        csv_formatter.save()
      File "/home/my_username/heca_v2/env/lib/python3.7/site-packages/pandas/io/formats/csvs.py", line 257, in save
        self._save()
      File "/home/my_username/heca_v2/env/lib/python3.7/site-packages/pandas/io/formats/csvs.py", line 262, in _save
        self._save_body()
      File "/home/my_username/heca_v2/env/lib/python3.7/site-packages/pandas/io/formats/csvs.py", line 300, in _save_body
        self._save_chunk(start_i, end_i)
      File "/home/my_username/heca_v2/env/lib/python3.7/site-packages/pandas/io/formats/csvs.py", line 316, in _save_chunk
        self.writer,
      File "pandas/_libs/writers.pyx", line 72, in pandas._libs.writers.write_csv_rows
    _csv.Error: need to escape, but no escapechar set
    
    

    How can I fix this?

    Note: the source data does not contain double quotes.

    1 answer  |  2 years ago
  •   Nick SamSmith1986    2 years ago

    It looks like your source data is a single column of comma-separated values, something like:

                                           text
    0    76,GLU,H,5.406,5.079,6.304,0,0,0,1,1,0
    1   172,THR,H,6.651,8.414,9.157,0,0,0,0,0,0
    2  238,GLU,C,5.764,9.526,11.865,0,0,0,0,2,0
    3  133,LYS,C,7.412,9.808,11.162,0,0,0,0,1,0
    4     247,ASP,C,5.351,6.6,9.927,0,0,0,2,4,0
    5   133,GLU,H,5.498,5.134,6.529,0,0,0,0,1,0
    6   111,GLN,C,6.674,9.082,9.925,0,0,0,0,1,0
    7  374,SER,C,6.642,8.332,11.536,0,0,0,0,1,0
    8    192,SER,C,6.346,8.69,12.18,0,1,2,4,9,0
    9   182,LEU,H,5.453,7.862,8.894,0,0,0,0,4,0
    

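    If that is df, then df.to_csv(index=False, header=False) will produce the output you are seeing: each value contains the delimiter, so the CSV writer quotes the whole field.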
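    The behaviour described above can be reproduced with a small sketch. The two-row df below is a hypothetical stand-in for the real data, and the column name text is assumed from the sample above. It shows the default quoting, and why quoting=csv.QUOTE_NONE raises the error in the question: the writer is not allowed to quote, so it has to escape each comma, and no escapechar is set.

```python
import csv

import pandas as pd

# Hypothetical sample: a single column whose values are whole comma-separated lines.
df = pd.DataFrame({"text": [
    "76,GLU,H,5.406,5.079,6.304,0,0,0,1,1,0",
    "172,THR,H,6.651,8.414,9.157,0,0,0,0,0,0",
]})

# Default quoting (csv.QUOTE_MINIMAL) quotes any field that contains the delimiter.
quoted = df.to_csv(index=False, header=False)
print(quoted)

# With QUOTE_NONE the writer may not quote, so it must escape the commas instead;
# with no escapechar available it raises csv.Error.
try:
    df.to_csv(index=False, header=False, quoting=csv.QUOTE_NONE)
    error_message = None
except csv.Error as exc:
    error_message = str(exc)
print(error_message)
```

    Setting a real escapechar would only replace the quotes with escape characters before every comma, so it does not give the clean output either.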

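    To produce a CSV column for each value in that column, you need to split it on , and expand the result into new columns. Using the sample df above: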

    df['text'].str.split(',', expand=True).to_csv(index=False, header=False)
    

    Output:

    76,GLU,H,5.406,5.079,6.304,0,0,0,1,1,0
    172,THR,H,6.651,8.414,9.157,0,0,0,0,0,0
    238,GLU,C,5.764,9.526,11.865,0,0,0,0,2,0
    133,LYS,C,7.412,9.808,11.162,0,0,0,0,1,0
    247,ASP,C,5.351,6.6,9.927,0,0,0,2,4,0
    133,GLU,H,5.498,5.134,6.529,0,0,0,0,1,0
    111,GLN,C,6.674,9.082,9.925,0,0,0,0,1,0
    374,SER,C,6.642,8.332,11.536,0,0,0,0,1,0
    192,SER,C,6.346,8.69,12.18,0,1,2,4,9,0
    182,LEU,H,5.453,7.862,8.894,0,0,0,0,4,0
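    Applied to the files in the question, the same idea might look like the sketch below. The helper write_split_csv, the column name text, and the temporary output directory are assumptions for illustration only; the real frames and paths (output_dir, target_dir) would take their place.

```python
import os
import tempfile

import pandas as pd


def write_split_csv(frame, path, column="text"):
    """Split `column` on commas into separate columns and write them to `path`."""
    parts = frame[column].str.split(",", expand=True)
    parts.to_csv(path, index=False, header=False)


# Hypothetical stand-in for the real training_set.
training_set = pd.DataFrame({"text": [
    "76,GLU,H,5.406,5.079,6.304,0,0,0,1,1,0",
    "172,THR,H,6.651,8.414,9.157,0,0,0,0,0,0",
]})

with tempfile.TemporaryDirectory() as output_dir:
    path = os.path.join(output_dir, "training_set.csv")
    write_split_csv(training_set, path)
    # Read the file back to check that no quotes were written.
    with open(path) as handle:
        written = handle.read().splitlines()

print(written)
```

    The split pieces stay strings, so values such as 5.406 are written exactly as they appeared. If some rows have fewer fields than others, expand=True pads them with missing values, which are written as empty fields.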