
Append only new values from a DataFrame to a CSV in Python

  • Fogarasi Norbert · 6 years ago

    I have a CSV file that looks like this:

    Date,High,Low,Open,Close,Volume,Adj Close
    1980-12-12,0.515625,0.5133928656578064,0.5133928656578064,0.5133928656578064,117258400.0,0.02300705946981907
    1980-12-15,0.4888392984867096,0.4866071343421936,0.4888392984867096,0.4866071343421936,43971200.0,0.02180669829249382
    1980-12-16,0.453125,0.4508928656578064,0.453125,0.4508928656578064,26432000.0,0.02020619809627533
    

    I also have a Pandas DataFrame that contains exactly the same values, plus some new entries. My goal is to append only the new values to the CSV file.

    I tried the following, but unfortunately it appends not only the new entries but the old ones as well:

    df.to_csv('{}/{}'.format(FOLDER, 'AAPL.CSV'), mode='a', header=False)
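This behaviour follows from how `to_csv` works: with `mode='a'` it opens the file for appending and writes every row of the DataFrame, without looking at what the file already contains. A small reproduction, using hypothetical sample values and a temporary file in place of the question's `FOLDER`:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for AAPL.CSV and the freshly fetched DataFrame.
old = pd.DataFrame(
    {"Close": [0.5134, 0.4866]},
    index=pd.Index(["1980-12-12", "1980-12-15"], name="Date"),
)
new = pd.DataFrame(
    {"Close": [0.5134, 0.4866, 0.4509]},
    index=pd.Index(["1980-12-12", "1980-12-15", "1980-12-16"], name="Date"),
)

path = os.path.join(tempfile.mkdtemp(), "AAPL.CSV")
old.to_csv(path)                          # initial file: header + 2 rows
new.to_csv(path, mode="a", header=False)  # appends ALL 3 rows, not just the new one

result = pd.read_csv(path)
print(len(result))                        # 5
print(result["Date"].duplicated().sum())  # 2 (1980-12-12 and 1980-12-15 appear twice)
```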
    
  •   cullzie · 6 years ago

    You can re-read the CSV file after writing it and, before appending the newly fetched data, drop any rows that are already in the file.

    The following code worked for me:

    import pandas as pd
    
    # Creating original csv
    columns = ['Date','High','Low','Open','Close','Volume','Adj Close']
    original_rows = [
        ["1980-12-12", 0.515625, 0.5133928656578064, 0.5133928656578064, 0.5133928656578064, 117258400.0, 0.02300705946981907],
        ["1980-12-15", 0.4888392984867096, 0.4866071343421936, 0.4888392984867096, 0.4866071343421936, 43971200.0, 0.02180669829249382],
    ]
    df_original = pd.DataFrame(columns=columns, data=original_rows)
    df_original.to_csv('AAPL.CSV', mode='w', index=False)
    
    # Fetching the new data
    rows_updated = [
        ["1980-12-12", 0.515625, 0.5133928656578064, 0.5133928656578064, 0.5133928656578064, 117258400.0, 0.02300705946981907],
        ["1980-12-15", 0.4888392984867096, 0.4866071343421936, 0.4888392984867096, 0.4866071343421936, 43971200.0, 0.02180669829249382],
        ["1980-12-16", 0.453125, 0.4508928656578064, 0.453125, 0.4508928656578064, 26432000.0, 0.02020619809627533],
    ]
    df_updated = pd.DataFrame(columns=columns, data=rows_updated)
    
    # Read in current csv values
    current_csv_data = pd.read_csv('AAPL.CSV')
    
    # Keep only the fetched rows whose Date is not already in the file.
    # (Filtering with isin avoids re-appending rows that exist only in the CSV.)
    new_entries = df_updated[~df_updated['Date'].isin(current_csv_data['Date'])]
    new_entries.to_csv('AAPL.CSV', mode='a', header=False, index=False)
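In the question itself, `Date` appears to be the DataFrame's index rather than a column: the `to_csv` call does not pass `index=False`, yet the file starts with a `Date` column. Under that assumption, the same filtering idea can be applied to the index instead. A minimal sketch; `append_new_rows` is a hypothetical helper name, the values are sample data, and it assumes the DataFrame has a `DatetimeIndex` so that both sides compare as timestamps:

```python
import os
import tempfile

import pandas as pd


def append_new_rows(path, df):
    """Hypothetical helper: append to the CSV at `path` only those rows of `df`
    whose index value (the date) is not in the file yet. Assumes the file was
    written with the index as its first column and that `df` has a DatetimeIndex."""
    existing = pd.read_csv(path, index_col=0, parse_dates=True)
    new_rows = df[~df.index.isin(existing.index)]  # dates not yet in the file
    if not new_rows.empty:
        new_rows.to_csv(path, mode="a", header=False)
    return new_rows


# Demo with hypothetical values: the file holds two dates, the fetched data three.
path = os.path.join(tempfile.mkdtemp(), "AAPL.CSV")
fetched = pd.DataFrame(
    {"Close": [0.5134, 0.4866, 0.4509]},
    index=pd.DatetimeIndex(["1980-12-12", "1980-12-15", "1980-12-16"], name="Date"),
)
fetched.iloc[:2].to_csv(path)  # simulate the existing AAPL.CSV

added = append_new_rows(path, fetched)
print(added.index.tolist())    # only 1980-12-16 is appended
print(len(pd.read_csv(path)))  # 3
```

Calling the helper again with the same DataFrame appends nothing, so it can be run after every fetch. If the index holds plain strings rather than timestamps, dropping `parse_dates=True` keeps both sides as strings so `isin` still matches.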