
Pandas resample with ohlc

  •  1
  • slayedbylucifer  · Tech Community  · 6 years ago

    I am new to pandas, so please tell me if I am doing something silly.

    Input file (only the head is shown below; the file has more than 10 lines):

    $ head /var/tmp/ticks_data.csv 
    2019-01-18 14:55:00,296
    2019-01-18 14:55:01,296
    2019-01-18 14:55:02,296
    2019-01-18 14:55:03,296.05
    2019-01-18 14:55:04,296.05
    2019-01-18 14:55:05,296
    2019-01-18 14:55:06,296
    2019-01-18 14:55:08,296
    2019-01-18 14:55:09,296
    2019-01-18 14:55:10,296.05
    

    Code:

    $ cat create_candles.py 
    
    import pandas as pd
    
    filename = '/var/tmp/ticks_data.csv'
    df = pd.read_csv(filename, names=['timestamp', 'ltp'], index_col=1, parse_dates=['timestamp'])
    # print(df.head())
    data = df['ltp'].resample('1min').ohlc()
    print(data)
    

    Error:

    $ python3 create_candles.py 
    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexes/base.py", line 3078, in get_loc
        return self._engine.get_loc(key)
      File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
      File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
    KeyError: 'ltp'
    

    I thought the file contained unknown characters, so I ran dos2unix /var/tmp/ticks_data.csv, but the problem is still the same.

    If I try removing index_col=1 from the read_csv call that builds df:

    df = pd.read_csv(filename, names=['timestamp', 'ltp'], parse_dates=['timestamp'])
    

    Then I get the following error:

    Traceback (most recent call last):
      File "/Users/dheeraj.kabra/Desktop/Ticks/create_candles.py", line 6, in <module>
        data = df['ltp'].resample('1min').ohlc()
      File "/usr/local/lib/python3.7/site-packages/pandas/core/generic.py", line 7110, in resample
        base=base, key=on, level=level)
      File "/usr/local/lib/python3.7/site-packages/pandas/core/resample.py", line 1148, in resample
        return tg._get_resampler(obj, kind=kind)
      File "/usr/local/lib/python3.7/site-packages/pandas/core/resample.py", line 1276, in _get_resampler
        "but got an instance of %r" % type(ax).__name__)
    TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
    [Finished in 0.5s with exit code 1]
    

    Any suggestions for solving this would be very helpful.

    1 answer  |  up to 6 years ago
  •  1
  •   jezrael    6 years ago

    Change index_col to 0 or ['timestamp'] so that the first column is converted to a DatetimeIndex:

    from io import StringIO

    import pandas as pd
    
    temp = u"""2019-01-18 14:55:00,296
    2019-01-18 14:55:01,296
    2019-01-18 14:55:02,296
    2019-01-18 14:55:03,296.05
    2019-01-18 14:55:04,296.05
    2019-01-18 14:55:05,296
    2019-01-18 14:55:06,296
    2019-01-18 14:55:08,296
    2019-01-18 14:55:09,296
    2019-01-18 14:55:10,296.05"""
    # After testing, replace StringIO(temp) with the path to your CSV file.
    # io.StringIO is used because pd.compat.StringIO is not available in current pandas releases.
    df = pd.read_csv(StringIO(temp),
                     names=['timestamp', 'ltp'],
                     index_col=0,                  # first column becomes the index
                     parse_dates=['timestamp'])    # parse it as datetimes
    

    Alternative:

    df = pd.read_csv(StringIO(temp),
                     names=['timestamp', 'ltp'],
                     index_col=['timestamp'],      # select the index column by name
                     parse_dates=['timestamp'])
    

    print (df)
                            ltp
    timestamp                  
    2019-01-18 14:55:00  296.00
    2019-01-18 14:55:01  296.00
    2019-01-18 14:55:02  296.00
    2019-01-18 14:55:03  296.05
    2019-01-18 14:55:04  296.05
    2019-01-18 14:55:05  296.00
    2019-01-18 14:55:06  296.00
    2019-01-18 14:55:08  296.00
    2019-01-18 14:55:09  296.00
    2019-01-18 14:55:10  296.05
    
    data = df.resample('1min')['ltp'].ohlc()
    print(data)
                          open    high    low   close
    timestamp                                        
    2019-01-18 14:55:00  296.0  296.05  296.0  296.05
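
    The second error in the question (the TypeError about RangeIndex after index_col was removed) can also be avoided without touching read_csv. A minimal sketch, assuming the timestamp stays an ordinary column: either pass on='timestamp' to resample, or call set_index first. The sample ticks are the ones from the question; the names ticks, candles_on and candles_idx are only illustrative.

```python
from io import StringIO

import pandas as pd

# Sample ticks copied from the question; swap StringIO(ticks) for the CSV path.
ticks = """2019-01-18 14:55:00,296
2019-01-18 14:55:01,296
2019-01-18 14:55:02,296
2019-01-18 14:55:03,296.05
2019-01-18 14:55:04,296.05
2019-01-18 14:55:05,296
2019-01-18 14:55:06,296
2019-01-18 14:55:08,296
2019-01-18 14:55:09,296
2019-01-18 14:55:10,296.05
"""

# No index_col here, so the frame gets a default RangeIndex.
df = pd.read_csv(StringIO(ticks),
                 names=['timestamp', 'ltp'],
                 parse_dates=['timestamp'])

# Option 1: tell resample which datetime column to bucket on.
candles_on = df.resample('1min', on='timestamp')['ltp'].ohlc()

# Option 2: promote the column to a DatetimeIndex, then resample as usual.
candles_idx = df.set_index('timestamp')['ltp'].resample('1min').ohlc()

print(candles_on)
print(candles_idx)
```

    Both variants should give the same one-row result as above, since all ten sample ticks fall within the 14:55 minute.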
    

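    Details: in your original solution, index_col=1 turns the second column (here ltp) into the index, so ltp is no longer a column and df['ltp'] raises KeyError: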

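    from io import StringIO  # replaces pd.compat.StringIO, which current pandas no longer provides

    df = pd.read_csv(StringIO(temp),
                     names=['timestamp', 'ltp'],
                     index_col=1,                  # second column (ltp) becomes the index
                     parse_dates=['timestamp'])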
    
    
    print (df)
                     timestamp
    ltp                       
    296.00 2019-01-18 14:55:00
    296.00 2019-01-18 14:55:01
    296.00 2019-01-18 14:55:02
    296.05 2019-01-18 14:55:03
    296.05 2019-01-18 14:55:04
    296.00 2019-01-18 14:55:05
    296.00 2019-01-18 14:55:06
    296.00 2019-01-18 14:55:08
    296.00 2019-01-18 14:55:09
    296.05 2019-01-18 14:55:10
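
    A quick way to confirm this kind of problem is to look at the column list and the index after loading. A minimal sketch, reusing two of the sample rows; the names ticks, wrong and right are only illustrative.

```python
from io import StringIO

import pandas as pd

ticks = """2019-01-18 14:55:00,296
2019-01-18 14:55:03,296.05
"""

# index_col=1 makes the second column ('ltp') the index.
wrong = pd.read_csv(StringIO(ticks),
                    names=['timestamp', 'ltp'],
                    index_col=1,
                    parse_dates=['timestamp'])
print(wrong.columns.tolist())    # 'ltp' is missing from the columns
print(wrong.index.name)          # the index carries the name 'ltp'
print('ltp' in wrong.columns)    # membership check that explains the KeyError

# index_col=0 makes the first column ('timestamp') the index instead.
right = pd.read_csv(StringIO(ticks),
                    names=['timestamp', 'ltp'],
                    index_col=0,
                    parse_dates=['timestamp'])
print(type(right.index).__name__)  # index type that resample requires
print(right.columns.tolist())
```

    If the index type printed for the second frame is not a DatetimeIndex, resample will fail with the TypeError shown in the question.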