
Statsmodels: Trouble overlaying ARIMA forecasts with confidence intervals on the original data

Coolio2654 · 7 years ago

    I have some stock data, downloaded with:

    import pandas as pd
    import quandl as qd

    api = '1uRGReHyAEgwYbzkPyG3'
    qd.ApiConfig.api_key = api

    # Daily AMZN prices from the Quandl WIKI/PRICES table
    data = qd.get_table('WIKI/PRICES', qopts={'columns': ['ticker', 'date', 'high', 'low', 'open', 'close']},
                        ticker=['AMZN'], date={'gte': '2000-01-01', 'lte': '2014-03-10'})

    data.reset_index(inplace=True, drop=True)

    # Column 2 is 'high', column 1 is 'date'
    price = pd.Series(data.iloc[:,2].values, index=pd.to_datetime(data.iloc[:,1]))
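    A quick way to see the root of the problem is to check whether the resulting index carries a frequency at all. The sketch below assumes a hypothetical stand-in for the Quandl 'date' column (January 2014 trading days, with the 2014-01-20 market holiday missing) rather than the real download; `trading_days` and `date_column` are illustrative names, not part of the question's code.

```python
import pandas as pd

# Hypothetical stand-in for the Quandl 'date' column: business days in
# January 2014 with the 2014-01-20 market holiday removed.
trading_days = pd.bdate_range("2014-01-02", "2014-01-31").drop(pd.to_datetime(["2014-01-20"]))
date_column = pd.Series(trading_days.strftime("%Y-%m-%d"))

# Same construction as in the question: values plus a parsed date index.
price = pd.Series(range(len(date_column)), index=pd.to_datetime(date_column), dtype=float)

print(price.index.freq)            # no frequency is attached to the index
print(pd.infer_freq(price.index))  # the holiday gap matches no regular frequency
```

    Both checks report None, which is the situation older statsmodels versions cannot forecast from by date.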
    

    Using statsmodels, I want to plot an ARIMA model that shows:

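    1. the original data,
    2. the fitted values overlaid on the original data, and
    3. a forecast out to a specified horizon, with confidence intervals.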

    [Image: example forecast plot from the statsmodels documentation]

    The image above is from the statsmodels documentation here, but following their code gives me strange errors.

    fig, ax = plt.subplots()
    ax = price.loc['2012-01-03':].plot(ax=ax, label='observed')
    
    fig = model_fit.plot_predict('2014-01-03','2015-01-03', dynamic=False, ax=ax, plot_insample=False)
    
    plt.show()
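    The last exception in the traceback comes from statsmodels < 0.9 trying to count the out-of-sample steps between the last observed date and `end` by building a `DatetimeIndex` with the series' frequency, which is None here. The helper below is a hypothetical sketch of the same counting (it mirrors the `len(DatetimeIndex(...)) - 1` logic visible in `_idx_from_dates`), assuming 2014-03-10 as the last observation. It shows that the horizon is only defined once a frequency is chosen, and that the choice changes the answer.

```python
import pandas as pd


def steps_until(last_obs, target, freq):
    """Hypothetical helper: count forecast steps after last_obs up to target."""
    grid = pd.date_range(start=last_obs, end=target, freq=freq)
    # Exclude last_obs itself; only dates strictly after it are out of sample.
    return int((grid > pd.Timestamp(last_obs)).sum())


print(steps_until("2014-03-10", "2015-01-03", "D"))  # 299 calendar days
print(steps_until("2014-03-10", "2015-01-03", "B"))  # 214 business days
```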
    

    This is the error:

    KeyError                                  Traceback (most recent call last)
    pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()
    
    pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
    
    pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
    
    KeyError: 1420243200000000000
    
    During handling of the above exception, another exception occurred:
    
    KeyError                                  Traceback (most recent call last)
    ~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
       2524             try:
    -> 2525                 return self._engine.get_loc(key)
       2526             except KeyError:
    
    pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()
    
    pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()
    
    KeyError: Timestamp('2015-01-03 00:00:00')
    
    During handling of the above exception, another exception occurred:
    
    KeyError                                  Traceback (most recent call last)
    pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()
    
    pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
    
    pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
    
    KeyError: 1420243200000000000
    
    During handling of the above exception, another exception occurred:
    
    KeyError                                  Traceback (most recent call last)
    ~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/base/tsa_model.py in _get_predict_end(self, end)
        172             try:
    --> 173                 end = self._get_dates_loc(dates, dtend)
        174             except KeyError as err: # end is greater than dates[-1]...probably
    
    ~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/base/tsa_model.py in _get_dates_loc(self, dates, date)
         94     def _get_dates_loc(self, dates, date):
    ---> 95         date = dates.get_loc(date)
         96         return date
    
    ~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in get_loc(self, key, method, tolerance)
       1425             key = Timestamp(key, tz=self.tz)
    -> 1426             return Index.get_loc(self, key, method, tolerance)
       1427 
    
    ~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
       2526             except KeyError:
    -> 2527                 return self._engine.get_loc(self._maybe_cast_indexer(key))
       2528 
    
    pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()
    
    pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()
    
    KeyError: Timestamp('2015-01-03 00:00:00')
    
    During handling of the above exception, another exception occurred:
    
    ValueError                                Traceback (most recent call last)
    <ipython-input-206-505c74789333> in <module>()
          3 ax = price.loc['2012-01-03':].plot(ax=ax, label='observed')
          4 
    ----> 5 fig = model_fit.plot_predict('2014-01-03','2015-01-03', dynamic=False, ax=ax, plot_insample=False)
          6 
          7 plt.show()
    
    ~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/arima_model.py in plot_predict(self, start, end, exog, dynamic, alpha, plot_insample, ax)
       1885 
       1886         # use predict so you set dates
    -> 1887         forecast = self.predict(start, end, exog, 'levels', dynamic)
       1888         # doing this twice. just add a plot keyword to predict?
       1889         start = self.model._get_predict_start(start, dynamic=dynamic)
    
    ~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/arima_model.py in predict(self, start, end, exog, typ, dynamic)
       1808     def predict(self, start=None, end=None, exog=None, typ='linear',
       1809                 dynamic=False):
    -> 1810         return self.model.predict(self.params, start, end, exog, typ, dynamic)
       1811     predict.__doc__ = _arima_results_predict
       1812 
    
    ~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/arima_model.py in predict(self, params, start, end, exog, typ, dynamic)
       1184             if not dynamic:
       1185                 predict = super(ARIMA, self).predict(params, start, end, exog,
    -> 1186                                                      dynamic)
       1187 
       1188                 start = self._get_predict_start(start, dynamic)
    
    ~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/arima_model.py in predict(self, params, start, end, exog, dynamic)
        732         # will return an index of a date
        733         start = self._get_predict_start(start, dynamic)
    --> 734         end, out_of_sample = self._get_predict_end(end, dynamic)
        735         if out_of_sample and (exog is None and self.k_exog > 0):
        736             raise ValueError("You must provide exog for ARMAX")
    
    ~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/arima_model.py in _get_predict_end(self, end, dynamic)
       1062         Handling of inclusiveness should be done in the predict function.
       1063         """
    -> 1064         end, out_of_sample = super(ARIMA, self)._get_predict_end(end, dynamic)
       1065         if 'mle' not in self.method and not dynamic:
       1066             end -= self.k_ar
    
    ~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/arima_model.py in _get_predict_end(self, end, dynamic)
        673     def _get_predict_end(self, end, dynamic=False):
        674         # pass through so predict works for ARIMA and ARMA
    --> 675         return super(ARMA, self)._get_predict_end(end)
        676 
        677     def geterrors(self, params):
    
    ~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/base/tsa_model.py in _get_predict_end(self, end)
        177                     freq = self.data.freq
        178                     out_of_sample = datetools._idx_from_dates(dates[-1], dtend,
    --> 179                                             freq)
        180                 else:
        181                     if freq is None:
    
    ~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/base/datetools.py in _idx_from_dates(d1, d2, freq)
        100     return len(DatetimeIndex(start=_maybe_convert_period(d1),
        101                              end=_maybe_convert_period(d2),
    --> 102                              freq=_freq_to_pandas[freq])) - 1
        103 
        104 
    
    ~/anaconda3/lib/python3.6/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
        116                 else:
        117                     kwargs[new_arg_name] = new_arg_value
    --> 118             return func(*args, **kwargs)
        119         return wrapper
        120     return _deprecate_kwarg
    
    ~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, dtype, **kwargs)
        303 
        304         if data is None and freq is None:
    --> 305             raise ValueError("Must provide freq argument if no data is "
        306                              "supplied")
        307 
    
    ValueError: Must provide freq argument if no data is supplied
    

    What am I doing wrong?

    Update

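    Following Chad Fulton's suggestions, I tried (a) downloading the data with a pre-specified frequency, (b) manually changing the frequency of the original data after downloading it, and (c) updating statsmodels to 0.9 and retrying all of the above.

    (a) gave the error "Inferred frequency None from passed dates does not conform to passed frequency D", (b) introduced NaN values into the data, which kept the model itself from running, and (c) only changed the type of error I got from (b).

    I think what is happening is that, since no frequency can be applied to the data, the forecasting code can't be blamed for not knowing how to generate future dates. In that case, does anyone have practical advice on how to use as much of a financial time series as possible when making basic forecasts, at least for the non-state-space models, which do not handle missing data automatically?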
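    A common workaround for trading-day data is to put the series on a regular business-day grid and fill the holiday gaps, or to drop the dates and let the model index observations by position. The sketch below shows both, assuming hypothetical January 2014 data with the 2014-01-20 holiday missing. Forward-filling the last price is one choice among several (dropping or interpolating are others), not something statsmodels requires; the state space models such as `SARIMAX` can instead be given the NaN values directly, since they handle missing observations.

```python
import pandas as pd

# Hypothetical trading-day series: business days in January 2014 minus the
# 2014-01-20 holiday, so the index has no regular frequency.
idx = pd.bdate_range("2014-01-02", "2014-01-31").drop(pd.to_datetime(["2014-01-20"]))
price = pd.Series(range(len(idx)), index=idx, dtype=float)

# Option 1: reindex onto a business-day grid; the missing holiday becomes NaN...
on_grid = price.asfreq("B")
print(on_grid.isna().sum())  # one NaN, at the 2014-01-20 holiday

# ...which the non-state-space ARIMA cannot fit, so carry the last price forward.
filled = on_grid.ffill()
print(filled.index.freqstr)  # the 'B' frequency is now attached

# Option 2: ignore calendar dates and model the observations by position.
positional = price.reset_index(drop=True)
```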

    1 answer · last activity 6 years ago
cfulton · 7 years ago

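    My first answer, which is perhaps not very satisfying but is probably better in the long run, is to suggest that you upgrade to statsmodels 0.9, which overhauled date/time handling. That is quite likely to solve your problem.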

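    My second answer is that you can work around the problem with statsmodels < 0.9 by making sure your date index has a frequency. It looks like your dates may be daily (if not, you will need to change the code below to use the correct freq), so I suggest replacing: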

    price = pd.Series(data.iloc[:,2].values,index=pd.to_datetime(data.iloc[:,1]))

    with:

    price = pd.Series(data.iloc[:,2].values, index=pd.DatetimeIndex(data.iloc[:,1], freq='D'))
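    Whether this works depends on the dates actually matching the requested frequency: pandas validates a passed `freq` against the dates, which is where error (a) in the update comes from. A minimal sketch with three hypothetical trading days spanning a weekend: `'D'` is rejected because the weekend is missing, while `'B'` is accepted. A real stock series would still fail with `'B'` once a market holiday is missing, so it would need a reindexing step such as `asfreq('B')` first.

```python
import pandas as pd

# Hypothetical trading days: Thursday, Friday, then Monday (weekend absent).
dates = pd.to_datetime(["2014-03-06", "2014-03-07", "2014-03-10"])

try:
    pd.DatetimeIndex(dates, freq="D")
except ValueError as err:
    # pandas reports that the inferred frequency does not conform to 'D'.
    print(err)

# A business-day frequency conforms, because no weekday is missing here.
index_b = pd.DatetimeIndex(dates, freq="B")
print(index_b.freqstr)
```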