Computing summary statistics for each group in a DataFrame

  •  0
  • user288609  · Tech Community  · 6 years ago

    I have a DataFrame with columns like the following:

    ID      Time          Price
    1002    1998-01-02    34
    2001    1998-02-03    45
    1002    1998-04-05    23
    2003    1998-02-03    30
    1002    1998-02-03    60
    

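    Based on this DataFrame, I want to create another DataFrame with three columns: ID, period-1, and period-2. Each entry is the mean Price for that ID within the corresponding period.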

    ID      period-1 (1998-01-01 to 1998-02-01)    period-2 (1998-02-02 to 1998-05-02)
    1002
    2001
    2003
    

    This is the code I ended up with after following a suggestion, but it raises an error:

    import pandas as pd
    
    df=pd.DataFrame({"ID": ["1002", "2001", "1002", "2003", "1002"],
                    "Time": ["1998-01-02", "1998-02-03", "1998-04-05", "1998-02-03", "1998-02-03"],
                     "Price": ["34", "45", "23", "30","60"]})
    
    
    df.Time=pd.to_datetime(df.Time)
    period2=pd.Interval(pd.Timestamp('1998-02-02'), pd.Timestamp('1998-05-02'), closed='both')
    df['Price'].apply(pd.to_numeric)
    df['New']='period1'
    
    
    df.loc[df.Time.apply(lambda x : x in period2),'New']='period2'
    
    
    df.pivot_table(index='ID',columns='New',values='Price',aggfunc='mean')
    
    
     306             # people may try to aggregate on a non-callable attribute
    
    ~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in mean(self, *args, **kwargs)
       1304         nv.validate_groupby_func('mean', args, kwargs, ['numeric_only'])
       1305         try:
    -> 1306             return self._cython_agg_general('mean', **kwargs)
       1307         except GroupByError:
       1308             raise
    
    ~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in _cython_agg_general(self, how, alt, numeric_only, min_count)
       3972                             min_count=-1):
       3973         new_items, new_blocks = self._cython_agg_blocks(
    -> 3974             how, alt=alt, numeric_only=numeric_only, min_count=min_count)
       3975         return self._wrap_agged_blocks(new_items, new_blocks)
       3976 
    
    ~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in _cython_agg_blocks(self, how, alt, numeric_only, min_count)
       4044 
       4045         if len(new_blocks) == 0:
    -> 4046             raise DataError('No numeric types to aggregate')
       4047 
       4048         # reset the locs in the blocks to correspond to our
    
    DataError: No numeric types to aggregate
    
    1 answer  |  up to 6 years ago
  •  2
  •   BENY    6 years ago

    By using Interval together with pivot_table:

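    # Uncomment if Time is not already a datetime column
    #df.Time=pd.to_datetime(df.Time)
    
    # Closed interval covering the second period (both endpoints included)
    period2=pd.Interval(pd.Timestamp('1998-02-02'), pd.Timestamp('1998-05-02'), closed='both')
    
    # Label every row period1 by default, then relabel rows that fall inside period2
    df['New']='period1'
    
    df.loc[df.Time.apply(lambda x : x in period2),'New']='period2'
    
    # Mean Price per ID (rows) and period label (columns)
    df.pivot_table(index='ID',columns='New',values='Price',aggfunc='mean')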
    Out[881]: 
    New   period1  period2
    ID                    
    1002     34.0     41.5
    2001      NaN     45.0
    2003      NaN     30.0
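
A likely cause of the `DataError: No numeric types to aggregate` in the question is that `Price` is built from strings, and `df['Price'].apply(pd.to_numeric)` returns a new Series without changing `df`, so the column still has object dtype when `pivot_table` runs. The sketch below rebuilds the question's sample data, assigns the converted column back, and then applies the same period labelling. Using `Series.between` in place of the `Interval` membership test is a substitution made here for brevity, not part of the original answer, and the names `in_period2` and `result` are illustrative.

```python
import pandas as pd

# Sample data from the question; every column starts out as strings.
df = pd.DataFrame({
    "ID": ["1002", "2001", "1002", "2003", "1002"],
    "Time": ["1998-01-02", "1998-02-03", "1998-04-05", "1998-02-03", "1998-02-03"],
    "Price": ["34", "45", "23", "30", "60"],
})

# pd.to_datetime and pd.to_numeric return new objects,
# so the results must be assigned back to the DataFrame.
df["Time"] = pd.to_datetime(df["Time"])
df["Price"] = pd.to_numeric(df["Price"])

# Rows dated 1998-02-02 through 1998-05-02 (inclusive) belong to period2;
# everything else keeps the default label period1.
in_period2 = df["Time"].between(pd.Timestamp("1998-02-02"), pd.Timestamp("1998-05-02"))
df["New"] = "period1"
df.loc[in_period2, "New"] = "period2"

# Mean Price per ID and period; combinations with no rows come out as NaN.
result = df.pivot_table(index="ID", columns="New", values="Price", aggfunc="mean")
print(result)
```

Note that with this labelling any row outside period2 is treated as period1, including dates before 1998-01-01. The sample data has no such rows, but real data may need an explicit check for the first period as well.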