代码之家  ›  专栏  ›  技术社区  ›  ah bon

在熊猫中为每组插入特定的日期行周期

  •  0
  • ah bon  · 技术社区  · 5 年前

    如何为每个组插入未来两个月的行 city district 到下面的数据帧?

      city district                     date  price
    0    a        c  2019-08-01 00:00:00.000     12
    1    a        c  2019-09-01 00:00:00.000     13
    2    a        c  2019-10-01 00:00:00.000     11
    3    a        c  2019-11-01 00:00:00.000     15
    4    b        d  2019-08-01 00:00:00.000      8
    5    b        d  2019-09-01 00:00:00.000      6
    6    b        d  2019-10-01 00:00:00.000      9
    7    b        d  2019-11-01 00:00:00.000     15
    

    所需的输出将是这样的。

      city district                     date  price
    0    a        c  2019-08-01 00:00:00.000     12
    1    a        c  2019-09-01 00:00:00.000     13
    2    a        c  2019-10-01 00:00:00.000     11
    3    a        c  2019-11-01 00:00:00.000     15
    4    a        c  2019-12-01 00:00:00.000      
    5    a        c  2020-01-01 00:00:00.000      
    6    b        d  2019-08-01 00:00:00.000      8
    7    b        d  2019-09-01 00:00:00.000      6
    8    b        d  2019-10-01 00:00:00.000      9
    9    b        d  2019-11-01 00:00:00.000     15
    10   b        d  2019-12-01 00:00:00.000      
    11   b        d  2020-01-01 00:00:00.000     
    
    1 回复  |  直到 5 年前
        1
  •  1
  •   Henry Yik    5 年前

    set_index date 然后 reindex 按频率 MS :

    print (df.set_index("date").groupby(["city","district"])
           .apply(lambda d: d[["price"]].reindex(pd.date_range(min(df["date"]),max(df["date"])+pd.DateOffset(months=2),freq="MS")))
           .reset_index())
    

    或者创造 MultiIndex 来自以下组合 city , district 日期 :

    month_range = pd.date_range(min(df["date"]),max(df["date"])+pd.DateOffset(months=2),freq="MS")
    
    combos = [(*k,d) for k in df.groupby(["city","district"]).groups.keys() for d in month_range ]
    
    m_index = pd.MultiIndex.from_tuples(combos,names=["city","district","date"])
    
    print (df.set_index(["city","district","date"]).reindex(m_index).reset_index())
    

    两者产生相同的结果:

       city district    level_2  price
    0     a        c 2019-08-01   12.0
    1     a        c 2019-09-01   13.0
    2     a        c 2019-10-01   11.0
    3     a        c 2019-11-01   15.0
    4     a        c 2019-12-01    NaN
    5     a        c 2020-01-01    NaN
    6     b        d 2019-08-01    8.0
    7     b        d 2019-09-01    6.0
    8     b        d 2019-10-01    9.0
    9     b        d 2019-11-01   15.0
    10    b        d 2019-12-01    NaN
    11    b        d 2020-01-01    NaN