
How to discretize a time series with overspilling durations?

  •  Jungar · asked 4 years ago

    I am trying to discretize a dataframe that looks like this:

           Start Date           Park Duration (mins)  Charge Duration (mins)  Energy (kWh)
    49698  2016-01-01 11:48:00  230                   92.0                    3.034643
    49710  2016-01-01 13:43:00  225                   225.0                   12.427662
    49732  2016-01-01 22:43:00  708                   111.0                   10.752058
    49736  2016-01-02 07:09:00  149                   149.0                   11.160776
    49745  2016-01-02 10:29:00  156                   156.0                   10.298505
    49758  2016-01-02 13:06:00  84                    84.0                    2.904127
    49768  2016-01-02 15:00:00  27                    26.0                    2.573858
    49773  2016-01-02 15:31:00  174                   152.0                   14.961943
    49775  2016-01-02 16:01:00  195                   167.0                   16.317518
    49790  2016-01-02 19:37:00  108                   108.0                   10.829344
    49791  2016-01-02 19:56:00  289                   26.0                    2.552439
    49802  2016-01-03 09:23:00  58                    58.0                    5.243358
    49803  2016-01-03 09:33:00  264                   134.0                   6.782309
    49813  2016-01-03 11:12:00  240                   0.0                     0.008115
    49825  2016-01-03 14:12:00  97                    96.0                    5.29069
    49833  2016-01-03 15:52:00  201                   201.0                   16.058235
    49834  2016-01-03 15:52:00  53                    52.0                    5.304866
    49840  2016-01-03 17:27:00  890                   219.0                   15.878921
    49857  2016-01-04 05:57:00  198                   127.0                   6.368932
    49871  2016-01-04 08:48:00  75                    74.0                    5.99877

    What I want to do is resample it into 2-hour bins, like this:

    Start Date           Energy (kWh)  Charge Duration (mins)  Fee
    2016-01-01 10:00:00  3.034643      92                      0
    2016-01-01 12:00:00  12.427662     225                     0
    2016-01-01 14:00:00  0             0                       0
    2016-01-01 16:00:00  0             0                       0
    2016-01-01 18:00:00  0             0                       0
    2016-01-01 20:00:00  0             0                       0
    2016-01-01 22:00:00  10.752058     111                     0
    2016-01-02 00:00:00  0             0                       0
    2016-01-02 02:00:00  0             0                       0
    2016-01-02 04:00:00  0             0                       0

    So I did this:

    data.resample('2H', on='Start Date').agg({'Energy (kWh)': 'sum', 'Charge Duration (mins)': 'sum'})
    

    But the problem is that the sessions spill over the bin edges. As you can see from the first row, the charge duration is 92 minutes, yet only 12 of those 92 minutes fall inside the 10:00:00-12:00:00 bin; the way I am using resample assigns the entire charge duration to that bin. The behaviour I want is for the values to be divided across the bins in proportion to the start date and charge duration, so that 12 minutes go into the first slot and the remaining 80 minutes into the next one. There are also cases where an EV charges across more than 3 periods. I hope that makes sense.
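
    For concreteness, the split I am after for that first row works out like this (a quick sketch of the boundary arithmetic):

    import pandas as pd

    start = pd.Timestamp("2016-01-01 11:48:00")
    end = start + pd.Timedelta(minutes=92)   # 2016-01-01 13:20:00
    boundary = start.ceil("2H")              # 2016-01-01 12:00:00
    print(boundary - start)                  # 12 minutes -> 10:00-12:00 bin
    print(end - boundary)                    # 80 minutes -> 12:00-14:00 bin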

    The raw data, comma-separated:

    49698,2016-01-01 11:48:00, 230 ,92.0,3.034643
    49732,2016-01-01 22:43:00, 708 ,111.0,10.752058
    49745,2016-01-02 10:29:00, 156 ,156.0,10.298505
    49758,2016-01-02 13:06:00, 84 ,84.0,2.904127
    49773,2016-01-02 15:31:00, 174 ,152.0,14.961943
    49775,2016-01-02 16:01:00, 195 ,167.0,16.317518
    49790,2016-01-02 19:37:00, 108 ,108.0,10.829344
    49791,2016-01-02 19:56:00, 289 ,26.0,2.552439
    49802,2016-01-03 09:23:00, 58 ,58.0,5.243358
    49813,2016-01-03 11:12:00, 240 ,0.0,0.008115
    49825,2016-01-03 14:12:00, 97 ,96.0,5.29069
    49840,2016-01-03 17:27:00, 890 ,219.0,15.878921
    49857,2016-01-04 05:57:00, 198 ,127.0,6.368932
    49871,2016-01-04 08:48:00, 75 ,74.0,5.99877

    1 Answer

  •  Rob Raymond · 4 years ago

    I don't believe there is a completely straightforward way to do this. Effectively, this builds a dataframe for each row and uses a ratio to split the values across the target rows.

    import io
    import pandas as pd
    import numpy as np
    df = pd.read_csv(io.StringIO("""    Start Date  Park Duration (mins)    Charge Duration (mins)  Energy (kWh)
    49698   2016-01-01 11:48:00 230 92.0    3.034643
    49710   2016-01-01 13:43:00 225 225.0   12.427662
    49732   2016-01-01 22:43:00 708 111.0   10.752058
    49736   2016-01-02 07:09:00 149 149.0   11.160776
    49745   2016-01-02 10:29:00 156 156.0   10.298505
    49758   2016-01-02 13:06:00 84  84.0    2.904127
    49768   2016-01-02 15:00:00 27  26.0    2.573858
    49773   2016-01-02 15:31:00 174 152.0   14.961943
    49775   2016-01-02 16:01:00 195 167.0   16.317518
    49790   2016-01-02 19:37:00 108 108.0   10.829344
    49791   2016-01-02 19:56:00 289 26.0    2.552439
    49802   2016-01-03 09:23:00 58  58.0    5.243358
    49803   2016-01-03 09:33:00 264 134.0   6.782309
    49813   2016-01-03 11:12:00 240 0.0 0.008115
    49825   2016-01-03 14:12:00 97  96.0    5.29069
    49833   2016-01-03 15:52:00 201 201.0   16.058235
    49834   2016-01-03 15:52:00 53  52.0    5.304866
    49840   2016-01-03 17:27:00 890 219.0   15.878921
    49857   2016-01-04 05:57:00 198 127.0   6.368932
    49871   2016-01-04 08:48:00 75  74.0    5.99877"""), sep="\t", index_col=0)
    
    df["Start Date"] = pd.to_datetime(df["Start Date"])
    
    def proportionalsplit(s, freq="2H"):
        # start and end of the charging session
        st = s["Start Date"]
        et = st + pd.Timedelta(minutes=s["Charge Duration (mins)"])
        # every 2-hour bucket the session touches
        tr = pd.date_range(st.floor(freq), et, freq=freq)
        lmin = {"2H": 120}
        # minutes of the session falling into each bucket: the first bucket runs
        # from the session start to the next boundary, later buckets from the
        # bucket start to the session end, capped at a full bucket (120 minutes);
        # this is the ratio by which the numeric values are split
        ratio = np.minimum((np.where(tr < st, tr.shift() - st, et - tr) / (10**9 * 60)).astype(int),
                           np.full(len(tr), lmin[freq]))
        ratio = ratio / ratio.sum()
        return {"Start Date": tr,
                "Original Duration": np.full(len(tr), s["Charge Duration (mins)"]),
                "Original Start": np.full(len(tr), s["Start Date"]),
                "Original Index": np.full(len(tr), s.name),
                "Charge Duration (mins)": s["Charge Duration (mins)"] * ratio,
                "Energy (kWh)": s["Energy (kWh)"] * ratio,
               }
    
    df2 = pd.concat([pd.DataFrame(v) for v in df.apply(proportionalsplit, axis=1).values]).reset_index(drop=True)
    # sanity check: the split should preserve both column totals
    print(df2["Energy (kWh)"].sum().round(3)==df["Energy (kWh)"].sum().round(3), 
         df2["Charge Duration (mins)"].sum().round(3)==df["Charge Duration (mins)"].sum().round(3),)
    
    # let's have a look at everything in 2H resample...
    df3 = df2.groupby(["Start Date"]).agg({**{c:lambda s: list(s) for c in df2.columns if "Original" in c},
                                    **{c:"sum" for c in ["Charge Duration (mins)","Energy (kWh)"]}})
    
    

                                   Original Duration                                                                        Original Start                Original Index  Charge Duration (mins)  Energy (kWh)
    Start Date                                                                                                                                                                                                
    2016-01-01 10:00:00                       [92.0]                                                                 [2016-01-01 11:48:00]                       [49698]                    12.0      0.395823
    2016-01-01 12:00:00                [92.0, 225.0]                                            [2016-01-01 11:48:00, 2016-01-01 13:43:00]                [49698, 49710]                    97.0      3.577799
    2016-01-01 14:00:00                      [225.0]                                                                 [2016-01-01 13:43:00]                       [49710]                   120.0      6.628086
    2016-01-01 16:00:00                      [225.0]                                                                 [2016-01-01 13:43:00]                       [49710]                    88.0      4.860597
    2016-01-01 22:00:00                      [111.0]                                                                 [2016-01-01 22:43:00]                       [49732]                    77.0      7.458635
    2016-01-02 00:00:00                      [111.0]                                                                 [2016-01-01 22:43:00]                       [49732]                    34.0      3.293423
    2016-01-02 06:00:00                      [149.0]                                                                 [2016-01-02 07:09:00]                       [49736]                    51.0      3.820131
    2016-01-02 08:00:00                      [149.0]                                                                 [2016-01-02 07:09:00]                       [49736]                    98.0      7.340645
    2016-01-02 10:00:00                      [156.0]                                                                 [2016-01-02 10:29:00]                       [49745]                    91.0      6.007461
    2016-01-02 12:00:00                [156.0, 84.0]                                            [2016-01-02 10:29:00, 2016-01-02 13:06:00]                [49745, 49758]                   119.0      6.157983
    2016-01-02 14:00:00          [84.0, 26.0, 152.0]                       [2016-01-02 13:06:00, 2016-01-02 15:00:00, 2016-01-02 15:31:00]         [49758, 49768, 49773]                    85.0      6.465627
    2016-01-02 16:00:00               [152.0, 167.0]                                            [2016-01-02 15:31:00, 2016-01-02 16:01:00]                [49773, 49775]                   239.0     23.439513
    2016-01-02 18:00:00  [152.0, 167.0, 108.0, 26.0]  [2016-01-02 15:31:00, 2016-01-02 16:01:00, 2016-01-02 19:37:00, 2016-01-02 19:56:00]  [49773, 49775, 49790, 49791]                    78.0      7.684299
    2016-01-02 20:00:00                [108.0, 26.0]                                            [2016-01-02 19:37:00, 2016-01-02 19:56:00]                [49790, 49791]                   107.0     10.682851
    2016-01-03 08:00:00                [58.0, 134.0]                                            [2016-01-03 09:23:00, 2016-01-03 09:33:00]                [49802, 49803]                    64.0      4.711485
    2016-01-03 10:00:00           [58.0, 134.0, 0.0]                       [2016-01-03 09:23:00, 2016-01-03 09:33:00, 2016-01-03 11:12:00]         [49802, 49803, 49813]                   128.0      7.322297
    2016-01-03 14:00:00          [96.0, 201.0, 52.0]                       [2016-01-03 14:12:00, 2016-01-03 15:52:00, 2016-01-03 15:52:00]         [49825, 49833, 49834]                   112.0      6.745957
    2016-01-03 16:00:00         [201.0, 52.0, 219.0]                       [2016-01-03 15:52:00, 2016-01-03 15:52:00, 2016-01-03 17:27:00]         [49833, 49834, 49840]                   197.0     16.468453
    2016-01-03 18:00:00               [201.0, 219.0]                                            [2016-01-03 15:52:00, 2016-01-03 17:27:00]                [49833, 49840]                   193.0     14.532874
    2016-01-03 20:00:00                      [219.0]                                                                 [2016-01-03 17:27:00]                       [49840]                    66.0      4.785428
    2016-01-04 04:00:00                      [127.0]                                                                 [2016-01-04 05:57:00]                       [49857]                     3.0      0.150447
    2016-01-04 06:00:00                      [127.0]                                                                 [2016-01-04 05:57:00]                       [49857]                   120.0      6.017889
    2016-01-04 08:00:00                [127.0, 74.0]                                            [2016-01-04 05:57:00, 2016-01-04 08:48:00]                [49857, 49871]                    76.0      6.037237
    2016-01-04 10:00:00                       [74.0]                                                                 [2016-01-04 08:48:00]                       [49871]                     2.0      0.162129
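
    Since the target table in the question also lists empty buckets as zeros: every Start Date in df2 is already aligned to a bucket boundary, so a plain resample over the split rows fills the gaps with 0. A minimal sketch, reusing df2 from above:

    out = (df2.resample("2H", on="Start Date")[["Charge Duration (mins)", "Energy (kWh)"]]
              .sum())
    print(out.head())  # empty 2-hour buckets now appear as 0.0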