
Pandas: check whether a specific weekday is present in each week, and if it is not, mark the closest earlier day instead?

  •  1
  • Sid  · Tech Community  · 6 years ago

    I'm still quite new to pandas, and I just discovered that I made a mistake in a process I had been following.

     df_date
             Date        day
    0  2016-05-26   Thursday
    1  2016-05-27     Friday
    2  2016-05-30     Monday
    3  2016-05-31    Tuesday
    4  2016-06-01  Wednesday
    5  2016-06-02   Thursday
    6  2016-06-03     Friday
    7  2016-06-06     Monday
    8  2016-06-07    Tuesday
    9  2016-06-08  Wednesday
    10 2016-06-09   Thursday
    11 2016-06-10     Friday
    12 2016-06-13     Monday
    13 2016-06-14    Tuesday
    14 2016-06-15  Wednesday
    15 2016-06-16   Thursday
    16 2016-06-17     Friday
    17 2016-06-20     Monday
    18 2016-06-21    Tuesday
    19 2016-06-22  Wednesday
    20 2016-06-24     Friday
    21 2016-06-27     Monday
    22 2016-06-28    Tuesday
    23 2016-06-29  Wednesday
    

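    There are a little over 600 rows.

    What I want to do

    Create a column 'Exit' in which Thursday is marked E. If Thursday is not present in a week, Wednesday gets the E; if Wednesday is not present either, then Tuesday.

    I tried a for loop, but I can't seem to get it right.

    Expected output: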

     df_date
             Date        day  Exit
    0  2016-05-26   Thursday  E
    1  2016-05-27     Friday  
    2  2016-05-30     Monday
    3  2016-05-31    Tuesday
    4  2016-06-01  Wednesday
    5  2016-06-02   Thursday  E
    6  2016-06-03     Friday
    7  2016-06-06     Monday
    8  2016-06-07    Tuesday
    9  2016-06-08  Wednesday
    10 2016-06-09   Thursday  E
    11 2016-06-10     Friday
    12 2016-06-13     Monday
    13 2016-06-14    Tuesday
    14 2016-06-15  Wednesday
    15 2016-06-16   Thursday  E
    16 2016-06-17     Friday
    17 2016-06-20     Monday
    18 2016-06-21    Tuesday
    19 2016-06-22  Wednesday  E
    20 2016-06-24     Friday
    21 2016-06-27     Monday
    22 2016-06-28    Tuesday
    23 2016-06-29  Wednesday  E
    

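    I added this in the comments, but it should be here as well:

    If Thursday is not present, mark the record just before it.

    So if Wednesday is not present either, then Tuesday.

    If Tuesday is absent too, then Monday; if Monday is absent, then Friday. There will never be records for Saturday or Sunday.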
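
    The fallback order described above can be sketched in plain Python, independent of pandas. This is only an illustration of the rule as stated in the question; the names FALLBACK_ORDER and pick_exit_day are hypothetical and do not come from the original post.

```python
from datetime import date

# Fallback order from the question: Thursday first, then Wednesday,
# Tuesday, Monday, and finally Friday.
# date.weekday() numbering: Monday=0 ... Friday=4.
FALLBACK_ORDER = [3, 2, 1, 0, 4]  # hypothetical name, for illustration only


def pick_exit_day(week_dates):
    """Return the date to mark 'E' among the dates of ONE calendar week.

    Returns None when the week has no weekday records at all.
    """
    by_weekday = {d.weekday(): d for d in week_dates}
    for wd in FALLBACK_ORDER:
        if wd in by_weekday:
            return by_weekday[wd]
    return None


# The week of 2016-06-20 from the question, where Thursday is missing.
week = [date(2016, 6, 20), date(2016, 6, 21), date(2016, 6, 22), date(2016, 6, 24)]
exit_day = pick_exit_day(week)
print(exit_day)  # 2016-06-22
```

    The function assumes the caller has already split the rows into calendar weeks; it only encodes the priority between weekdays.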

    2 answers  |  as of 6 years ago
        1
  •  1
  •   yatu Sayali Sonawane    6 years ago

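    Here is a solution:

    import pandas as pd
    df['Date'] = pd.to_datetime(df['Date'])  # pd.Grouper needs a datetime column
    # for each week (ending on Sunday), take the label of the last Monday-Thursday row
    ix = (df.groupby(pd.Grouper(key='Date', freq='W')).Date
            .apply(lambda x: (x.dt.dayofweek <= 3)[::-1].idxmax()).values)
    df.loc[ix, 'Exit'] = 'E'
    df['Exit'] = df['Exit'].fillna('')  # blank out the rows that were not marked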
    
          Date        day     Exit
    0  2016-05-26   Thursday    E
    1  2016-05-27     Friday     
    2  2016-05-30     Monday     
    3  2016-05-31    Tuesday     
    4  2016-06-01  Wednesday     
    5  2016-06-02   Thursday    E
    6  2016-06-03     Friday     
    7  2016-06-06     Monday     
    8  2016-06-07    Tuesday     
    9  2016-06-08  Wednesday     
    10 2016-06-09   Thursday    E
    11 2016-06-10     Friday     
    12 2016-06-13     Monday     
    13 2016-06-14    Tuesday     
    14 2016-06-15  Wednesday     
    15 2016-06-16   Thursday    E
    16 2016-06-17     Friday     
    17 2016-06-20     Monday     
    18 2016-06-21    Tuesday     
    19 2016-06-22  Wednesday     
    20 2016-06-23   Thursday    E
    21 2016-06-24     Friday     
    22 2016-06-27     Monday     
    23 2016-06-28    Tuesday     
    24 2016-06-29  Wednesday    E
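
    To make the reversed idxmax step above easier to follow, here is a minimal sketch on a single week. It assumes the week of 2016-06-20 from the question (Thursday missing) with the same row labels 17 to 20; the variable names are illustrative only.

```python
import pandas as pd

# One week from the question's data; Thursday 2016-06-23 is missing.
week = pd.Series(
    pd.to_datetime(["2016-06-20", "2016-06-21", "2016-06-22", "2016-06-24"]),
    index=[17, 18, 19, 20],
)

# True for Monday..Thursday (dayofweek 0..3), False for Friday.
is_mon_to_thu = week.dt.dayofweek <= 3

# Reversing puts the latest date first; idxmax then returns the label of
# the first True it meets, i.e. the latest Monday-Thursday row.
last_label = is_mon_to_thu[::-1].idxmax()
print(last_label)  # 19 (Wednesday 2016-06-22)

# If a week holds no Monday-Thursday row, every value is False and idxmax
# returns the first label of the reversed series, i.e. the last row.
friday_only = pd.Series(pd.to_datetime(["2016-06-24"]), index=[20])
fallback_label = (friday_only.dt.dayofweek <= 3)[::-1].idxmax()
print(fallback_label)  # 20 (the Friday row)
```

    The second case means a week containing only a Friday would have that Friday marked, which happens to match the last fallback in the question.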
    
        2
  •  1
  •   jpp    6 years ago

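    You can use the week number (dt.isocalendar().week; the older dt.week attribute was removed in pandas 2.0) and dt.weekday of your datetime series, then use groupby + max for the logic you need. This is likely more efficient than sequential equality checks.

    import numpy as np
    import pandas as pd

    df['Date'] = pd.to_datetime(df['Date'])
    
    # add week and weekday series
    # (dt.week was removed in pandas 2.0; dt.isocalendar().week replaces it)
    df['Week'] = df['Date'].dt.isocalendar().week
    # keep the weekday only for Tuesday-Thursday (1-3); every other day becomes NaN
    df['Weekday'] = df['Date'].dt.weekday.where(df['Date'].dt.weekday.isin([1, 2, 3]))
    
    # mark the row whose weekday equals the largest candidate weekday in its week
    df['Exit'] = np.where(df['Weekday'] == df.groupby('Week')['Weekday'].transform('max'),
                          'E', '')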
    

    Result

    I've left the helper columns in so that it is clear how the solution works. They can easily be removed.

    print(df)
    
             Date        day  Week  Weekday Exit
    0  2016-05-26   Thursday    21      3.0    E
    1  2016-05-27     Friday    21      NaN     
    2  2016-05-30     Monday    22      NaN     
    3  2016-05-31    Tuesday    22      1.0     
    4  2016-06-01  Wednesday    22      2.0     
    5  2016-06-02   Thursday    22      3.0    E
    6  2016-06-03     Friday    22      NaN     
    7  2016-06-06     Monday    23      NaN     
    8  2016-06-07    Tuesday    23      1.0     
    9  2016-06-08  Wednesday    23      2.0     
    10 2016-06-09   Thursday    23      3.0    E
    11 2016-06-10     Friday    23      NaN     
    12 2016-06-13     Monday    24      NaN     
    13 2016-06-14    Tuesday    24      1.0     
    14 2016-06-15  Wednesday    24      2.0     
    15 2016-06-16   Thursday    24      3.0    E
    16 2016-06-17     Friday    24      NaN     
    17 2016-06-20     Monday    25      NaN     
    18 2016-06-21    Tuesday    25      1.0     
    19 2016-06-22  Wednesday    25      2.0    E
    20 2016-06-24     Friday    25      NaN     
    21 2016-06-27     Monday    26      NaN     
    22 2016-06-28    Tuesday    26      1.0     
    23 2016-06-29  Wednesday    26      2.0    E
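
    With 600+ rows the data likely spans more than one year, in which case grouping by week number alone would merge, for example, week 21 of two different years. The question also asks for Monday as a fallback. A hedged variant of the same groupby + max idea, grouping by ISO year and week and treating Monday to Thursday as candidates, might look like this. The sample rows are a subset of the question's data, and the names candidate and latest are illustrative.

```python
import numpy as np
import pandas as pd

# A subset of the question's rows (Thursday 2016-06-23 is missing).
dates = ["2016-06-16", "2016-06-17", "2016-06-20", "2016-06-21", "2016-06-22",
         "2016-06-24", "2016-06-27", "2016-06-28", "2016-06-29"]
df = pd.DataFrame({"Date": pd.to_datetime(dates)})

iso = df["Date"].dt.isocalendar()          # columns: year, week, day
weekday = df["Date"].dt.weekday            # Monday=0 ... Friday=4

# Candidates are Monday..Thursday; Friday becomes NaN and is never picked.
candidate = weekday.where(weekday <= 3)

# Largest candidate weekday within each (ISO year, ISO week) group.
latest = candidate.groupby([iso["year"], iso["week"]]).transform("max")

# NaN never compares equal, so Friday rows stay blank.
df["Exit"] = np.where(candidate == latest, "E", "")
print(df)
```

    This sketch does not cover the final fallback to Friday for a week that has no Monday to Thursday rows; such a week would simply get no E.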