代码之家  ›  专栏  ›  技术社区  ›  Bill

如何将24小时值的日期/时间字符串转换为熊猫中的日期时间?

  •  2
  • Bill  · 技术社区  · 6 年前

    我正在从一个普通的邮件应用程序(Mac OS X)将电子邮件作为文本文件导入。不幸的是,电子邮件中的许多日期都有 "24:01:01" 这不是一个有效的时间(应该是 "00:01:01" )

    有没有一个简单的方法来转换这些?

    正常日期/时间字符串工作正常:

    >>> pd.to_datetime("March 23, 2011 at 23:42:46  PDT")
    Timestamp('2011-03-23 23:42:46-0700', tz='pytz.FixedOffset(-420)')
    

    异常日期字符串:

    >>> pd.to_datetime("March 23, 2011 at 24:42:46  PDT")
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    ~/anaconda/envs/pyqt/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
       1860         try:
    -> 1861             values, tz_parsed = conversion.datetime_to_datetime64(data)
       1862             # If tzaware, these values represent unix timestamps, so we
    
    pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()
    
    TypeError: Unrecognized value type: <class 'str'>
    
    During handling of the above exception, another exception occurred:
    
    ValueError                                Traceback (most recent call last)
    <ipython-input-38-4cb009b21802> in <module>
    ----> 1 pd.to_datetime("March 23, 2011 at 24:42:46  PDT")
    
    ~/anaconda/envs/pyqt/lib/python3.6/site-packages/pandas/core/tools/datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, box, format, exact, unit, infer_datetime_format, origin, cache)
        609             result = convert_listlike(arg, box, format)
        610     else:
    --> 611         result = convert_listlike(np.array([arg]), box, format)[0]
        612 
        613     return result
    
    ~/anaconda/envs/pyqt/lib/python3.6/site-packages/pandas/core/tools/datetimes.py in _convert_listlike_datetimes(arg, box, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
        300             arg, dayfirst=dayfirst, yearfirst=yearfirst,
        301             utc=utc, errors=errors, require_iso8601=require_iso8601,
    --> 302             allow_object=True)
        303 
        304     if tz_parsed is not None:
    
    ~/anaconda/envs/pyqt/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
       1864             return values.view('i8'), tz_parsed
       1865         except (ValueError, TypeError):
    -> 1866             raise e
       1867 
       1868     if tz_parsed is not None:
    
    ~/anaconda/envs/pyqt/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
       1855             dayfirst=dayfirst,
       1856             yearfirst=yearfirst,
    -> 1857             require_iso8601=require_iso8601
       1858         )
       1859     except ValueError as e:
    
    pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
    
    pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
    
    pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime_object()
    
    pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime_object()
    
    pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string()
    
    ~/anaconda/envs/pyqt/lib/python3.6/site-packages/dateutil/parser/_parser.py in parse(timestr, parserinfo, **kwargs)
       1354         return parser(parserinfo).parse(timestr, **kwargs)
       1355     else:
    -> 1356         return DEFAULTPARSER.parse(timestr, **kwargs)
       1357 
       1358 
    
    ~/anaconda/envs/pyqt/lib/python3.6/site-packages/dateutil/parser/_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
        651             raise ValueError("String does not contain a date:", timestr)
        652 
    --> 653         ret = self._build_naive(res, default)
        654 
        655         if not ignoretz:
    
    ~/anaconda/envs/pyqt/lib/python3.6/site-packages/dateutil/parser/_parser.py in _build_naive(self, res, default)
       1225                 repl['day'] = monthrange(cyear, cmonth)[1]
       1226 
    -> 1227         naive = default.replace(**repl)
       1228 
       1229         if res.weekday is not None and not res.day:
    
    ValueError: hour must be in 0..23
    
    1 回复  |  直到 6 年前
        1
  •  2
  •   jezrael    6 年前

    首先转换好的约会时间 to_datetime 具有 errors='coerce' -得到 NaT 对于错误的值。所以过滤它, replace 24 再加上一天。最后 fillna 要用它替换缺少的值:

    d = ["March 23, 2011 at 24:42:46  PDT",
         "March 23, 2011 at 23:42:46  PDT"]
    
    s = pd.Series(d)
    
    s1 = pd.to_datetime(s, errors='coerce')
    m = s1.isna()
    
    s2 = (pd.to_datetime(s[m].replace('at 24:', 'at 00:', regex=True),  errors='coerce') +
             pd.Timedelta(1, unit='d'))
    
    s = s1.fillna(s2)
    print (s)
    0   2011-03-24 00:42:46
    1   2011-03-23 23:42:46
    dtype: datetime64[ns]
    

    另一个想法-提取日期和时间来分隔列并添加 timedelta S:

    s1 = pd.to_datetime(s, errors='coerce')
    m = s1.isna()
    
    df2 = s[m].str.split(' at ', expand=True)
    df2.columns = ['date','time']
    df2['date'] = pd.to_datetime(df2['date'], errors='coerce')
    df2['time'] = pd.to_timedelta(df2['time'].str.extract('(\d+:\d+:\d+)', expand=False))
    df2['date1'] = df2['date'] + df2['time']
    print (df2)
            date            time               date1
    0 2011-03-23 1 days 00:42:46 2011-03-24 00:42:46
    
    s = s1.fillna(df2['date1'])
    print (s)
    0   2011-03-24 00:42:46
    1   2011-03-23 23:42:46
    dtype: datetime64[ns]