A regex-free option that relies on pd.to_datetime to recognize the date lines:
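# text is the raw multi-line string, one entry per line
df = pd.DataFrame({'letters': text.splitlines()})
# True for lines that parse as a date; everything else is coerced to NaT
m = pd.to_datetime(df['letters'], errors='coerce').notna()
# keep only the date lines, forward-fill them, drop the date rows, put date first
out = (df.assign(date=df['letters'].where(m).ffill())
         .loc[~m, ::-1].reset_index(drop=True))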
Alternative syntax:
s = pd.Series(text.splitlines())
m = pd.to_datetime(s, errors='coerce').notna()
df = (pd.DataFrame({'date': s.where(m).ffill(), 'letters': s})
        [~m].reset_index(drop=True))
Output:
               date letters
0  10 February 2023     abc
1  10 February 2023     def
2     23 March 2023     ghi
3     23 March 2023     jkl
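For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample text is an assumption reconstructed from the output above, and the helper name label_with_dates and its fmt parameter are hypothetical, not part of pandas. Passing an explicit format is optional; it just makes the date check stricter than letting pd.to_datetime guess.

```python
import pandas as pd

# Assumed sample input, reconstructed from the output shown above.
text = """10 February 2023
abc
def
23 March 2023
ghi
jkl"""


def label_with_dates(text, fmt=None):
    """Pair each non-date line with the closest date line above it.

    Hypothetical helper for illustration; fmt is forwarded to
    pd.to_datetime as its format argument (None lets pandas infer it).
    """
    s = pd.Series(text.splitlines())
    # True where the line parses as a date; unparseable lines become NaT.
    is_date = pd.to_datetime(s, format=fmt, errors='coerce').notna()
    # Blank out non-date lines, forward-fill the dates, then drop the date rows.
    df = pd.DataFrame({'date': s.where(is_date).ffill(), 'letters': s})
    return df[~is_date].reset_index(drop=True)


out = label_with_dates(text, fmt='%d %B %Y')
print(out)
```

With fmt left as None the behaviour matches the snippets above. An explicit format may be the safer choice if some non-date lines could be mistaken for dates by the parser.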