
How do I perform multiple logical comparisons between dataframes?

dataframe pandas python-3.x

RustyShackleford · Tech Community · 6 years ago

I have two dataframes like this:

DF1:

Email      DateTimeCompleted
2@2.com    2019-02-09T01:34:44.591Z

DF2:

Email         DateTimeCompleted
b@b.com       2019-01-29T01:34:44.591Z
2@2.com       2018-01-29T01:34:44.591Z

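How can I look up DF1's Email value in DF2, check whether DateTimeCompleted is later than today minus 90 days, and if so append the DF1 row to DF2? DF2 can sometimes be empty, if that makes a difference.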

The updated DF2 should look like this:

Email         DateTimeCompleted
b@b.com       2019-01-29T01:34:44.591Z
2@2.com       2018-01-29T01:34:44.591Z
2@2.com       2019-02-09T01:34:44.591Z

I tried this:

from datetime import date    

if df1.Email in df2.Email & df2.DateTimeCompleted >= date.today()-90 :
    print('true')

I get the error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

I also tried:

if df2.Email.str.contains(df1.Email.iat[0]):
    print('true')

and got the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
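Both errors happen because a whole Series is used where Python needs a single value. `x in series` looks up `x` among the Series' index labels, so `x` must be hashable, and a Series is not. An `if` on a boolean Series cannot decide between "any" and "all". Separately, `date.today() - 90` fails because an int cannot be subtracted from a date; a `timedelta` is needed.

Below is a minimal sketch of the element-wise approach. It swaps in `Series.isin` for the `in` test and assumes the sample data above, plus a hypothetical fixed reference date `as_of` in place of today:

```python
import pandas as pd

df1 = pd.DataFrame({"Email": ["2@2.com"],
                    "DateTimeCompleted": ["2019-02-09T01:34:44.591Z"]})
df2 = pd.DataFrame({"Email": ["b@b.com", "2@2.com"],
                    "DateTimeCompleted": ["2019-01-29T01:34:44.591Z",
                                          "2018-01-29T01:34:44.591Z"]})

# Hypothetical fixed reference date standing in for "today"
as_of = pd.Timestamp("2019-02-15", tz="UTC")
cutoff = as_of - pd.Timedelta(days=90)

# Element-wise tests: one True/False per row of df1
in_df2 = df1["Email"].isin(df2["Email"])
recent = pd.to_datetime(df1["DateTimeCompleted"], utc=True) > cutoff
# Combine with &; written inline, each comparison would need its own
# parentheses because & binds more tightly than > and >=
mask = in_df2 & recent

# `if mask:` raises "The truth value of a Series is ambiguous";
# reduce the mask explicitly when a single True/False is wanted
print(mask.any())

# Append the matching df1 rows (original strings) to df2
updated = pd.concat([df2, df1[mask]], ignore_index=True)
print(updated)
```

With an empty df2, `isin` returns all False, so nothing is appended.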

2 answers | last activity 6 years ago

Erfan · 6 years ago

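You can do the following:
1. Merge the two dataframes on the key column Email with pd.merge, so you know which rows are present in both dataframes.
2. Filter the rows whose DateTimeCompleted is later than today - 90 days.
3. Concatenate the dataframes with pd.concat.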

Code:

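import pandas as pd
from datetime import datetime as dt, timedelta

# Parse the ISO-8601 strings ("Z" = UTC) into naive UTC timestamps so they can be compared with a datetime
df1['DateTimeCompleted'] = pd.to_datetime(df1['DateTimeCompleted'], utc=True).dt.tz_localize(None)
df2['DateTimeCompleted'] = pd.to_datetime(df2['DateTimeCompleted'], utc=True).dt.tz_localize(None)

# Merge dataframes together; df2's date column gets the '_2' suffix
df3 = pd.merge(df1, df2, on=['Email'], suffixes=('', '_2'))

# Filter the rows: keep df1 timestamps later than today minus 90 days
df3 = df3[df3.DateTimeCompleted > (dt.today() - timedelta(90))]

# Drop the column we don't need
df3 = df3.drop(columns=['DateTimeCompleted_2'])

# Create the final dataframe by concatenating
df_final = pd.concat([df2, df3], ignore_index=True)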

    Email   DateTimeCompleted
0   b@b.com 2019-01-29 01:34:44.591
1   2@2.com 2018-01-29 01:34:44.591
2   2@2.com 2019-02-09 01:34:44.591
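The filter compares against the current date, so re-running this code today would drop the 2019-02-09 row, which is no longer within 90 days. The sketch below wraps the same merge, filter, and concat steps in a function. A hypothetical `as_of` date string stands in for today so the result is reproducible, and the column names are assumed to be those from the question:

```python
import pandas as pd

def append_recent(df1, df2, as_of, days=90):
    """Append df1 rows whose Email also appears in df2 and whose
    DateTimeCompleted is later than as_of minus `days` days.
    append_recent and as_of are hypothetical names for this sketch;
    as_of is a date string or naive date."""
    cutoff = pd.Timestamp(as_of, tz="UTC") - pd.Timedelta(days=days)

    # 1. Merge on Email; df2's date column gets the "_2" suffix
    df3 = pd.merge(df1, df2, on="Email", suffixes=("", "_2"))

    # 2. Keep rows whose df1 timestamp (parsed as UTC) is inside the window
    df3 = df3[pd.to_datetime(df3["DateTimeCompleted"], utc=True) > cutoff]

    # 3. Drop the helper column and concatenate onto df2
    df3 = df3.drop(columns=["DateTimeCompleted_2"])
    return pd.concat([df2, df3], ignore_index=True)


df1 = pd.DataFrame({"Email": ["2@2.com"],
                    "DateTimeCompleted": ["2019-02-09T01:34:44.591Z"]})
df2 = pd.DataFrame({"Email": ["b@b.com", "2@2.com"],
                    "DateTimeCompleted": ["2019-01-29T01:34:44.591Z",
                                          "2018-01-29T01:34:44.591Z"]})

print(append_recent(df1, df2, as_of="2019-02-15"))  # three rows, as in the expected DF2
```

An empty df2 produces no merge matches, so nothing is appended. The inner merge yields one row per matching df2 row. If an address appears several times in df2, the df1 row is appended several times; selecting with `df1[df1['Email'].isin(df2['Email'])]` instead avoids that.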

Noordeen xiyurui · 6 years ago

I wrote a function that does this.

It takes the parameters:

mailid, dataframe1, dataframe2

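import pandas as pd

def process(mailid, df1, df2):
    if mailid in df2.Email.values:
        # First DateTimeCompleted value for this address in df1 (empty if the address is not in df1)
        b = df1.loc[df1.Email == mailid, "DateTimeCompleted"].head(1)
        # Use `not b.empty` (`~b.empty` is always truthy); parse the "...Z" strings as UTC
        if not b.empty and pd.Timestamp.now(tz="UTC") - pd.to_datetime(b.iloc[0], utc=True) < pd.Timedelta(days=90):
            # Append the df1 row to df2
            new_row = pd.DataFrame([[mailid, b.iloc[0]]], columns=["Email", "DateTimeCompleted"])
            df2 = pd.concat([df2, new_row], ignore_index=True)
            print("Added the row")
        else:
            print("Condition failed")
            print("False")
    else:
        print("The mail is not there in dataframe")
    return df2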
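`DataFrame.empty` and `Series.empty` return a plain Python `bool`. On a bool, `~` is bitwise inversion rather than logical negation: `~True` is `-2` and `~False` is `-1`, and both are truthy. So a test written as `~b.empty` never fails. The quick check below assumes an empty selection like the one `.head(1)` returns when nothing matches. Python 3.12 and later also emit a DeprecationWarning for `~` on a bool.

```python
import pandas as pd

b = pd.Series([], dtype=object)  # an empty selection, like .head(1) with no match

print(b.empty)       # True
print(~b.empty)      # -2: bitwise NOT of True, which is still truthy
print(not b.empty)   # False: logical negation, what the condition needs
print(bool(~False))  # True: ~False is -1, so ~ never yields a falsy value here
```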
