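In my experience, integer keys lead to faster merge times. To take advantage of this, you can map `Val1` and `Val2` to integers in both dataframes (`df1` and `df2`) and then merge on `Val1` and `Val2`.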
I'm sure there are more efficient ways to map `Val1` and `Val2`, but the point of this answer is to show that merging on integers is faster.
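import time
import pandas as pd

# Convert df2's key columns to categoricals so each distinct value gets an integer position
df2[['Val1','Val2']] = df2[['Val1','Val2']].apply(lambda x: pd.Categorical(x))
# Build value -> integer lookups from the category order
d1 = {v: k for k, v in enumerate(df2['Val1'].cat.categories)}
d2 = {v: k for k, v in enumerate(df2['Val2'].cat.categories)}
# Assign the mapped columns back; replace(..., inplace=True) on a column
# selection may not update the frame in recent pandas versions
df1['Val1'] = df1['Val1'].replace(d1)
df2['Val1'] = df2['Val1'].replace(d1)
df1['Val2'] = df1['Val2'].replace(d2)
df2['Val2'] = df2['Val2'].replace(d2)
df2.merge(df1, on=['Val1','Val2'], how='left')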
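As one possible alternative to the dictionary-based mapping above, the keys can be encoded with `pd.factorize` over the concatenated column, so that equal values in either frame get the same integer code. This is only a sketch: the helper name `encode_shared_keys`, the sample frames, and the extra columns `A` and `B` are assumptions made for illustration and are not part of the original data.

```python
import pandas as pd


def encode_shared_keys(left, right, cols):
    """Return copies of both frames with `cols` replaced by shared integer codes."""
    left, right = left.copy(), right.copy()
    for col in cols:
        # Factorize the concatenated column so that equal values in either
        # frame receive the same integer code.
        combined = pd.concat([left[col], right[col]], ignore_index=True)
        codes, _ = pd.factorize(combined)
        left[col] = codes[:len(left)]
        right[col] = codes[len(left):]
    return left, right


# Hypothetical sample data standing in for df1 and df2
df1 = pd.DataFrame({"Val1": ["a", "b", "c"], "Val2": ["x", "y", "z"], "A": [1, 2, 3]})
df2 = pd.DataFrame({"Val1": ["b", "c", "d"], "Val2": ["y", "z", "w"], "B": [10, 20, 30]})

df1_int, df2_int = encode_shared_keys(df1, df2, ["Val1", "Val2"])
merged = df2_int.merge(df1_int, on=["Val1", "Val2"], how="left")
print(merged)
```

Because the codes are computed from both frames together, values that appear only in `df1` still get a valid integer instead of being left unmapped.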
# Merge with original keys
start = time.time()
df2.merge(df1, on=['Val1','Val2'], how='left')
round(time.time() - start, 5)
0.00672
# Merge with integer keys
start = time.time()
df2.merge(df1, on=['Val1','Val2'])
round(time.time() - start, 5)
0.00485
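A single `time.time()` measurement of a few milliseconds can be noisy, so it may be worth repeating the comparison with `timeit` and taking the best of several runs. The sketch below is an assumption-laden example, not the benchmark used above: the frames are synthetic, and the sizes, key cardinality, the `"k"` prefix and the helper `best_time` are all made up for illustration. Results will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic frames; sizes and key cardinality are arbitrary assumptions
rng = np.random.default_rng(0)
n = 20_000
int_left = pd.DataFrame({"Val1": rng.integers(0, 200, n),
                         "Val2": rng.integers(0, 200, n),
                         "A": rng.random(n)})
int_right = pd.DataFrame({"Val1": rng.integers(0, 200, n),
                          "Val2": rng.integers(0, 200, n),
                          "B": rng.random(n)})

# The same frames with string keys, so both merges match the same row pairs
str_left = int_left.assign(Val1="k" + int_left["Val1"].astype(str),
                           Val2="k" + int_left["Val2"].astype(str))
str_right = int_right.assign(Val1="k" + int_right["Val1"].astype(str),
                             Val2="k" + int_right["Val2"].astype(str))


def best_time(left, right, repeat=5):
    """Best wall-clock time in seconds over `repeat` single-merge runs."""
    timer = timeit.Timer(lambda: left.merge(right, on=["Val1", "Val2"], how="left"))
    return min(timer.repeat(repeat=repeat, number=1))


t_str = best_time(str_left, str_right)
t_int = best_time(int_left, int_right)
print(f"string keys: {t_str:.5f}s, integer keys: {t_int:.5f}s")
```

Using the same `how` argument for both merges keeps the comparison like for like.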