IIUC, assuming this DataFrame as input:
import pandas as pd
data = {
    "Child": ["A032", "A001"],
    "Parent": ["A001", "A043"],
    "Ult_Parent": ["A039", "A039"],
    "Full_Family": [
        "A001, A032, A039, A040, A041, A043, A043, A045, A046",
        "A001, A032, A039, A040, A041, A043, A043, A045, A046",
    ],
}
df = pd.DataFrame(data)
  Child Parent Ult_Parent                                        Full_Family
0  A032   A001       A039  A001, A032, A039, A040, A041, A043, A043, A045...
1  A001   A043       A039  A001, A032, A039, A040, A041, A043, A043, A045...
You can use this approach:
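df["Correct_Order"] = df.apply(
    # Sort the three IDs of each row lexicographically and join them into one string
    lambda row: ", ".join(sorted([row["Parent"], row["Child"], row["Ult_Parent"]])),
    axis=1,
)
# Row-wise maximum, then the maximum over all rows: a single value that is
# broadcast to every row (the whole frame is treated as one family)
df["Correct_Ult_Parent_per_Family"] = (
    df[["Parent", "Child", "Ult_Parent"]].max(axis=1).max()
)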
  Child Parent Ult_Parent                                        Full_Family     Correct_Order  Correct_Ult_Parent_per_Family
0  A032   A001       A039  A001, A032, A039, A040, A041, A043, A043, A045...  A001, A032, A039                           A043
1  A001   A043       A039  A001, A032, A039, A040, A041, A043, A043, A045...  A001, A039, A043                           A043
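A row-wise apply tends to get slow on large frames. A possible alternative, sketched here under the assumption that the three ID columns hold plain strings and that plain lexicographic order is what you want (this swaps in NumPy's np.sort; it is not part of the approach above), sorts every row at once along axis=1 and joins the results:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Child": ["A032", "A001"],
        "Parent": ["A001", "A043"],
        "Ult_Parent": ["A039", "A039"],
    }
)

cols = ["Parent", "Child", "Ult_Parent"]
# Fixed-width Unicode array, so np.sort compares the IDs as plain strings
values = df[cols].to_numpy(dtype=str)
# Sort every row independently (axis=1)
sorted_values = np.sort(values, axis=1)
df["Correct_Order"] = [", ".join(row) for row in sorted_values]
# The last column holds each row's largest ID; keep the largest of those
df["Correct_Ult_Parent_per_Family"] = str(max(sorted_values[:, -1]))
```

This only reproduces the lexicographic ordering; it does not take 'Full_Family' into account.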
If 'Full_Family' is not necessarily in ascending order and you want to respect its order, you can pass a custom key to sorted.
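The idea behind such a key can be sketched in isolation: look up each item's position in a reference list and sort by that position. The helper name order_by_reference is hypothetical. Note that list.index returns the first position of a duplicated entry such as A043, whereas a dict comprehension over enumerate keeps the last one.

```python
def order_by_reference(items, reference):
    """Sort items by the position of their first occurrence in reference.

    Hypothetical helper for illustration; list.index raises ValueError
    if an item does not occur in reference.
    """
    return sorted(items, key=reference.index)


# Full_Family of the first row, with A039 placed before A032
family = "A001, A039, A032, A040, A041, A043, A043, A045, A046".split(", ")
print(", ".join(order_by_reference(["A032", "A001", "A039"], family)))
# prints: A001, A039, A032
```

For short lists like these, list.index is fine; for long ones, precomputing a position dict avoids a linear scan per lookup.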
For example, if A039 comes before A032 in the first row's 'Full_Family':
data = {
    "Child": ["A032", "A001"],
    "Parent": ["A001", "A043"],
    "Ult_Parent": ["A039", "A039"],
    "Full_Family": [
        # A039 now comes before A032 in the first row
        "A001, A039, A032, A040, A041, A043, A043, A045, A046",
        "A001, A032, A039, A040, A041, A043, A043, A045, A046",
    ],
}
df = pd.DataFrame(data)
With a custom key:
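df["Correct_Order"] = df.apply(
    lambda row: ", ".join(
        sorted(
            [row["Parent"], row["Child"], row["Ult_Parent"]],
            # Sort key: each ID's position in this row's Full_Family list
            # (the dict keeps the last position of a duplicate such as A043)
            key=lambda x: {
                val: idx for idx, val in enumerate(row["Full_Family"].split(", "))
            }[x],
        )
    ),
    axis=1,
)
# Last ID of each ordered string, then the maximum over all rows
df["Correct_Ult_Parent_per_Family"] = df["Correct_Order"].str.split(", ").str[-1].max()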
  Child Parent Ult_Parent                                        Full_Family     Correct_Order  Correct_Ult_Parent_per_Family
0  A032   A001       A039  A001, A039, A032, A040, A041, A043, A043, A045...  A001, A039, A032                           A043
1  A001   A043       A039  A001, A032, A039, A040, A041, A043, A043, A045...  A001, A039, A043                           A043
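The key above rebuilds the position dictionary for every ID it looks up, and the [x] lookup raises KeyError for an ID that does not appear in 'Full_Family'. A possible variant, sketched under assumptions (family_order is a hypothetical name, and sending unknown IDs to the end is an assumed policy, not something the data above requires), builds the dictionary once per row and falls back with dict.get:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Child": ["A032", "A001"],
        "Parent": ["A001", "A043"],
        "Ult_Parent": ["A039", "A039"],
        "Full_Family": [
            "A001, A039, A032, A040, A041, A043, A043, A045, A046",
            "A001, A032, A039, A040, A041, A043, A043, A045, A046",
        ],
    }
)


def family_order(row):
    """Hypothetical helper: order a row's IDs by their position in Full_Family."""
    family = row["Full_Family"].split(", ")
    # Built once per row; like the original key, a duplicate keeps its last position
    position = {val: idx for idx, val in enumerate(family)}
    members = [row["Parent"], row["Child"], row["Ult_Parent"]]
    # Assumed policy: IDs missing from Full_Family sort after all known ones
    return ", ".join(sorted(members, key=lambda m: position.get(m, len(family))))


df["Correct_Order"] = df.apply(family_order, axis=1)
```

Because Python's sort is stable, several missing IDs keep their Parent, Child, Ult_Parent order relative to each other.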