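I have two DataFrames: dfname, which holds different versions of each player's name, and dfgoals, which holds information about players and their goals.
I want to return one row for each player in the answer df, based on the following condition:
(i) Check whether the name1 value exists in the actual_name column of dfgoals. If it does, return the first matching row; otherwise check the name2 value and return the first matching row.
(ii) Alongside the matched value (name1 or name2), also return the corresponding name column value from dfname.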
import pandas as pd

dfname = pd.DataFrame({
    "name": ["ryan", "bill", "saka", "Henry", "Rooney"],
    "name1": ["ryan 112", "Bill Matt Cdevaca", "Bukayo Saka", "Super Henry", "Rooney"],
    "name2": ["NaN", "XXVaca", "Bukayo", "Thierry", "Rooney"]})
dfgoals = pd.DataFrame({
    "actual_name": ["ryan 112", "XXVaca", "Bukayo", "Thierry", "Ronaldo", "Messi"],
    "goals": [0, 2, 5, 10, 100, 200],
    "matches": [22, 100, 200, 300, 100, 90]})
# Expected result
answerdf = pd.DataFrame({
    "actual_name": ["ryan 112", "XXVaca", "Bukayo", "Thierry", "Rooney"],
    "goals": [0, 2, 5, 10, "NaN"],
    "matches": [22, 100, 200, 300, "NaN"],
    "name_from_dfname": ["ryan", "bill", "saka", "Henry", "Rooney"]})
answerdf
Rooney's values are NaN because his goals record is not available
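The rule in (i) and (ii) can be sketched roughly as follows. This is only one possible approach, not a definitive solution: it assumes an exact string match between name1/name2 and actual_name, and the helper names first_goals, has_name1, key and answer are made up for illustration.

```python
import pandas as pd

dfname = pd.DataFrame({
    "name": ["ryan", "bill", "saka", "Henry", "Rooney"],
    "name1": ["ryan 112", "Bill Matt Cdevaca", "Bukayo Saka", "Super Henry", "Rooney"],
    "name2": ["NaN", "XXVaca", "Bukayo", "Thierry", "Rooney"],
})
dfgoals = pd.DataFrame({
    "actual_name": ["ryan 112", "XXVaca", "Bukayo", "Thierry", "Ronaldo", "Messi"],
    "goals": [0, 2, 5, 10, 100, 200],
    "matches": [22, 100, 200, 300, 100, 90],
})

# Keep only the first row per actual_name so every lookup yields one row.
first_goals = dfgoals.drop_duplicates("actual_name")

# Prefer name1 when it occurs in dfgoals (exact match, an assumption),
# otherwise fall back to name2.
has_name1 = dfname["name1"].isin(first_goals["actual_name"])
key = dfname["name1"].where(has_name1, dfname["name2"])

# A left merge keeps one row per player, in dfname order;
# players with no goals record get NaN in goals and matches.
answer = (
    pd.DataFrame({"actual_name": key, "name_from_dfname": dfname["name"]})
    .merge(first_goals, on="actual_name", how="left")
    [["actual_name", "goals", "matches", "name_from_dfname"]]
)
print(answer)
```

Because of the left merge, goals and matches come back as floats with real NaN values for Rooney, rather than the "NaN" strings used in answerdf.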
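This is what I have tried so far, but it does not check the name1 and name2 values correctly. For example, it only gives me Ryan's goals and not the other players' goals, because their names are written differently.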
df = dfgoals
values_to_check = ['ryan', 'Bill Matt Cdevaca', 'saka', 'henry', 'Rooney']
filtered_rows = []
# Iterate through the DataFrame rows to find matches and concatenate values
for index, row in dfgoals.iterrows():
    matched_values = [value for value in values_to_check if value.lower() in row['actual_name'].lower()]
    if matched_values:
        row['concatenated_values'] = '|'.join(matched_values)
        filtered_rows.append(row)
# Create a new DataFrame from the filtered rows
result_df = pd.DataFrame(filtered_rows)
result_df['concatenated_values'] = pd.Categorical(result_df['concatenated_values'], categories=values_to_check, ordered=True)
# Sort the DataFrame based on the 'concatenated_values' column
result_df.sort_values(by = "concatenated_values")
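A likely reason only Ryan's row comes back: the loop tests whether each entry of values_to_check is a substring of actual_name, and of the listed values only 'ryan' actually occurs inside any actual_name. The small check below reproduces just that substring test on the same strings. The hits dictionary is a name introduced here for illustration.

```python
values_to_check = ["ryan", "Bill Matt Cdevaca", "saka", "henry", "Rooney"]
actual_names = ["ryan 112", "XXVaca", "Bukayo", "Thierry", "Ronaldo", "Messi"]

# For each actual_name, collect the check values contained in it
# (case-insensitive), mirroring the condition used in the loop.
hits = {
    actual: [v for v in values_to_check if v.lower() in actual.lower()]
    for actual in actual_names
}

for actual, matched in hits.items():
    print(actual, "->", matched)
```

For example, 'saka' is not a substring of 'Bukayo' and 'henry' is not a substring of 'Thierry', so those rows are never kept.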