
Link columns using list items

pandas-groupby numpy dataframe pandas python

The Great · Tech Community · 3 years ago

I have two dataframes, df and tf, as shown below:

import numpy as np
import pandas as pd

df = [{"unique_key": 1, "test_ids": "1.0,15,2.0,nan"}, {"unique_key": 2, "test_ids": "51,75.0,11.0,NaN"},{"unique_key": 3, "test_ids":np.nan},
     {"unique_key": 4, "test_ids":np.nan}]
df = pd.DataFrame(df)

test_ids,status,revenue,cnt_days     
1,passed,234.54,3          
2,passed,543.21,5
11,failed,21.3,4
15,failed,2098.21,6             
51,passed,232,21     
75,failed,123.87,32 

tf = pd.read_clipboard(sep=',')

I want to link the unique_key from df to the tf dataframe.

For example, I will show my expected output below (it is easier to understand than a description in text).

I was trying something like the following:

for b in df.test_ids.tolist():
    for a in b.split(','):
        if a >= 0: # to exclude NA values from checking
            for i in len(test_ids):
              if int(a)  == tf['test_ids'][i]:
                   tf['unique_key'] = df['unique_key']

But this is neither an efficient nor an elegant way to solve my problem.

Is there a better way to achieve the expected output shown below?

1 reply | last activity 3 years ago

jezrael · 3 years ago

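You can create a Series with duplicates and missing values removed, swap its index and values into a dictionary, and then add the new first column using DataFrame.insert with Series.map: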

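# Explode the comma-separated ids into one row per id, indexed by unique_key
s = (df.set_index('unique_key')['test_ids']
       .str.split(',')
       .explode()
       .astype(float)   # parses '1.0' as well as the 'nan'/'NaN' strings
       .dropna()
       .astype(int)
       .drop_duplicates())
# Swap index and values: test id -> unique_key
d = {v: k for k, v in s.items()}
print (d)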
{1: 1, 15: 1, 2: 1, 51: 2, 75: 2, 11: 2}

tf.insert(0, 'unique_key', tf['test_ids'].map(d))
print (tf)
   unique_key  test_ids  status  revenue  cnt_days
0           1         1  passed   234.54         3
1           1         2  passed   543.21         5
2           2        11  failed    21.30         4
3           1        15  failed  2098.21         6
4           2        51  passed   232.00        21
5           2        75  failed   123.87        32
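
For a copy-paste run of the approach above, here is a minimal sketch. It assumes `tf` is parsed from an inline CSV string with `io.StringIO` rather than from the clipboard, and it wraps the steps in a helper called `build_id_to_key`, which is a hypothetical name and not part of the original answer.

```python
import io

import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame([
    {"unique_key": 1, "test_ids": "1.0,15,2.0,nan"},
    {"unique_key": 2, "test_ids": "51,75.0,11.0,NaN"},
    {"unique_key": 3, "test_ids": np.nan},
    {"unique_key": 4, "test_ids": np.nan},
])

# Assumption: tf is read from an inline CSV string instead of the clipboard.
TF_CSV = """test_ids,status,revenue,cnt_days
1,passed,234.54,3
2,passed,543.21,5
11,failed,21.3,4
15,failed,2098.21,6
51,passed,232,21
75,failed,123.87,32
"""
tf = pd.read_csv(io.StringIO(TF_CSV))


def build_id_to_key(frame):
    """Return a dict mapping each test id to its unique_key (hypothetical helper)."""
    s = (frame.set_index("unique_key")["test_ids"]
              .str.split(",")      # "1.0,15" -> ["1.0", "15"]
              .explode()           # one row per id, index repeats unique_key
              .astype(float)       # "nan"/"NaN" strings become real NaN
              .dropna()
              .astype(int)
              .drop_duplicates())  # keep the first unique_key seen per id
    return {test_id: key for key, test_id in s.items()}


id_to_key = build_id_to_key(df)
tf.insert(0, "unique_key", tf["test_ids"].map(id_to_key))
print(tf)
```

Because `drop_duplicates` keeps the first occurrence, an id listed under several `unique_key` values would be linked to whichever row comes first in `df`.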

Another idea is to work with the DataFrame directly to create a Series for mapping:

s = (df.assign(new = df['test_ids'].str.split(','))
       .explode('new')
       .astype({'new':float})
       .dropna(subset=['new'])
       .astype({'new':int})
       .drop_duplicates(subset=['new'])
       .set_index('new')['unique_key'])

print (s)
new
1     1
15    1
2     1
51    2
75    2
11    2
Name: unique_key, dtype: int64

tf.insert(0, 'unique_key', tf['test_ids'].map(s))
print (tf)
   unique_key  test_ids  status  revenue  cnt_days
0           1         1  passed   234.54         3
1           1         2  passed   543.21         5
2           2        11  failed    21.30         4
3           1        15  failed  2098.21         6
4           2        51  passed   232.00        21
5           2        75  failed   123.87        32
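
One edge case the sample data does not exercise: if `tf` holds a test id that never appears in `df`, `Series.map` returns NaN for it and the new column becomes float. The sketch below is one possible way to keep integer keys in that case, using the nullable `Int64` dtype. The frame `tf_extra` and the id 99 are made up for illustration and are not from the question.

```python
import pandas as pd

# Mapping Series as printed above: index = test id, value = unique_key.
s = pd.Series([1, 1, 1, 2, 2, 2],
              index=[1, 15, 2, 51, 75, 11],
              name="unique_key")

# Hypothetical frame: id 99 does not occur in the question's data.
tf_extra = pd.DataFrame({
    "test_ids": [1, 51, 99],
    "status": ["passed", "passed", "failed"],
})

# map() looks each test id up in the index of s; id 99 has no match.
mapped = tf_extra["test_ids"].map(s)

# Cast to the nullable integer dtype so matched keys stay integers
# and the unmatched row holds <NA> instead of forcing floats.
result = tf_extra.assign(unique_key=mapped.astype("Int64"))
print(result)
```

Here `assign` returns a new frame with the column appended at the end, whereas `DataFrame.insert` modifies `tf` in place and raises an error if the column already exists.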
