
In Pandas, count rows that match multiple conditions based on value pairs within the same DataFrame, and add the count as a column

dataframe pandas python

DrWhat · Tech Community · 4 years ago

import pandas as pd
inp = [{'index' :1, 'refIndex':3, 'org' : 'org1'}, {'index':2, 'refIndex':1, 'org': 'org1'}, {'index':3, 'refIndex': 2, 'org' : 'org2'}]
df = pd.DataFrame(inp)
print(df)

Output:

   index   refIndex   org
0  1       3          org1
1  2       1          org1
2  3       2          org2

For each row, I need to count how many other rows in the same org have this row's index as their refIndex.

   index   refIndex   org    count
0  1       3          org1   1        # index 1 org1 occurs as refIndex and org once elsewhere
1  2       1          org1   0        # index 2 org1 occurs as refIndex and org nowhere else
2  3       2          org2   0        # index 3 org2 occurs as refIndex and org nowhere else
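To pin down the counting rule before vectorizing it, here is a minimal sketch that uses an explicit loop over the rows. It rebuilds the question's DataFrame. The helper name count_references is hypothetical and not taken from the question or the answers. Like the accepted answer below, it would also count a row that references its own index.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame([
    {'index': 1, 'refIndex': 3, 'org': 'org1'},
    {'index': 2, 'refIndex': 1, 'org': 'org1'},
    {'index': 3, 'refIndex': 2, 'org': 'org2'},
])

def count_references(frame):
    """Hypothetical helper: for each row, count rows in the same org
    whose refIndex equals this row's index."""
    counts = []
    for idx, org in zip(frame['index'], frame['org']):
        # Boolean Series: rows that point at idx AND share the org.
        hits = ((frame['refIndex'] == idx) & (frame['org'] == org)).sum()
        counts.append(int(hits))
    return counts

df['count'] = count_references(df)
print(df)
```

This loop compares every row against every other row, so it does O(n²) work. It is useful mainly as a reference to check faster versions against.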

I'm new to Python and Pandas, so please forgive me if this is obvious to you. I've been struggling all day with groupby, functions, for loops, and merges.

2 answers | last activity 4 years ago

Shubham Sharma mkln · 4 years ago

NumPy broadcasting

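import numpy as np
# Select the columns explicitly so the unpacking does not depend on column order.
i, r, o = df[['index', 'refIndex', 'org']].to_numpy().T
df['count'] = np.sum((i[:, None] == r) & (o[:, None] == o), axis=1)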

Explanation

Create a boolean mask by comparing each value in the index column with every value in the refIndex column.

>>> (i[:, None] == r)

array([[False,  True, False],
       [False, False,  True],
       [ True, False, False]])
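Broadcasting is what turns two 1-D columns into a square mask. The sketch below shows the shapes involved, using the question's index and refIndex values as plain NumPy arrays. Those arrays are an assumption made for illustration; the answer itself takes them from the DataFrame.

```python
import numpy as np

# index and refIndex columns from the question.
i = np.array([1, 2, 3])
r = np.array([3, 1, 2])

col = i[:, None]   # shape (3, 1): each index value on its own row
mask = col == r    # (3, 1) compared with (3,) broadcasts to (3, 3)

print(col.shape, r.shape, mask.shape)  # (3, 1) (3,) (3, 3)
# mask[a, b] is True when row b's refIndex equals row a's index.
print(mask.astype(int))
```

Because the result is n × n, memory use grows quadratically with the number of rows.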

Create another boolean mask by comparing each value in the org column with every value in the org column itself.

>>> (o[:, None] == o)

array([[ True,  True, False],
       [ True,  True, False],
       [False, False,  True]])

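Combine the two masks with a logical AND. This drops the True values in the first mask where the two rows do not belong to the same org.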

>>> (i[:, None] == r) & (o[:, None] == o)

array([[False,  True, False],
       [False, False, False],
       [False, False, False]])
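To read the combined mask, it can help to list which (a, b) cells are True and check that each one satisfies both conditions. This is a sketch on plain NumPy arrays holding the question's data. The variable name pairs is illustrative only.

```python
import numpy as np

# Columns from the question as plain NumPy arrays.
i = np.array([1, 2, 3])                  # index
r = np.array([3, 1, 2])                  # refIndex
o = np.array(['org1', 'org1', 'org2'])   # org

mask = (i[:, None] == r) & (o[:, None] == o)

# A True cell (a, b) means row b's refIndex equals row a's index
# and rows a and b share the same org.
pairs = [(int(a), int(b)) for a, b in zip(*np.nonzero(mask))]
for a, b in pairs:
    assert r[b] == i[a] and o[b] == o[a]
print(pairs)  # [(0, 1)]
```

The only match is row 1 (refIndex 1, org1), which references row 0 (index 1, org1).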

Finally, sum along axis=1. For each row, this counts how many rows in the same org use that row's index as their refIndex.

>>> df

   index  refIndex   org  count
0      1         3  org1      1
1      2         1  org1      0
2      3         2  org2      0
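The broadcasting approach builds an n × n mask, so its memory use grows quadratically with the number of rows. An alternative that neither answer uses avoids that cost. Count each (org, refIndex) pair once with groupby, then merge those counts back on (org, index). This is a sketch under that assumption, not the answerer's method. It relies only on the standard groupby().size(), Series.reset_index(name=...), and DataFrame.merge calls.

```python
import pandas as pd

df = pd.DataFrame([
    {'index': 1, 'refIndex': 3, 'org': 'org1'},
    {'index': 2, 'refIndex': 1, 'org': 'org1'},
    {'index': 3, 'refIndex': 2, 'org': 'org2'},
])

# How many rows point at each refIndex within each org.
counts = (
    df.groupby(['org', 'refIndex']).size()
      .reset_index(name='count')
      .rename(columns={'refIndex': 'index'})  # align the key with df['index']
)

# A left merge keeps every original row. Rows nobody references get NaN, which becomes 0.
out = df.merge(counts, on=['org', 'index'], how='left')
out['count'] = out['count'].fillna(0).astype(int)
print(out)
```

Because it is a left merge, the rows keep their original order. The result should match the broadcasting output above.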

Vishnudev Krishnadas · 4 years ago

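df = df.rename(columns={'index': 'id'})
groups = df.groupby('org')
# For each row, count rows in the same org whose refIndex equals this row's id
# (`value in Series` would test index labels, not values, so compare explicitly).
df['count'] = df.apply(lambda x: (groups.get_group(x['org'])['refIndex'] == x['id']).sum(), axis=1)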
