Pandas equivalent of SELECT COUNT(DISTINCT col1, col2) GROUP BY col3

  •  3
  •  Shayna  ·  6 years ago

    Generate the DataFrame:

    import pandas as pd

    people = ['shayna','shayna','shayna','shayna','john']
    dates = ['01-01-18','01-01-18','01-01-18','01-02-18','01-02-18']
    places = ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital']
    d = {'Person':people,'Service_Date':dates, 'Site_Where_Served':places}
    df = pd.DataFrame(d)
    df
    
    Person   Service_Date   Site_Where_Served
    shayna   01-01-18       hospital 
    shayna   01-01-18       hospital 
    shayna   01-01-18       inpatient 
    shayna   01-02-18       hospital 
    john     01-02-18       hospital 
    

    What I would like to do is group by Site_Where_Served and count the unique (Person, Service_Date) pairs within each group.

    Expected output:

    Site_Where_Served    Site_Visit_Count
    hospital             3
    inpatient            1
    

    My attempt:

    df[['Person', 'Service_Date']].groupby(df['Site_Where_Served']).nunique().reset_index(name='Site_Visit_Count')
    

    But this errors on the index reset. Ignoring that for a moment, I realized it doesn't actually count unique pairs of 'Person' and 'Service_Date' anyway, since the output looks like this:

                       Person    Service_Date
    Site_Where_Served
    hospital              2           2 
    inpatient             1           1 
    
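For reference, the SQL `COUNT(DISTINCT Person, Service_Date) ... GROUP BY Site_Where_Served` the question describes can also be expressed as a single groupby-apply that dedupes the pair columns inside each group; a sketch on the question's own data (one possible approach, not the only idiom):

```python
import pandas as pd

people = ['shayna', 'shayna', 'shayna', 'shayna', 'john']
dates = ['01-01-18', '01-01-18', '01-01-18', '01-02-18', '01-02-18']
places = ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital']
df = pd.DataFrame({'Person': people, 'Service_Date': dates,
                   'Site_Where_Served': places})

# For each site, drop duplicate (Person, Service_Date) pairs,
# then count what remains.
res = (df.groupby('Site_Where_Served')[['Person', 'Service_Date']]
         .apply(lambda g: len(g.drop_duplicates()))
         .reset_index(name='Site_Visit_Count'))
print(res)
#   Site_Where_Served  Site_Visit_Count
# 0          hospital                 3
# 1         inpatient                 1
```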
    4 Answers
        1
  •  4
  •   Community CDub    4 years ago

    drop_duplicates with groupby + count

    (df.drop_duplicates()
       .groupby('Site_Where_Served')
       .Site_Where_Served.count()
       .reset_index(name='Site_Visit_Count')
    )
    
      Site_Where_Served  Site_Visit_Count
    0          hospital                 3
    1         inpatient                 1
    

    Note that there is a subtle difference between count and size: the former does not count NaN entries.
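To make the count/size distinction concrete, a minimal illustration (the `key`/`val` frame here is hypothetical, not from the question):

```python
import pandas as pd
import numpy as np

# One NaN in group 'a': count() skips it, size() counts every row.
s = pd.DataFrame({'key': ['a', 'a', 'b'],
                  'val': [1.0, np.nan, 2.0]})

print(s.groupby('key')['val'].count())  # a -> 1, b -> 1  (NaN skipped)
print(s.groupby('key')['val'].size())   # a -> 2, b -> 1  (NaN included)
```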


    Tupleize the columns, then groupby + nunique

    This essentially just fixes your current attempt, but I wouldn't recommend it, as it is verbose and takes too many steps. First, tupleize the pair of columns, group by Site_Where_Served, then count:

    (df[['Person', 'Service_Date']]
       .apply(tuple, 1)
       .groupby(df.Site_Where_Served)
       .nunique()
       .reset_index(name='Site_Visit_Count')
    )
    
      Site_Where_Served  Site_Visit_Count
    0          hospital                 3
    1         inpatient                 1
    
        2
  •  3
  •   jpp    6 years ago

    A better idea, in my opinion, is to drop duplicates before using groupby.size:

    res = df.drop_duplicates()\
            .groupby('Site_Where_Served').size()\
            .reset_index(name='Site_Visit_Count')
    
    print(res)
    
      Site_Where_Served  Site_Visit_Count
    0          hospital                 3
    1         inpatient                 1
    
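One caveat worth noting: calling `drop_duplicates()` with no arguments dedupes on all columns, which works here only because the frame has exactly the three relevant columns. If the frame carried extra columns, you would pass `subset` explicitly; a sketch (the `Note` column is hypothetical, added only to show the pitfall):

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['shayna', 'shayna', 'shayna', 'shayna', 'john'],
    'Service_Date': ['01-01-18', '01-01-18', '01-01-18', '01-02-18', '01-02-18'],
    'Site_Where_Served': ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital'],
    'Note': ['a', 'b', 'c', 'd', 'e'],  # hypothetical extra column
})

# Without subset=, no rows would be dropped (the Note column makes
# every row unique). Restrict the dedup to the columns that matter.
res = (df.drop_duplicates(subset=['Person', 'Service_Date', 'Site_Where_Served'])
         .groupby('Site_Where_Served').size()
         .reset_index(name='Site_Visit_Count'))
print(res)
#   Site_Where_Served  Site_Visit_Count
# 0          hospital                 3
# 1         inpatient                 1
```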
        3
  •  2
  •   cs95 abhishek58g    6 years ago

    Perhaps value_counts

    (df.drop_duplicates()
       .Site_Where_Served
       .value_counts()
       .to_frame('Site_Visit_Count')
       .rename_axis('Site_Where_Served')
       .reset_index()
    )
    
      Site_Where_Served  Site_Visit_Count
    0          hospital                 3
    1         inpatient                 1
    
        4
  •  1
  •   piRSquared    6 years ago

    Counter 1.

    from collections import Counter

    pd.Series(Counter(df.drop_duplicates().Site_Where_Served)) \
        .rename_axis('Site_Where_Served').reset_index(name='Site_Visit_Count')
    
      Site_Where_Served  Site_Visit_Count
    0          hospital                 3
    1         inpatient                 1
    

    Counter 2.

    from collections import Counter

    pd.DataFrame(
        list(Counter(t[2] for t in set(map(tuple, df.values))).items()),
        columns=['Site_Where_Served', 'Site_Visit_Count']
    )
    
      Site_Where_Served  Site_Visit_Count
    0          hospital                 3
    1         inpatient                 1