Using value_counts with a Categorical column (specifically, one containing month information) in pandas:
import calendar
import random
import pandas as pd
random.seed(1)
month_names = calendar.month_name[1:]
month_names += month_names
df1 = pd.DataFrame({
    'Month': month_names,
    'Flag': [random.choice([True, False]) for _ in month_names]
})
df1['Month'] = pd.Categorical(
    df1['Month'], categories=calendar.month_name[1:], ordered=True
)
print(df1.groupby('Month')['Flag'].value_counts())
prints, as expected:
Month      Flag
January    False    2
February   True     2
March      False    2
April      True     2
May        True     2
June       False    2
July       False    1
           True     1
August     False    1
           True     1
September  False    2
October    True     2
November   False    1
           True     1
December   False    2
Name: Flag, dtype: int64
But if our 'Month' column does not contain all of the possible categories, pandas raises a ValueError:
month_names = ['January', 'February', 'March']
month_names += month_names
df2 = pd.DataFrame({
    'Month': month_names,
    'Flag': [random.choice([True, False]) for _ in month_names]
})
df2['Month'] = pd.Categorical(
    df2['Month'], categories=calendar.month_name[1:], ordered=True
)
print(df2.groupby('Month')['Flag'].value_counts())
raises:
ValueError: operands could not be broadcast together with shape (12,) (3,)
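Is there any way to get proper value_counts results for partial data? Ideally this would retain the full set of categories, but even a result without them would be a start.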
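For what it's worth, here is a minimal sketch of one possible workaround. It is not tested against the pandas version that produced the error above. It assumes a pandas release where groupby accepts the observed parameter, and it swaps value_counts for a two-key groupby followed by size(). The hard-coded Flag values and the names counts, full and table are made up for illustration.

```python
import calendar
import pandas as pd

months = list(calendar.month_name[1:])

# Partial data: only three of the twelve categories actually occur.
# The Flag values are fixed here (not random) purely for illustration.
df = pd.DataFrame({
    'Month': ['January', 'February', 'March'] * 2,
    'Flag': [True, False, True, True, True, False],
})
df['Month'] = pd.Categorical(df['Month'], categories=months, ordered=True)

# Count each (Month, Flag) pair, keeping only the categories that occur.
counts = df.groupby(['Month', 'Flag'], observed=True).size()

# Optionally pivot Flag into columns and restore all twelve months,
# filling the months that never occur with zero counts.
full = pd.CategoricalIndex(months, categories=months, ordered=True, name='Month')
table = counts.unstack(fill_value=0).reindex(full, fill_value=0)

print(counts)
print(table)
```

The first result covers the "even without the full categories" case; the reindex step is what brings the unobserved months back as rows of zeros.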