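You can use pandas' cut (doc) together with NumPy's linspace (or any equivalent). linspace generates the bin edges; n edges define n-1 bins, so you need n-1 bin labels. You can add a $ sign to those bin labels.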
See also: Label histogram by bins matplotlib
An adapted example using Seaborn:
# coding=utf-8
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = np.random.normal(5000, 1000, 1000)
dataframe = pd.DataFrame(data)
dataframe = dataframe.rename({0 : 'values'}, axis='columns')
print(dataframe.head(3))
print('---------------')
cut_vals = np.linspace(start=dataframe['values'].min(),stop=dataframe['values'].max(),num=8)
# Build one label per bin from each pair of consecutive edges
bin_headers = list()
for index in range(len(cut_vals) - 1):
    header = '$ {}-{}'.format(int(cut_vals[index]), int(cut_vals[index+1]))
    bin_headers.append(header)
print(bin_headers)
print('---------------')
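# include_lowest=True keeps the minimum value in the first bin (bins are right-closed by default)
dataframe['categories'] = pd.cut(dataframe['values'], cut_vals, labels=bin_headers, include_lowest=True)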
print('---------------')
print(dataframe.head(2))
ind = np.array(cut_vals[:-1])
width = np.array([cut_vals[i+1]-cut_vals[i] for i in range(len(cut_vals)-1)])
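# Note: distplot is deprecated since seaborn 0.11; sns.histplot(..., kde=True) is the newer counterpart
g = sns.distplot(dataframe['values'], bins=cut_vals, label='foo')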
g.set_xticks(ind + width/2)
g.set_xticklabels(bin_headers, rotation=45)
plt.show()
Output:
        values
0  4442.338053
1  5253.443608
2  6552.700087
---------------
['$ 2080-3034', '$ 3034-3988', '$ 3988-4942', '$ 4942-5897', '$ 5897-6851', '$ 6851-7805', '$ 7805-8759']
---------------
---------------
        values   categories
0  4442.338053  $ 3988-4942
1  5253.443608  $ 4942-5897
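
To check the edge/label count by hand, here is a minimal sketch with made-up values (the series and the variable names are assumptions for illustration only). It shows that n edges from linspace give n-1 labels. It also shows that pd.cut uses right-closed bins by default, so the minimum value falls outside the first bin unless include_lowest=True is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, chosen so the bins are easy to verify by hand
values = pd.Series([10.0, 20.0, 35.0, 50.0])

# 5 edges between min and max define 4 bins
edges = np.linspace(values.min(), values.max(), num=5)

# One "$ low-high" label per pair of consecutive edges
labels = ['$ {}-{}'.format(int(lo), int(hi))
          for lo, hi in zip(edges[:-1], edges[1:])]

# Default: bins are (low, high], so the minimum value gets no category (NaN)
default = pd.cut(values, edges, labels=labels)

# include_lowest=True closes the first bin on the left as well
closed = pd.cut(values, edges, labels=labels, include_lowest=True)

print(labels)
print(default.isna().sum())  # values left without a category
print(closed.isna().sum())
```

With the default settings one value (the minimum) is left uncategorised; with include_lowest=True every value gets a label.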