
pandas: which threshold applies to each row?

pandas python

sds Niraj Rajbhandari · 7 years ago

Given a column of scores, e.g.,

scores = pd.DataFrame({"score":np.random.randn(10)})

and thresholds

thresholds = pd.DataFrame({"threshold":[0.2,0.5,0.8]},index=[7,13,33])

I want to find the applicable threshold for each score, e.g.:

      score   threshold
 0 -1.613293   NaN
 1 -1.357980   NaN
 2  0.325720     7
 3  0.116000   NaN
 4  1.423171    33
 5  0.282557     7
 6 -1.195269   NaN
 7  0.395739     7
 8  1.072041    33
 9  0.197853   NaN

In other words, for each score s I want the threshold t such that

t = max(t: thresholds.threshold[t] < s)

How do I do that?

PS. Based on a deleted answer:

pd.cut(scores.score, bins=[-np.inf]+list(thresholds.threshold)+[np.inf],
       labels=["low"]+list(thresholds.index))

3 answers | up to 7 years ago

user3483203 · 7 years ago

You can use pd.cut:

scores['threshold'] = pd.cut(
                         scores.score,
                         # np.inf is the open upper edge; a NaN edge may be rejected by newer pandas
                         bins=thresholds.threshold.values.tolist() + [np.inf],
                         labels=thresholds.index.values
                      )

      score threshold
0 -1.613293       NaN
1 -1.357980       NaN
2  0.325720       7.0
3  0.116000       NaN
4  1.423171      33.0
5  0.282557       7.0
6 -1.195269       NaN
7  0.395739       7.0
8  1.072041      33.0
9  0.197853       NaN
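
For readers who want to reproduce the table above, here is a minimal self-contained sketch of the pd.cut approach. It assumes the ten scores printed in the question instead of random draws, and it uses np.inf as the upper bin edge; the final cast to float is an illustrative choice, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Scores copied from the question's example table (assumed, so the run is repeatable).
scores = pd.DataFrame({"score": [-1.613293, -1.357980, 0.325720, 0.116000, 1.423171,
                                 0.282557, -1.195269, 0.395739, 1.072041, 0.197853]})
thresholds = pd.DataFrame({"threshold": [0.2, 0.5, 0.8]}, index=[7, 13, 33])

# Bin edges: the sorted thresholds plus an open upper edge.
# right=True (the default) makes each bin (lower, upper], so a score must be
# strictly greater than a threshold to fall under it.
bins = thresholds["threshold"].tolist() + [np.inf]
labels = pd.cut(scores["score"], bins=bins, labels=thresholds.index.tolist())

# pd.cut returns a categorical Series; cast to float so unmatched rows are NaN.
scores["threshold"] = labels.astype(float)
print(scores)
```

Scores at or below the lowest threshold fall outside every bin, which is why they come back as NaN.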

Another answer here uses apply with np.digitize. Timings for cut versus apply with digitize on a larger DataFrame:

scores = pd.DataFrame({"score":np.random.randn(10)})
scores = pd.concat([scores]*10000)

%timeit pd.cut(scores.score,thresholds.threshold.values.tolist() + [np.nan],labels=thresholds.index.values)
4.41 ms ± 39.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

indeces = [None,] + thresholds.index.tolist()

%timeit scores["score"].apply(lambda x: indeces[np.digitize(x, thresholds["threshold"])])
1.64 s ± 18.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In these timings, apply is by far the slower option (1.64 s versus 4.41 ms).

koPytok · 7 years ago

You can use np.digitize:

indeces = [None,] + thresholds.index.tolist()
scores["score"].apply(
    lambda x: indeces[np.digitize(x, thresholds["threshold"])])
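
np.digitize also accepts a whole array, so the per-row apply is not strictly needed. The sketch below is one possible vectorized variant, not the answerer's code: it assumes the scores from the question's table, assumes the thresholds are sorted ascending, and passes right=True so that a threshold counts only when it is strictly below the score.

```python
import numpy as np
import pandas as pd

# Scores copied from the question's example table (assumed, so the run is repeatable).
scores = pd.DataFrame({"score": [-1.613293, -1.357980, 0.325720, 0.116000, 1.423171,
                                 0.282557, -1.195269, 0.395739, 1.072041, 0.197853]})
thresholds = pd.DataFrame({"threshold": [0.2, 0.5, 0.8]}, index=[7, 13, 33])

# Position 0 means "below every threshold", so pad the label array with NaN.
labels = np.array([np.nan] + thresholds.index.tolist())

# With right=True, digitize returns i such that bins[i-1] < x <= bins[i],
# i.e. the number of thresholds strictly below each score.
positions = np.digitize(scores["score"].to_numpy(),
                        thresholds["threshold"].to_numpy(),
                        right=True)

# Fancy indexing maps every position to its threshold label in one step.
scores["threshold"] = labels[positions]
print(scores)
```

With the default right=False, a score exactly equal to a threshold would be assigned to that threshold, which differs from the strict inequality in the question.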

Ben.T · 7 years ago

You can use merge_asof:

(pd.merge_asof(scores.reset_index().sort_values('score'),
               thresholds.reset_index(),
               left_on='score', right_on='threshold', suffixes=('', '_'))
     .drop(columns='threshold').rename(columns={'index_': 'threshold'})
     .set_index('index').sort_index())

With your data, you get:

          score  threshold
index                     
0     -1.613293        NaN
1     -1.357980        NaN
2      0.325720        7.0
3      0.116000        NaN
4      1.423171       33.0
5      0.282557        7.0
6     -1.195269        NaN
7      0.395739        7.0
8      1.072041       33.0
9      0.197853        NaN
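
By default merge_asof also matches a score that is exactly equal to a threshold. If the strict inequality from the question matters, a sketch along the following lines may help. It assumes the scores from the question's table, and the column name label is a hypothetical helper introduced only to avoid the suffix handling.

```python
import pandas as pd

# Scores copied from the question's example table (assumed, so the run is repeatable).
scores = pd.DataFrame({"score": [-1.613293, -1.357980, 0.325720, 0.116000, 1.423171,
                                 0.282557, -1.195269, 0.395739, 1.072041, 0.197853]})
thresholds = pd.DataFrame({"threshold": [0.2, 0.5, 0.8]}, index=[7, 13, 33])

# merge_asof needs both sides sorted on their join keys.
left = scores.reset_index().sort_values("score")
right = (thresholds.rename_axis("label")   # "label" is a hypothetical column name
                   .reset_index()
                   .sort_values("threshold"))

# direction="backward" picks the last threshold at or below each score;
# allow_exact_matches=False tightens that to strictly below.
merged = pd.merge_asof(left, right, left_on="score", right_on="threshold",
                       direction="backward", allow_exact_matches=False)

result = (merged.drop(columns="threshold")
                .rename(columns={"label": "threshold"})
                .set_index("index")
                .sort_index())
print(result)
```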
