
Split a pandas Series into contiguous chunks by index value

pandas-groupby numpy pandas python

Kiv · 7 years ago

I am trying to split a Series into chunks, where each chunk is a contiguous run of rows that share the same index value. So for this input:

import pandas as pd
df = pd.Series([1,2,3,4,5,6,7], index=[1,1,1,2,2,1,1])

the desired result is three chunks, like this:

[[1,2,3], [4,5], [6,7]]

I tried groupby, but it merges [1,2,3] and [6,7] into a single group, which is not what I want because they are not contiguous:

>>> groups = list(df.groupby(df.index, sort=False))
>>> len(groups)
2

Can this be done in pandas?

2 answers

Zero · 7 years ago

You can group on a key that increments every time the index value changes:

In [761]: [v.tolist() for _, v in df.groupby(df.index.to_series().diff().ne(0).cumsum())]
Out[761]: [[1, 2, 3], [4, 5], [6, 7]]

Details

The grouping key assigns one number to each contiguous chunk:

In [762]: df.index.to_series().diff().ne(0).cumsum()
Out[762]:
1    1
1    1
1    1
2    2
2    2
1    3
1    3
dtype: int32
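
To make the one-liner easier to follow, here is a sketch that splits it into named steps. The variable names (`s`, `labels`, `starts`, `key`) are mine and not from the answer above. Note that `diff()` needs numeric index labels; for string labels a comparison against `shift()` would be needed instead.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7], index=[1, 1, 1, 2, 2, 1, 1])

# The index labels as a Series, so that diff() is available.
labels = s.index.to_series()

# True wherever the label differs from the previous row. The first row
# has no predecessor (diff is NaN), and NaN != 0 is True, so it always
# starts a new chunk.
starts = labels.diff().ne(0)

# Running count of chunk starts: one distinct key per contiguous run.
key = starts.cumsum()

# Pass the key as a plain array so no index alignment is involved.
chunks = [v.tolist() for _, v in s.groupby(key.to_numpy())]
print(chunks)
```

Because the key only increases when the label changes, the second run of `1` gets a different key than the first, which is what a plain `groupby(df.index)` cannot do.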

jpp · 7 years ago

You can convert the Series to a DataFrame and then use groupby with shift + cumsum:

df = df.reset_index()

group_key = (df['index'] != df['index'].shift()).cumsum()
res = df.groupby(group_key)[0].apply(list).values.tolist()

print(res)

[[1, 2, 3], [4, 5], [6, 7]]

group_key numbers each run of consecutive equal values:

print(group_key)

0    1
1    1
2    1
3    2
4    2
5    3
6    3
Name: index, dtype: int32
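
As a further variation that is not part of either answer, the same run boundaries can be found with NumPy alone, since the question is also tagged numpy. This is only a sketch: `split_contiguous` is a hypothetical helper name, and it assumes you want plain Python lists back rather than Series objects.

```python
import numpy as np
import pandas as pd


def split_contiguous(s):
    """Split a Series into lists, one per contiguous run of equal index labels."""
    labels = s.index.to_numpy()
    # Positions where a label differs from the one before it; +1 turns
    # "index into labels[1:]" into "index into labels".
    breaks = np.flatnonzero(labels[1:] != labels[:-1]) + 1
    # np.split cuts the values array just before each break position.
    return [chunk.tolist() for chunk in np.split(s.to_numpy(), breaks)]


s = pd.Series([1, 2, 3, 4, 5, 6, 7], index=[1, 1, 1, 2, 2, 1, 1])
print(split_contiguous(s))
```

This compares neighbouring labels with `!=` instead of `diff()`, so it should also work when the index labels are strings.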
