How do I slice diagonally across a DataFrame in pandas or numpy?

slice numpy pandas arrays python

oppressionslayer · 4 years ago

I have the following DataFrame, which can be copied and pasted into a DataFrame with df = pd.read_clipboard():

    0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
0    0   1   5  12  13   7   1   5   5   1   7  13  12   5   1   0
1    1   0   4  13  12   6   0   4   4   0   6  12  13   4   0   1
2    5   4   0   9   8   2   4   0   0   4   2   8   9   0   4   5
3   12  13   9   0   1  11  13   9   9  13  11   1   0   9  13  12
4   13  12   8   1   0  10  12   8   8  12  10   0   1   8  12  13
5    7   6   2  11  10   0   6   2   2   6   0  10  11   2   6   7
6    1   0   4  13  12   6   0   4   4   0   6  12  13   4   0   1
7    5   4   0   9   8   2   4   0   0   4   2   8   9   0   4   5
8    5   4   0   9   8   2   4   0   0   4   2   8   9   0   4   5
9    1   0   4  13  12   6   0   4   4   0   6  12  13   4   0   1
10   7   6   2  11  10   0   6   2   2   6   0  10  11   2   6   7
11  13  12   8   1   0  10  12   8   8  12  10   0   1   8  12  13
12  12  13   9   0   1  11  13   9   9  13  11   1   0   9  13  12
13   5   4   0   9   8   2   4   0   0   4   2   8   9   0   4   5
14   1   0   4  13  12   6   0   4   4   0   6  12  13   4   0   1
15   0   1   5  12  13   7   1   5   5   1   7  13  12   5   1   0

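I'd like to take a cross-section from it, namely:

[1, 4, 9, 1, 10, 6, 4, 0, 4, 6, 10, 1, 9, 4, 1]

These are the values at df.loc[1, 0], df.loc[2, 1], df.loc[3, 2], df.loc[4, 3], and so on.

Is there a numpy or pandas idiom that makes this kind of cross slice easier than the many separate index lookups I'm doing now? Thanks.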

4 answers

Shubham Sharma mkln 4 years ago

We can use np.diagonal with offset=1 to select the elements on the diagonal just above the main diagonal:

np.diagonal(df, offset=1)

array([ 1,  4,  9,  1, 10,  6,  4,  0,  4,  6, 10,  1,  9,  4,  1])
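
A note on the offset: offset=1 reads positions [0, 1], [1, 2], ..., which lie above the main diagonal, while the positions listed in the question ([1, 0], [2, 1], ...) lie below it and correspond to offset=-1. The DataFrame in the question is symmetric, so both offsets return the same values there. A minimal sketch of the difference on a small non-symmetric array (the array `a` is made up for illustration):

```python
import numpy as np

# Small non-symmetric example array (illustrative, not the question's data)
a = np.arange(16).reshape(4, 4)

above = np.diagonal(a, offset=1)   # a[0, 1], a[1, 2], a[2, 3]
below = np.diagonal(a, offset=-1)  # a[1, 0], a[2, 1], a[3, 2]

print(above)  # [ 1  6 11]
print(below)  # [ 4  9 14]
```

For data that is not symmetric, np.diagonal(df, offset=-1) is the call that matches df.loc[1, 0], df.loc[2, 1], and so on.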

Mad Physicist 4 years ago

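If you have a numpy array, you can actually get a true slice. The difference between a slice and an advanced indexing expression is that a slice returns a view into the original data, whereas advanced indexing always makes a copy. If the array is C-contiguous, you can use ravel to get a view, like this: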

arr = df.to_numpy()

row = 1
col = 0
n = 4
view = arr.ravel()[row * arr.shape[1] + col:(row + n - 1) * arr.shape[1] + col + n:arr.shape[1] + 1]
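
To see that this really is a view, here is a small sketch on a made-up 6x6 array (the names `start`, `stop`, and `ncols` are just for readability and are not part of the answer above). Writing through the view changes the original array:

```python
import numpy as np

# C-contiguous example data (illustrative, not the question's DataFrame)
arr = np.arange(36).reshape(6, 6)

row, col, n = 1, 0, 4
ncols = arr.shape[1]

# Flat index of the first element, one past the last element, and a step
# of one row plus one column
start = row * ncols + col
stop = (row + n - 1) * ncols + col + n
view = arr.ravel()[start:stop:ncols + 1]

values = view.tolist()
print(values)                       # [6, 13, 20, 27]
print(np.shares_memory(view, arr))  # True: no data was copied

view[0] = -1                        # writes through to arr[1, 0]
print(arr[1, 0])                    # -1
```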

If the array is not contiguous, you need to do more work, because the strides of the view have to be set manually. You can use np.lib.stride_tricks.as_strided:

view = np.lib.stride_tricks.as_strided(arr[row:, col:], shape=(n,), strides=(arr.strides[0] + arr.strides[1],))

This should give the same result as the simpler approach shown in the accepted answer.
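
A minimal sketch of the strided approach on a non-contiguous array, assuming made-up data (`base` and the every-other-column slice are only there to produce a non-contiguous `arr`). Note that as_strided expects shape and strides as tuples, and it does no bounds checking, so n must not run past the end of the array:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Taking every other column yields a 6x6 array that is not C-contiguous
base = np.arange(72).reshape(6, 12)
arr = base[:, ::2]

row, col, n = 1, 0, 4

# One step along the diagonal = one row stride plus one column stride (in bytes)
diag = as_strided(
    arr[row:, col:],
    shape=(n,),
    strides=(arr.strides[0] + arr.strides[1],),
    writeable=False,  # read-only view, to avoid accidental writes
)

print(arr.flags["C_CONTIGUOUS"])  # False
print(diag)                       # [12 26 40 54]
```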

å´æé 4 years ago

I used numpy to solve this.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16 * 16).reshape(16, 16))
print(df)

print(df.to_numpy()[range(15), range(1, 16)])

enke 4 years ago

You can use numpy advanced indexing:

For example, if you want to select

df.loc[1, 0], df.loc[2, 1], df.loc[3, 2], df.loc[4, 3]

first convert df to a numpy array, then index the elements with the appropriate row and column indices:

df_to_arr = df.to_numpy()
out = df_to_arr[[1,2,3,4], [0,1,2,3]]

Output:

array([1, 4, 9, 1], dtype=int64)
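
The index lists do not have to be typed by hand. A minimal sketch that builds them with np.arange for the whole diagonal below the main one, using a made-up 5x5 array (`rows` and `cols` are illustrative names). Unlike a slice, the result is a copy:

```python
import numpy as np

# Example data (illustrative; with a DataFrame, start from df.to_numpy())
arr = np.arange(25).reshape(5, 5)

rows = np.arange(1, arr.shape[0])  # 1, 2, 3, 4
cols = rows - 1                    # 0, 1, 2, 3

out = arr[rows, cols]              # arr[1, 0], arr[2, 1], arr[3, 2], arr[4, 3]

print(out)                         # [ 5 11 17 23]
print(np.shares_memory(out, arr))  # False: advanced indexing copies
```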