
Generating triplets from a pandas DataFrame

machine-learning pandas python


Engineero · 7 years ago

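I am trying to generate all triplets from a pandas DataFrame based on a class or label. Suppose I have a DataFrame in which each row has a unique identifier and a class/label. I want triplets in which the first two elements come from the same class/label and the last element comes from a different class/label. I am trying to get all such triplets.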

I can generate the pairs with the same label just fine, but when I try to append an element with a different label, I get an array of None.

My example DataFrame:

import pandas as pd
import numpy as np

# 'label' is listed first so the column order matches the output below
df = pd.DataFrame({'label': [0, 1, 1, 0, 0],
                   'uuid': np.arange(5)})
print(df)

   label  uuid
0      0     0
1      1     1
2      1     2
3      0     3
4      0     4

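Note that the uuid column is just a placeholder; the point is that it is unique for each row. The following generates all pairs of elements with the same label and puts them in a list: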

import itertools as it

labels = df.label.unique()
all_combos = []
for l in labels:
    # .to_numpy() replaces .as_matrix(), which was removed in pandas 1.0
    combos = list(it.combinations(df.loc[df.label == l].to_numpy(), 2))
    all_combos.extend([list(c) for c in combos])  # convert to list because I anticipate needing to add to each combo later
all_combos

[[array([0, 0]), array([0, 3])],
 [array([0, 0]), array([0, 4])],
 [array([0, 3]), array([0, 4])],
 [array([1, 1]), array([1, 2])]]

Now I want each of these combinations with every element that has a different label appended.

I tried:

all_combos = []  # start again from an empty list
for l in labels:
    combos = list(it.combinations(df.loc[df.label == l].to_numpy(), 2))
    combo_list = [list(c) for c in combos]
    for c in combo_list:
        new_combos = [list(c).extend(s) for s in df.loc[df.label != l].to_numpy()]
        all_combos.append(new_combos)

I expected:

all_combos

[[array([0, 0]), array([0, 3]), array([1, 1])],
 [array([0, 0]), array([0, 3]), array([1, 2])],
 [array([0, 0]), array([0, 4]), array([1, 1])],
 [array([0, 0]), array([0, 4]), array([1, 2])],
 [array([0, 3]), array([0, 4]), array([1, 1])],
 [array([0, 3]), array([0, 4]), array([1, 2])],
 [array([1, 1]), array([1, 2]), array([0, 0])],
 [array([1, 1]), array([1, 2]), array([0, 3])],
 [array([1, 1]), array([1, 2]), array([0, 4])]]

I got:

all_combos

[[None, None], [None, None], [None, None], [None, None, None]]

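This seems odd: the sublists are not even all the same length! However, the number of None values in my result equals the expected number of valid triplets.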

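I also tried all_combos.extend(new_combos) and got a one-dimensional list of 9 elements, i.e. just a flattened version of the result above. In fact, any combination of list.extend and list.append in the last two lines of the inner loop gives either the result shown above or a flattened version of it, and neither makes sense to me.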

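Edit: As noted in the comments, list.extend and list.append operate in place, so they do not return anything. How, then, can I get my list comprehension to produce these values? Or how can I refactor this into something else that works?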
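A minimal sketch of the behaviour described in the edit, using plain Python lists rather than the DataFrame rows (the names `pair` and `others` are illustrative stand-ins, not part of the original code). `list.extend` mutates its list and returns `None`, so a comprehension built from it collects only `None`, whereas concatenating with `+` returns a new list:

```python
# Illustrative stand-ins for one same-label pair and the different-label rows.
pair = [(0, 0), (0, 3)]
others = [(1, 1), (1, 2)]

# list.extend works in place and returns None, so this collects only None.
broken = [list(pair).extend([s]) for s in others]
print(broken)  # [None, None]

# The + operator creates and returns a new list for each element.
fixed = [list(pair) + [s] for s in others]
print(fixed)  # [[(0, 0), (0, 3), (1, 1)], [(0, 0), (0, 3), (1, 2)]]
```

This also accounts for the shape of the output above: each inner list holds one None per different-label row, so the three pairs from label 0 give three lists of two and the single pair from label 1 gives one list of three, nine None values in total.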

1 Answer


Engineero · 7 years ago

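list.append and list.extend both modify the list in place and return None, which is why the list comprehension collected only None values.

Building each triplet with np.concatenate, which returns a new array, avoids the problem: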

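all_combos = []
for l in labels:
    # all pairs of rows that share the label l
    combos = list(it.combinations(df.loc[df.label == l].to_numpy(), 2))
    for c in combos:
        # np.concatenate returns a new (3, 2) array: the pair plus one row with a different label
        new_combos = [np.concatenate((c, (s,)), axis=0) for s in df.loc[df.label != l].to_numpy()]
        all_combos.extend(new_combos)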

np.append(c, (s,), axis=0) can be used in place of the np.concatenate call.
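For comparison, the same triplets can be built without NumPy at all. The following is only a sketch and not the approach from the answer: `make_triplets` is a hypothetical helper, and the rows are assumed to be plain `(label, uuid)` tuples mirroring the example DataFrame.

```python
import itertools as it


def make_triplets(rows):
    """Return every (same, same, different) triplet from (label, uuid) rows."""
    # dict.fromkeys keeps labels in order of first appearance.
    labels = list(dict.fromkeys(label for label, _ in rows))
    triplets = []
    for l in labels:
        same = [r for r in rows if r[0] == l]
        diff = [r for r in rows if r[0] != l]
        # Pair up rows sharing the label, then attach each different-label row.
        for a, b in it.combinations(same, 2):
            triplets.extend((a, b, c) for c in diff)
    return triplets


# Rows assumed to mirror the example DataFrame as (label, uuid) tuples.
rows = [(0, 0), (1, 1), (1, 2), (0, 3), (0, 4)]
triplets = make_triplets(rows)
print(len(triplets))  # 9
```

With the DataFrame from the question, the rows could be obtained with `list(df[['label', 'uuid']].itertuples(index=False, name=None))`. Returning tuples keeps every triplet immutable, so no in-place list method is involved.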