
Create a dictionary from DataFrame contents, with the topic (derived from the index) as the key and the rows as the values

  • iforcebd  · Tech community  · 4 years ago

    I have a DataFrame with the following contents:

                        0          1          2          3          4          5
     0    reviewers#0  -0.016271   0.011541   0.011903  -0.001355   0.008702
     1    reviewers#1  -0.037961   0.033415   0.020643   0.004748   0.014523
     2    reviewers#2  -0.019863   0.019906   0.017248   0.006216   0.008826
     3    reviewers#3  -0.021029   0.016401   0.010772   0.001874   0.005772
     4    reviewers#4  -0.013409   0.011703   0.011249   0.000111   0.009319
     5    reviewers#5  -0.008549   0.007816   0.007859   0.000984   0.005491
     6    reviewers#6   -0.01634   0.017007   0.014637   0.005241   0.008135
     7    reviewers#7  -0.017075   0.016119   0.013666    0.00314   0.008786
     8    reviewers#8  -0.030823   0.020217   0.012402  -0.003165   0.009643
     9    reviewers#9  -0.038311   0.026252   0.017619   0.003568   0.003972
    10  confiscated#0  -0.007147   0.007387   0.010867   0.000735   0.011244
    11  confiscated#1  -0.016917   0.014412   0.016182   0.001859   0.015596
    12  confiscated#2  -0.004854   0.004091   0.005075  -0.000566    0.00458
    13  confiscated#3   -0.02642   0.021311   0.018871  -0.001843   0.017033
    14  confiscated#4  -0.016161   0.013325   0.013113  -0.001036   0.011385
    15  confiscated#5    -0.0131     0.0117   0.013829  -0.000861    0.01225
    16  confiscated#6  -0.006454   0.005335   0.006634  -0.001038   0.006322
    17  confiscated#7  -0.006855   0.005225   0.007626  -0.003071   0.009048
    18  confiscated#8  -0.019227   0.015683   0.016805  -0.004709   0.019453
    19  confiscated#9  -0.010685   0.011237   0.011653   0.003006   0.007464
    

    Now we want a dictionary like the following:

    dictionary = {
        0: [['reviewers#0', -0.016271, 0.011541, 0.011903, -0.001355, 0.008702], ['confiscated#0', -0.007147, 0.007387, 0.010867, 0.000735, 0.011244]],
        1: [['reviewers#1', -0.037961, 0.033415, 0.020643, 0.004748, 0.014523], ['confiscated#1', -0.016917, 0.014412, 0.016182, 0.001859, 0.015596]],
        .
        .
        .
        9: [['reviewers#9', -0.038311, 0.026252, 0.017619, 0.003568, 0.003972], ['confiscated#9', -0.010685, 0.011237, 0.011653, 0.003006, 0.007464]],
    }
    

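    Hint: in the DataFrame, each row is the topic embedding of one word, and there are 10 topics (#0 to #9), e.g. reviewers#0 to reviewers#9. The topic (0 to 9) of each row can be obtained from the index with the following idea: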

    for inx in dataframe.index:
        topic = inx % 10
    
    

    Thanks a lot for your ideas and help.

  •   Grismar    4 years ago

    It sounds like this is what you're after:

    import pandas as pd
    from collections import defaultdict
    
    # Small example frame: column 0 holds the 'word#topic' label
    df = pd.DataFrame([
        {0: 'reviewers#0', 1: 1.0, 2: 2.0, 3: 3.0},
        {0: 'reviewers#1', 1: 4.0, 2: 5.0, 3: 6.0},
        {0: 'confiscated#0', 1: 7.0, 2: 8.0, 3: 9.0},
        {0: 'confiscated#1', 1: 10.0, 2: 11.0, 3: 12.0},
    ])
    
    print(df)
    
    # Map topic number -> list of embedding rows (label stripped)
    result = defaultdict(list)
    for _, values in df.iterrows():
        values = list(values)
        topic = int(values[0].split('#')[1])  # 'reviewers#0' -> 0
        result[topic].append(values[1:])
    print(result)
    

    Result:

                   0     1     2     3
    0    reviewers#0   1.0   2.0   3.0
    1    reviewers#1   4.0   5.0   6.0
    2  confiscated#0   7.0   8.0   9.0
    3  confiscated#1  10.0  11.0  12.0
    defaultdict(<class 'list'>, {0: [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]], 1: [[4.0, 5.0, 6.0], [10.0, 11.0, 12.0]]})
    

    That is, the DataFrame has the structure you described, and result is the result you asked for?
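The code above reads the topic from the label and drops the label from each row. The question's hint instead derives the topic from the row index, and the desired dictionary keeps the label in each row. A minimal sketch of that variant follows; the names `N_TOPICS` and `by_topic` are illustrative assumptions, and the small frame uses 2 topics rather than the question's 10. It also assumes the rows are ordered so that index modulo the topic count gives the topic.

```python
import pandas as pd

# Stand-in for the question's frame, with 2 topics instead of 10 (assumption).
N_TOPICS = 2
df = pd.DataFrame([
    ['reviewers#0', 1.0, 2.0, 3.0],
    ['reviewers#1', 4.0, 5.0, 6.0],
    ['confiscated#0', 7.0, 8.0, 9.0],
    ['confiscated#1', 10.0, 11.0, 12.0],
])

# Group full rows (label included) under topic = index % N_TOPICS.
by_topic = {}
for idx, row in zip(df.index, df.values.tolist()):
    by_topic.setdefault(idx % N_TOPICS, []).append(row)

print(by_topic)
```

For the 20-row frame in the question, `N_TOPICS` would be 10. A plain dict with `setdefault` is used here so the result prints as an ordinary dictionary.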

    If you need a defaultdict