代码之家  ›  专栏  ›  技术社区  ›  RustyShackleford

如何在数据框架内同时将列和透视列值合并到新列中?

  •  0
  • RustyShackleford  · 技术社区  · 6 年前

    我有一个像这样的df(列 10_value 需要紧挨着 10_type ,不确定如何格式化):

     0_0_type  0_0_value 0_1_type 0_1_value 0_firstname_value 0_lastname_value 10_0_type  
       uuid        1       ID         2           test1          test1    uuid   
       uuid        2    EMAIL    email1             NaN            NaN    uuid   
       uuid        3    EMAIL    email2             NaN            NaN    uuid   
       uuid        4    EMAIL    email3           test2          test2    uuid   
       uuid        5    EMAIL    email4             NaN            NaN    uuid   
       uuid        6    EMAIL    email5           test3          test3    uuid   
       uuid        7    EMAIL    email6           test4          test4    uuid   
       uuid        8    EMAIL    email7           test5          test5    uuid   
       uuid        9    EMAIL    email8           test6          test6    uuid   
    
       10_0_value 10_1_type  10_1_value   10_firstname_value  10_lastname_value  
            10     EMAIL     email9             test7            test7  
            11     EMAIL    email10             test8            test8  
            12     EMAIL    email11             test9            test9  
            13     EMAIL    email12            test10           test10  
            14     EMAIL    email13            test11           test11  
            15     EMAIL    email14            test12           test12  
            16     EMAIL    email15            test13           test13  
            17     EMAIL    email16            test14           test14  
            18     EMAIL    email17            test15           test15 
    

    我有1500k列,它们具有这种类型的列名结构。唯一的区别是前面的数字在变化,在这种情况下 0 10 .

    我只想要四份数据 uuid,email,first_name,last_name .

    我如何:

    1. 扫描整个数据帧以查找短语 first_name last_name 把记录堆在一起

    同时

    1. 提取液 EMAIL uuid 变成自己的专栏?

    最终的df应该如下所示:

       uuid    EMAIL  first_name      last_name
    0       1      NaN       test1          test1
    1       2   email1         NaN            NaN
    2       3   email2         NaN            NaN
    3       4   email3       test2          test2
    4       5   email4         NaN            NaN
    5       6   email5       test3          test3
    6       7   email6       test4          test4
    7       8   email7       test5          test5
    8       9   email8       test6          test6
    9      10   email9       test7          test7
    10     11  email10       test8          test8
    11     12  email11       test9          test9
    12     13  email12      test10         test10
    13     14  email13      test11         test11
    14     15  email14      test12         test12
    15     16  email15      test13         test13
    16     17  email16      test14         test14
    17     18  email17      test15         test15
    
    1 回复  |  直到 6 年前
        1
  •  1
  •   Scott Boston    6 年前

    好吧,让我们试试这些列命名的代码。首先,重命名列,将递增索引移动到分隔符后,“:”我选择。然后使用 pd.wide_to_long 重塑数据帧,最后 filter 只选择“值”列的数据帧。

    df.columns = df.columns.str.replace(r'(\d+)_(\w+)',r'\2:\1')
    pd.wide_to_long(df.reset_index(),
                    ['0_type','1_type','0_value','1_value','firstname_value','lastname_value'],
                    'index',
                    'Num',
                    sep=':')\
      .reset_index()\
      .filter(like='value')
    

    输出:

        0_value  1_value firstname_value lastname_value
    0         1        2           test1          test1
    1         2   email1             NaN            NaN
    2         3   email2             NaN            NaN
    3         4   email3           test2          test2
    4         5   email4             NaN            NaN
    5         6   email5           test3          test3
    6         7   email6           test4          test4
    7         8   email7           test5          test5
    8         9   email8           test6          test6
    9        10   email9           test7          test7
    10       11  email10           test8          test8
    11       12  email11           test9          test9
    12       13  email12          test10         test10
    13       14  email13          test11         test11
    14       15  email14          test12         test12
    15       16  email15          test13         test13
    16       17  email16          test14         test14
    17       18  email17          test15         test15