
How to group by a single column and build Python-style list strings from multiple rows and columns

  • Kra.P · 1 year ago

    I want to group my data by mcode and create two different types of rows for each group.

    Here is the sample data.

       Cat1  Cat2  Cat3  mcode key   pcode needed
     1 C1    C2    C31   B3100 TRUE  P001  P001  
     2 C1    C2    C31   B3100 FALSE P002  P002  
     3 C1    C2    C31   B5500 TRUE  P003  P003  
     4 C1    C2    C31   B5500 FALSE P004  NA    
     5 C1    C2    C31   B5500 FALSE P005  NA    
     6 C1    C2    C32   B1000 TRUE  P006  NA    
     7 C1    C2    C32   B1000 FALSE P007  P007  
     8 C1    C2    C32   B1000 FALSE P008  NA    
     9 C1    C2    C32   B1000 FALSE P009  P009  
    10 C1    C2    C32   B1000 FALSE P010  P010  
    

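    For each group, I want to take the category values (Cat1, Cat2, Cat3) from the row where key is TRUE.

    In addition, I need to create Python-style list strings that combine all the values of pcode and of needed, each column separately, excluding NA values.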

    Note that the key column is TRUE on the row where a given mcode value appears for the first time.
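
    The rule for key can be checked against the sample data. The following is only a sketch in base R: the two vectors are copied from the table above, and first_seen is a name introduced here for illustration.

```r
# mcode and key, copied row by row from the sample data
mcode <- c("B3100", "B3100", "B5500", "B5500", "B5500",
           "B1000", "B1000", "B1000", "B1000", "B1000")
key   <- c(TRUE, FALSE, TRUE, FALSE, FALSE,
           TRUE, FALSE, FALSE, FALSE, FALSE)

# duplicated() is FALSE the first time a value is seen,
# so its negation marks the first row of each mcode
first_seen <- !duplicated(mcode)

identical(first_seen, key)  # TRUE for the sample data
```

    If this holds for the real data as well, every group has exactly one key row, so picking the categories from it is unambiguous.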

    Here is the expected output.

      mcode Cat1  Cat2  Cat3  type   extended_info                                             
    1 B1000 C1    C2    C32   pcode  ['P006','P007','P008','P009','P010']
    2 B1000 C1    C2    C32   needed ['P007','P009','P010']              
    3 B3100 C1    C2    C31   pcode  ['P001','P002']                     
    4 B3100 C1    C2    C31   needed ['P001','P002']                     
    5 B5500 C1    C2    C31   pcode  ['P003','P004','P005']              
    6 B5500 C1    C2    C31   needed ['P003']   
    

    Here are tribble() calls for reproducing the data and the expected output:

    library(tibble)  # provides tribble()

    df <- tribble(
      ~Cat1, ~Cat2, ~Cat3, ~mcode,  ~key,  ~pcode, ~needed,
      "C1",  "C2",  "C31", "B3100", TRUE,  "P001", "P001",
      "C1",  "C2",  "C31", "B3100", FALSE, "P002", "P002",
      "C1",  "C2",  "C31", "B5500", TRUE,  "P003", "P003",
      "C1",  "C2",  "C31", "B5500", FALSE, "P004", NA,
      "C1",  "C2",  "C31", "B5500", FALSE, "P005", NA,
      "C1",  "C2",  "C32", "B1000", TRUE,  "P006", NA,
      "C1",  "C2",  "C32", "B1000", FALSE, "P007", "P007",
      "C1",  "C2",  "C32", "B1000", FALSE, "P008", NA,
      "C1",  "C2",  "C32", "B1000", FALSE, "P009", "P009",
      "C1",  "C2",  "C32", "B1000", FALSE, "P010", "P010"
    )
    expected_output <- tribble(
      ~mcode, ~Cat1, ~Cat2, ~Cat3, ~type,   ~extended_info,
      "B1000", "C1", "C2", "C32", "pcode",  "['P006','P007','P008','P009','P010']",
      "B1000", "C1", "C2", "C32", "needed", "['P007','P009','P010']",
      "B3100", "C1", "C2", "C31", "pcode",  "['P001','P002']",
      "B3100", "C1", "C2", "C31", "needed", "['P001','P002']",
      "B5500", "C1", "C2", "C31", "pcode",  "['P003','P004','P005']",
      "B5500", "C1", "C2", "C31", "needed", "['P003']"
    )
    

  • yuk · 1 year ago

    It looks like you have exactly one row with key = TRUE for each mcode.

    Something like this should get you what you need:

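    library(dplyr)  # the `.by` argument requires dplyr >= 1.1.0
    library(tidyr)  # pivot_longer()

    expected_output <- df %>%
      # One row per mcode: categories from the key row, codes as list-columns
      summarise(Cat1 = first(Cat1[key]),
                Cat2 = first(Cat2[key]),
                Cat3 = first(Cat3[key]),
                # sort() already drops NA; the explicit filter makes that visible
                pcode = list(sort(unique(pcode[!is.na(pcode)]))),
                needed = list(sort(unique(needed[!is.na(needed)]))),
                .by = mcode) %>%
      # Stack the two list-columns into type / extended_info pairs
      pivot_longer(cols = c(pcode, needed), names_to = "type", values_to = "extended_info") %>%
      arrange(mcode)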
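
    Note that the pipeline above leaves extended_info as a list-column of character vectors, not the quoted string shown in the expected output. One possible way to close that gap is sketched below. It assumes a hypothetical helper, to_py_list(), which is not part of the answer or of any package, and it rebuilds the question's sample data in compact form so the block runs on its own.

```r
library(dplyr)   # `.by` requires dplyr >= 1.1.0
library(tidyr)
library(tibble)

# Compact copy of the question's sample data
df <- tibble(
  Cat1   = "C1",
  Cat2   = "C2",
  Cat3   = rep(c("C31", "C32"), each = 5),
  mcode  = rep(c("B3100", "B5500", "B1000"), times = c(2, 3, 5)),
  key    = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE),
  pcode  = sprintf("P%03d", 1:10),
  needed = c("P001", "P002", "P003", NA, NA, NA, "P007", NA, "P009", "P010")
)

# Hypothetical helper: format a character vector as "['a','b']"
to_py_list <- function(x) {
  x <- sort(unique(x))                 # sort() also drops NA
  if (length(x) == 0) return("[]")     # avoid "['']" for all-NA groups
  paste0("[", paste0("'", x, "'", collapse = ","), "]")
}

result <- df %>%
  summarise(Cat1 = first(Cat1[key]),
            Cat2 = first(Cat2[key]),
            Cat3 = first(Cat3[key]),
            pcode  = to_py_list(pcode),    # one string per group
            needed = to_py_list(needed),
            .by = mcode) %>%
  pivot_longer(cols = c(pcode, needed), names_to = "type", values_to = "extended_info") %>%
  arrange(mcode)

result
```

    Building the strings inside summarise() keeps extended_info a plain character column, which is what the expected_output tribble in the question contains.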
    
  • B. Christian Kamgang · 1 year ago

    This might be useful:

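    library(dplyr)  # reframe() and `.by` require dplyr >= 1.1.0

    df %>%
      reframe(Cat1 = first(Cat1[key]),
              Cat2 = first(Cat2[key]),
              Cat3 = first(Cat3[key]),
              # Two list elements per group, so two rows per mcode: pcode first, then needed
              extended_info = list(sort(unique(na.omit(pcode))), sort(unique(na.omit(needed)))),
              .by = mcode)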
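
    As written, the reframe() call returns two rows per mcode but no type column, and extended_info is still a list-column. The following sketch shows one way the same idea might be extended to match the expected output. The py() helper and the type labels are additions made here, not part of the answer, and the sample data is rebuilt in compact form so the block is self-contained.

```r
library(dplyr)   # reframe() and `.by` require dplyr >= 1.1.0
library(tibble)

# Compact copy of the question's sample data
df <- tibble(
  Cat1   = "C1",
  Cat2   = "C2",
  Cat3   = rep(c("C31", "C32"), each = 5),
  mcode  = rep(c("B3100", "B5500", "B1000"), times = c(2, 3, 5)),
  key    = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE),
  pcode  = sprintf("P%03d", 1:10),
  needed = c("P001", "P002", "P003", NA, NA, NA, "P007", NA, "P009", "P010")
)

# Hypothetical helper: drop NA, de-duplicate, sort, and quote as "['a','b']"
py <- function(x) {
  x <- sort(unique(na.omit(x)))
  if (length(x) == 0) return("[]")
  paste0("[", paste0("'", x, "'", collapse = ","), "]")
}

result <- df %>%
  reframe(Cat1 = first(Cat1[key]),          # length 1, recycled to 2 rows per group
          Cat2 = first(Cat2[key]),
          Cat3 = first(Cat3[key]),
          type = c("pcode", "needed"),      # labels the two rows of each group
          extended_info = c(py(pcode), py(needed)),
          .by = mcode) %>%
  arrange(mcode)

result
```

    Compared with the summarise() and pivot_longer() route, this produces the long layout in a single step, at the cost of keeping the order of type and extended_info in sync by hand.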