
New column based on sub groups and a percentage range in another column

  • Rahul Agarwal  · 7 years ago

    I have a sample df as follows:

    df_test<- data.frame("Group.Name"=c("Group1","Group2","Group1","Group2","Group2","Group2","Group1"),
                    "Sub_group_name"=c("A","A","B","C","D","E","C"),
                    "Total%"=c(35,26,10,9,5,11,13))
    

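    • There are multiple sub groups under one group; the df above shows only some of them.
    • The Total% of one group across its sub groups adds up to 100%. It does not above because this is just a sample. So for Group1, the values for A, B, C etc. would add up to 100, and likewise for Group2. The sub groups of both groups are more or less the same.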

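    Ask:

    I need a new column, Category, which can be derived from Total% and Group.Name:

    • For each Group.Name, wherever Total% is highest, the category column is whatever the Sub_group_name is.

    • For each Group.Name, wherever Total% is between 10 and 30, the category column is "New_Group1".

    • For each Group.Name, wherever Total% is less than 10, the category column is "New_Group2".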

    Expected output:

    df_output<- data.frame("Group.Name"=c("Group1","Group2","Group1","Group2","Group2","Group2","Group1"),
                         "Sub_group_name"=c("A","A","B","C","D","E","C"),
                         "Total%"=c(35,26,10,9,5,11,13),
                         "category"=c("A","A","New_Group1","New_Group1","New_Group2","New_Group1","New_Group1"))
    
    1 answer

  • akrun  · 7 years ago

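    We can use cut to create labels for the corresponding breaks, and then replace the category of the row with the highest "Total%" in each "Group.Name" with the corresponding "Sub_group_name".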

    library(dplyr)

    df_test %>% 
      group_by(Group.Name) %>%
      mutate(
        # Bin Total% into [-Inf, 10), [10, 30) and [30, Inf)
        category = as.character(cut(`Total%`, breaks = c(-Inf, 10, 30, Inf),
                                    labels = c("New_Group2", "New_Group1", "Other"),
                                    right = FALSE)),
        # The row with the highest Total% in each group takes its Sub_group_name
        category = case_when(`Total%` == max(`Total%`) ~ Sub_group_name,
                             TRUE ~ category)
      )
    # A tibble: 7 x 4
    # Groups:   Group.Name [2]
    #  Group.Name Sub_group_name `Total%` category  
    #  <chr>      <chr>             <dbl> <chr>     
    #1 Group1     A                    35 A         
    #2 Group2     A                    26 A         
    #3 Group1     B                    10 New_Group1
    #4 Group2     C                     9 New_Group2
    #5 Group2     D                     5 New_Group2
    #6 Group2     E                    11 New_Group1
    #7 Group1     C                    13 New_Group1
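
    To make the interval boundaries easier to see, here is a minimal base R sketch of the same cut call on its own. The vector x holds made-up illustrative values, not data from the question.

```r
# Illustrative values only (an assumption, not taken from the question's data)
x <- c(5, 9, 10, 29.9, 30, 35)

# right = FALSE makes each interval closed on the left and open on the right:
# [-Inf, 10), [10, 30), [30, Inf)
bins <- cut(x,
            breaks = c(-Inf, 10, 30, Inf),
            labels = c("New_Group2", "New_Group1", "Other"),
            right = FALSE)

binned <- data.frame(value = x, bin = as.character(bins))
print(binned)
```

    With these breaks a value of exactly 10 is labelled "New_Group1" and a value of exactly 30 is labelled "Other". In the pipeline above, "Other" is overwritten only on the row holding the group maximum.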
    

    Data

    df_test <- data.frame("Group.Name" = c("Group1", "Group2", "Group1", "Group2", "Group2",
                                           "Group2", "Group1"),
                          "Sub_group_name" = c("A", "A", "B", "C", "D", "E", "C"),
                          "Total%" = c(35, 26, 10, 9, 5, 11, 13),
                          stringsAsFactors = FALSE,
                          check.names = FALSE)
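
    The check.names = FALSE argument matters here because the code above refers to the column as `Total%`. As a small sketch (the object names with_check and without_check are made up for illustration), this shows what data.frame does to that name by default:

```r
# Default behaviour: names are passed through make.names(),
# which replaces the invalid "%" character with "."
with_check <- data.frame("Total%" = c(35, 26))

# check.names = FALSE keeps the name exactly as written
without_check <- data.frame("Total%" = c(35, 26), check.names = FALSE)

print(names(with_check))     # "Total."
print(names(without_check))  # "Total%"
```

    If the data frame is built without check.names = FALSE, the column would have to be referenced as Total. instead of `Total%`.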