
Top N values of a cosine similarity matrix in R

  • nak5120 · 4 years ago

    How can I get the top pairs from a cosine similarity matrix like this one:

    southpark_matrix <- structure(c(0, 0.165272735625452, 0.386480286121192, 0.170696960480773, 
    0.0869562860988618, 0.165272735625452, 0, 0.251690602341816, 
    0.472701602991984, 0.137486001150133, 0.386480286121192, 0.251690602341816, 
    0, 0.255849200006255, 0.0972813221214626, 0.170696960480773, 
    0.472701602991984, 0.255849200006255, 0, 0.156449701347234, 0.0869562860988618, 
    0.137486001150133, 0.0972813221214626, 0.156449701347234, 0), .Dim = c(5L, 
    5L), .Dimnames = list(Docs = c("Mr. Garrison_2", "Cartman_3", 
    "Mr. Garrison_3", "Cartman_4", "Jimbo_5"), Docs = c("Mr. Garrison_2", 
    "Cartman_3", "Mr. Garrison_3", "Cartman_4", "Jimbo_5")))
    

    southpark_matrix

                    Docs
    Docs             Mr. Garrison_2 Cartman_3 Mr. Garrison_3 Cartman_4    Jimbo_5
      Mr. Garrison_2     0.00000000 0.1652727     0.38648029 0.1706970 0.08695629
      Cartman_3          0.16527274 0.0000000     0.25169060 0.4727016 0.13748600
      Mr. Garrison_3     0.38648029 0.2516906     0.00000000 0.2558492 0.09728132
      Cartman_4          0.17069696 0.4727016     0.25584920 0.0000000 0.15644970
      Jimbo_5            0.08695629 0.1374860     0.09728132 0.1564497 0.00000000
    

    How can I get the top two pairs, like this?

    Cartman_3 Cartman_4             0.4727016
    Mr. Garrison_2 Mr. Garrison_3   0.38648029
    
    2 answers

  • Joel Kandiah · 4 years ago

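    What I would do is convert the matrix to a tibble. Following the steps in Convert upper triangular part of a matrix to 3-column long format, we can turn the upper triangular part of the matrix into a 3-column data frame (row, col, val).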
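    Why only the upper triangle: a cosine similarity matrix is symmetric, so every pair appears twice (once above and once below the diagonal), and keeping the cells above the diagonal lists each pair exactly once. A minimal base-R illustration, using a made-up 3 x 3 matrix `m` (its names and values are hypothetical):

```r
# Hypothetical symmetric 3 x 3 similarity matrix with a zero diagonal
m <- matrix(c(0,   0.2, 0.7,
              0.2, 0,   0.4,
              0.7, 0.4, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# (row, col) indices of the cells strictly above the diagonal
ind <- which(upper.tri(m), arr.ind = TRUE)

# One row per unordered pair: (a, b), (a, c), (b, c)
pairs <- data.frame(row = rownames(m)[ind[, 1]],
                    col = colnames(m)[ind[, 2]],
                    val = m[ind],
                    stringsAsFactors = FALSE)
print(pairs)
```

    The mirrored cells (b, a), (c, a) and (c, b) never show up, so no pair is counted twice when ranking.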

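    After that, we can simply use top_n(2, val) to keep the two rows with the largest val. Note that top_n() only filters and does not sort, so the rows stay in their original order. Another way to do this step is to sort the values in descending order with arrange(desc(val)) and then take the first two rows with head(2). (In dplyr 1.0.0 and later, top_n() is superseded by slice_max(val, n = 2).)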
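    The arrange-and-head step also has a direct base-R counterpart: order the rows by decreasing value and keep the first two. A minimal sketch on a small hypothetical long-format data frame `df` with the same `row`, `col`, `val` columns:

```r
# Hypothetical long-format pairs, as produced from the upper triangle
df <- data.frame(row = c("a", "a", "b"),
                 col = c("b", "c", "c"),
                 val = c(0.2, 0.7, 0.4),
                 stringsAsFactors = FALSE)

# Sort rows by val, largest first, then keep the top 2
# (base-R analogue of arrange(desc(val)) %>% head(2))
top2 <- head(df[order(df$val, decreasing = TRUE), ], 2)
print(top2)
```

    This avoids the tidyverse dependency; the trade-off is that the original row names (3 and 2 here) are kept rather than renumbered.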

    library(tidyverse)
    
    
    southpark_matrix <- structure(c(0, 0.165272735625452, 0.386480286121192, 0.170696960480773,
                                    0.0869562860988618, 0.165272735625452, 0, 0.251690602341816,
                                    0.472701602991984, 0.137486001150133, 0.386480286121192, 0.251690602341816,
                                    0, 0.255849200006255, 0.0972813221214626, 0.170696960480773,
                                    0.472701602991984, 0.255849200006255, 0, 0.156449701347234, 0.0869562860988618,
                                    0.137486001150133, 0.0972813221214626, 0.156449701347234, 0),
                                  .Dim = c(5L, 5L),
                                  .Dimnames = list(Docs = c("Mr. Garrison_2", "Cartman_3",
                                                            "Mr. Garrison_3", "Cartman_4", "Jimbo_5"),
                                                   Docs = c("Mr. Garrison_2", "Cartman_3",
                                                            "Mr. Garrison_3", "Cartman_4", "Jimbo_5")))
    
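    # Convert the upper triangle of the matrix to long format:
    # one row per pair, with columns row, col and val.
    # diag = FALSE skips self-similarities (often 1 in a cosine matrix).
    ind <- which(upper.tri(southpark_matrix, diag = FALSE), arr.ind = TRUE)
    dimnam <- dimnames(southpark_matrix)
    df <- data.frame(row = dimnam[[1]][ind[, 1]],
                     col = dimnam[[2]][ind[, 2]],
                     val = southpark_matrix[ind])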
    # Method 1: top_n() keeps the 2 rows with the largest val (original row order)
    df %>%
      as_tibble() %>%
      top_n(2, val)
    #> # A tibble: 2 x 3
    #>   row            col              val
    #>   <chr>          <chr>          <dbl>
    #> 1 Mr. Garrison_2 Mr. Garrison_3 0.386
    #> 2 Cartman_3      Cartman_4      0.473
    
    # Method 2: sort by val in descending order, then take the first 2 rows
    df %>%
      as_tibble() %>%
      arrange(desc(val)) %>%
      head(2)
    #> # A tibble: 2 x 3
    #>   row            col              val
    #>   <chr>          <chr>          <dbl>
    #> 1 Cartman_3      Cartman_4      0.473
    #> 2 Mr. Garrison_2 Mr. Garrison_3 0.386
    

    Created on 2021-04-04 by the reprex package
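    The same steps can be wrapped in a small base-R function so that the number of pairs becomes a parameter. The name `top_pairs()` is hypothetical (it is not part of any package), and the sketch assumes a square, symmetric matrix with row and column names:

```r
# Hypothetical helper (not from any package): top n pairs of a
# symmetric similarity matrix. Assumes m is square with dimnames;
# the diagonal (self-similarity) is skipped.
top_pairs <- function(m, n = 2) {
  stopifnot(is.matrix(m), nrow(m) == ncol(m))
  ind <- which(upper.tri(m), arr.ind = TRUE)  # each unordered pair once
  pairs <- data.frame(row = rownames(m)[ind[, 1]],
                      col = colnames(m)[ind[, 2]],
                      val = m[ind],
                      stringsAsFactors = FALSE)
  pairs <- pairs[order(pairs$val, decreasing = TRUE), ]  # largest first
  rownames(pairs) <- NULL                                # renumber rows
  head(pairs, n)
}

southpark_matrix <- structure(c(0, 0.165272735625452, 0.386480286121192, 0.170696960480773,
                                0.0869562860988618, 0.165272735625452, 0, 0.251690602341816,
                                0.472701602991984, 0.137486001150133, 0.386480286121192, 0.251690602341816,
                                0, 0.255849200006255, 0.0972813221214626, 0.170696960480773,
                                0.472701602991984, 0.255849200006255, 0, 0.156449701347234, 0.0869562860988618,
                                0.137486001150133, 0.0972813221214626, 0.156449701347234, 0),
                              .Dim = c(5L, 5L),
                              .Dimnames = list(Docs = c("Mr. Garrison_2", "Cartman_3",
                                                        "Mr. Garrison_3", "Cartman_4", "Jimbo_5"),
                                               Docs = c("Mr. Garrison_2", "Cartman_3",
                                                        "Mr. Garrison_3", "Cartman_4", "Jimbo_5")))

res <- top_pairs(southpark_matrix, 2)
print(res)
```

    For the question's matrix this should return Cartman_3 / Cartman_4 (0.4727016) followed by Mr. Garrison_2 / Mr. Garrison_3 (0.3864803); a different n only changes the head() cut-off.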

  • Waldi · 4 years ago

    Using lapply: take the two largest distinct values, then look up which cells hold each value:

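    # unique() drops the mirrored copy of each value in the symmetric matrix
    best <- head(unique(sort(southpark_matrix, decreasing = TRUE)), 2)
    # For each score, return the row names of the cells that contain it
    lapply(best, function(x) {
      list(score = x, names = rownames(which(southpark_matrix == x, arr.ind = TRUE)))
    })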
    
    [[1]]
    [[1]]$score
    [1] 0.4727016
    
    [[1]]$names
    [1] "Cartman_4" "Cartman_3"
    
    
    [[2]]
    [[2]]$score
    [1] 0.3864803
    
    [[2]]$names
    [1] "Mr. Garrison_3" "Mr. Garrison_2"
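    One caveat with this approach: because unique() works on values, two different pairs that happen to share the same score are merged into a single entry, and `names` then lists the cells of both pairs. A small sketch with a hypothetical matrix `m` in which the pairs (a, b) and (b, c) tie shows the effect:

```r
# Hypothetical 3 x 3 symmetric matrix: pairs (a, b) and (b, c) tie at 0.5
m <- matrix(c(0,   0.5, 0.1,
              0.5, 0,   0.5,
              0.1, 0.5, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

best <- head(unique(sort(m, decreasing = TRUE)), 2)  # 0.5, then 0.1
res <- lapply(best, function(x) {
  list(score = x, names = rownames(which(m == x, arr.ind = TRUE)))
})

# The first entry holds four cell names: two pairs, each seen twice
print(length(res[[1]]$names))  # 4
```

    So the second "top pair" reported here is really the third-best pair; working on the upper-triangle indices instead of on distinct values avoids this.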
    