
How do I calculate the distances between multiple cells in R?

r
  • Julian  ·  5 years ago

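    I have cells of several different phenotypes, with xy coordinates for each cell. What is the simplest way to calculate the distance between every pair of cells on the same slide? My dataset has more than 100,000 cells, so I am trying to find the most efficient approach.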

    An example of the data frame:

    Xposition <- c(1,6,4,7,9,4,8,6,4)
    
    Yposition <- c(6,3,2,6,3,6,1,3,7)
    
    Phenotype <- c("A", "A", "B", "C", "C", "A", "A", "B", "B")
    
    SlideID <- c(111,111,111,111,111,112,112,112,112)
    
    df <- data.frame(Xposition, Yposition, Phenotype, SlideID)
    

    I am looking for something that gives me a data frame like this as output:

    CellType1 <- c("A", "A", "A", "A", "A", "A", "A", "B", "B", "C", "A", "A", "A", "A", "A", "B")
    
    Celltype2 <- c("A", "B", "C", "C", "B", "C", "C", "C", "C", "C", "A", "B", "B", "B", "B", "B")
    
    Distance <- c("5.83", "5", "6", "8.54", "2.23", "3.16", "3", "5", "5.09", "3.6", "6.4", "3.6", "1", "2.82", "7.21", "4.47")
    
    SlideID <- c("111", "111", "111", "111", "111", "111", "111", "111", "111", "111", "112", "112", "112", "112", "112", "112")
    
    distancedf <- data.frame(CellType1, Celltype2, Distance, SlideID)
    

    Thanks for your help!

  •   r2evans    5 years ago

    I think there is some room for ambiguity here, but ...

    res <- as.data.frame.table(as.matrix(dist(df[,1:2])))
    res$Var2 <- df$Phenotype[res$Var2]
    res$SlideID <- df$SlideID[res$Var1]
    res$Var1 <- df$Phenotype[res$Var1]
    head(res)
    #   Var1 Var2     Freq SlideID
    # 1    A    A 0.000000     111
    # 2    A    A 5.830952     111
    # 3    B    A 5.000000     111
    # 4    C    A 6.000000     111
    # 5    C    A 8.544004     111
    # 6    A    A 3.000000     112
    

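    From here, you should be able to filter out the 0 rows easily enough, but I left them in to show what is really happening. What as.data.frame.table(...) is actually doing is converting this: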

    dist(df[,1:2])
    #          1        2        3        4        5        6        7        8
    # 2 5.830952                                                               
    # 3 5.000000 2.236068                                                      
    # 4 6.000000 3.162278 5.000000                                             
    # 5 8.544004 3.000000 5.099020 3.605551                                    
    # 6 3.000000 3.605551 4.000000 3.000000 5.830952                           
    # 7 8.602325 2.828427 4.123106 5.099020 2.236068 6.403124                  
    # 8 5.830952 0.000000 2.236068 3.162278 3.000000 3.605551 2.828427         
    # 9 3.162278 4.472136 5.000000 3.162278 6.403124 1.000000 7.211103 4.472136
    

    via this full square matrix:

    as.matrix(dist(df[,1:2]))
    #          1        2        3        4        5        6        7        8        9
    # 1 0.000000 5.830952 5.000000 6.000000 8.544004 3.000000 8.602325 5.830952 3.162278
    # 2 5.830952 0.000000 2.236068 3.162278 3.000000 3.605551 2.828427 0.000000 4.472136
    # 3 5.000000 2.236068 0.000000 5.000000 5.099020 4.000000 4.123106 2.236068 5.000000
    # 4 6.000000 3.162278 5.000000 0.000000 3.605551 3.000000 5.099020 3.162278 3.162278
    # 5 8.544004 3.000000 5.099020 3.605551 0.000000 5.830952 2.236068 3.000000 6.403124
    # 6 3.000000 3.605551 4.000000 3.000000 5.830952 0.000000 6.403124 3.605551 1.000000
    # 7 8.602325 2.828427 4.123106 5.099020 2.236068 6.403124 0.000000 2.828427 7.211103
    # 8 5.830952 0.000000 2.236068 3.162278 3.000000 3.605551 2.828427 0.000000 4.472136
    # 9 3.162278 4.472136 5.000000 3.162278 6.403124 1.000000 7.211103 4.472136 0.000000
    

    and ultimately into this long format:

    head(as.data.frame.table(as.matrix(dist(df[,1:2]))))
    #   Var1 Var2     Freq
    # 1    1    1 0.000000
    # 2    2    1 5.830952
    # 3    3    1 5.000000
    # 4    4    1 6.000000
    # 5    5    1 8.544004
    # 6    6    1 3.000000
    

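    The 0.000 values are the diagonal of the distance matrix (which the dist(...) printout omits), apart from the one between cells 2 and 8, which share the coordinates (6, 3).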
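    Filtering on the value 0 would also drop real pairs of cells that happen to share coordinates (cells 2 and 8 in the example data). A minimal sketch of filtering by position instead, assuming the row names of df are the default 1..n so that the factor codes of Var1 and Var2 match the matrix indices; the names m, keep and pairs are only illustrative:

```r
# Example data from the question
df <- data.frame(
  Xposition = c(1, 6, 4, 7, 9, 4, 8, 6, 4),
  Yposition = c(6, 3, 2, 6, 3, 6, 1, 3, 7),
  Phenotype = c("A", "A", "B", "C", "C", "A", "A", "B", "B"),
  SlideID   = c(111, 111, 111, 111, 111, 112, 112, 112, 112)
)

m   <- as.matrix(dist(df[, 1:2]))   # full 9 x 9 distance matrix
res <- as.data.frame.table(m)       # long format, 81 rows

# Var1 and Var2 are factors; their integer codes are the row and column
# positions in m. Keeping rows with Var1 < Var2 drops the diagonal and
# keeps each unordered pair exactly once.
keep  <- as.integer(res$Var1) < as.integer(res$Var2)
pairs <- res[keep, ]

nrow(pairs)            # choose(9, 2) pairs remain
sum(pairs$Freq == 0)   # the pair of cells 2 and 8 is still present
```

    This still mixes cells from both slides, so it only illustrates the filtering step.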


    To do this per SlideID:

    lapply(split(df, df$SlideID), function(x) { 
      res <- as.data.frame.table(as.matrix(dist(x[,1:2])))
      res$Var2 <- x$Phenotype[res$Var2]
      res$SlideID <- x$SlideID[res$Var1]
      res$Var1 <- x$Phenotype[res$Var1]
      res
    })
    # $`111`
    #    Var1 Var2     Freq SlideID
    # 1     A    A 0.000000     111
    # 2     A    A 5.830952     111
    # 3     B    A 5.000000     111
    # 4     C    A 6.000000     111
    # 5     C    A 8.544004     111
    # 6     A    A 5.830952     111
    # 7     A    A 0.000000     111
    # 8     B    A 2.236068     111
    # 9     C    A 3.162278     111
    # 10    C    A 3.000000     111
    # 11    A    B 5.000000     111
    # 12    A    B 2.236068     111
    # 13    B    B 0.000000     111
    # 14    C    B 5.000000     111
    # 15    C    B 5.099020     111
    # 16    A    C 6.000000     111
    # 17    A    C 3.162278     111
    # 18    B    C 5.000000     111
    # 19    C    C 0.000000     111
    # 20    C    C 3.605551     111
    # 21    A    C 8.544004     111
    # 22    A    C 3.000000     111
    # 23    B    C 5.099020     111
    # 24    C    C 3.605551     111
    # 25    C    C 0.000000     111
    # $`112`
    #    Var1 Var2     Freq SlideID
    # 1     A    A 0.000000     112
    # 2     A    A 6.403124     112
    # 3     B    A 3.605551     112
    # 4     B    A 1.000000     112
    # 5     A    A 6.403124     112
    # 6     A    A 0.000000     112
    # 7     B    A 2.828427     112
    # 8     B    A 7.211103     112
    # 9     A    B 3.605551     112
    # 10    A    B 2.828427     112
    # 11    B    B 0.000000     112
    # 12    B    B 4.472136     112
    # 13    A    B 1.000000     112
    # 14    A    B 7.211103     112
    # 15    B    B 4.472136     112
    # 16    B    B 0.000000     112
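    The result above is a list with one data frame per slide, and it still contains self-pairs and both orderings of each pair. One possible way to get a single data frame shaped like the one in the question is sketched below. It is an illustration rather than part of the original answer: it uses upper.tri() to pick each pair once, takes the column names from the question's desired output, and the names per_slide and idx are made up for the example.

```r
# Example data from the question
df <- data.frame(
  Xposition = c(1, 6, 4, 7, 9, 4, 8, 6, 4),
  Yposition = c(6, 3, 2, 6, 3, 6, 1, 3, 7),
  Phenotype = c("A", "A", "B", "C", "C", "A", "A", "B", "B"),
  SlideID   = c(111, 111, 111, 111, 111, 112, 112, 112, 112)
)

per_slide <- lapply(split(df, df$SlideID), function(x) {
  m <- as.matrix(dist(x[, c("Xposition", "Yposition")]))
  # Positions above the diagonal: each unordered pair once, no self-pairs
  idx <- which(upper.tri(m), arr.ind = TRUE)
  data.frame(
    CellType1 = x$Phenotype[idx[, "row"]],
    Celltype2 = x$Phenotype[idx[, "col"]],
    Distance  = m[idx],
    SlideID   = x$SlideID[idx[, "row"]]
  )
})

# Stack the per-slide results into one data frame
distancedf <- do.call(rbind, per_slide)
rownames(distancedf) <- NULL

nrow(distancedf)   # choose(5, 2) + choose(4, 2) pairs
head(distancedf)
```

    Note that dist() stores n * (n - 1) / 2 values for a slide with n cells, and as.matrix() expands that to n * n. That is fine for a few thousand cells per slide, but a single slide with 100,000 cells would need roughly 40 GB for the dist object alone, so whether this approach scales depends on how the cells are spread across slides.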