
which() function in R: after sorting, it matches duplicate values

  • user6156267 · 7 years ago

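    I am trying to find the next closest store from a matrix of store IDs, zip codes, and the long/lat coordinates of each zip code. The problem arises when there is more than one store per zipcode and the script does not know how to order two identical values: store x is 10 miles away, store y is 10 miles away, and instead of returning x, y or y, x it returns c(x,y). I need a way for my code to list both stores separately (in arbitrary order, since by zip code they are the same distance from the store).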

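    Note that the script runs for all stores; only the 100 or so stores that share a zipcode with another store get tripped up. I would rather not go through the csv and edit it by hand.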

    library(data.table)
    library(zipcode)
    library(geosphere)
    source<-read.csv("C:\\Users\\mcan\\Desktop\\Projects\\Closest Store\\Site and Zip.csv",header=TRUE, sep=",") #open
    zip<-source[,2] #break apart the source zip codes 
    ID<-source[,1] #break apart the IDs
    zip<-clean.zipcodes(zip) #clean up the zipcodes 
    CleanedData<-data.frame(ID,zip) #combine the IDs and cleaned Zip codes
    CleanedData<-merge(x=CleanedData,y=zipcode,by="zip",all.x=TRUE) #dataset of store IDs, zipcodes, and their long/lat positions
    setDT(CleanedData) #set data frame to data table 
    storeDistances <- distm(CleanedData[,.(longitude,latitude)],CleanedData[,.(longitude,latitude)]) #matrix between long/lat points of all stores in list 
    colnames(storeDistances) <- rownames(storeDistances) <- CleanedData[,ID] 
    whatsClosest <- function(number=1){
        apply(storeDistances,1,function(x) (colnames(storeDistances)[which(x==sort(x)[number+1])])) #sorts in ascending order, takes the (number+1)-th smallest distance (position 1 is the store itself at distance 0), matches with storeID
    }
    CleanedData[,firstClosestSite:=whatsClosest(1)] #looks for 1st closest store
    CleanedData[,secondClosestSite:=whatsClosest(2)] #looks for 2nd closest store
    CleanedData[,thirdClosestSite:=whatsClosest(3)] #looks for 3rd closest store 
    

    Data set format:

     Classes ‘data.table’ and 'data.frame': 1206 obs. of  9 variables:
         $ zip              : Factor w/ 1182 levels "01234","02345",..: 1 2 3 4 5 6 7 8 9 10 ...
         $ ID               : int  11111 12222 13333 10528 ...
         $ city             : chr  "Boston" "Somerville" "Cambridge" "Weston" ...
         $ state            : chr  "MA" "MA" "MA" "MA" ...
         $ latitude         : num  40.0 41.0 42.0 43.0 ...
         $ longitude        : num  -70.0 -70.1 -70.2 -70.3 -70.4 ...
         $ firstClosestSite :List of 1206
           ..$ : chr "12345"
         $ secondClosestSite :List of 1206
           ..$ : chr "12344"
         $ thirdClosestSite :List of 1206
           ..$ : chr "12343"
    

    StoreID   Zip     City     State   Latitude   Longitude   FirstClosestSite
    11222     11000   Boston   MA      40.0       -70.0       c("11111", "12222")

    SecondClosestSite     ThirdClosestSite
    c("11111", "12222")   13333
    

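    Example of how the distance matrix is laid out (the first row and first column hold store IDs; the matrix values are the distances between stores):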

            11111   22222   33333   44444   55555   66666
    11111       0    6000   32000   36000   28000   28000
    22222    6000       0   37500   40500   32000   32000
    33333   32000   37500       0   11000    6900    6900
    44444   36000   40500   11000       0    8900    8900
    55555   28000   32000    6900    8900       0       0
    66666   28000   32000    6900    8900       0       0
    

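    The issue is the duplicate values in each row: for store 11111, which() matches both 55555 and 66666 because they are at the same distance, so it cannot pick just one of them.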
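    The tie can be reproduced with the sample matrix alone. The sketch below rebuilds that matrix by hand (the values are copied from the example above, not from the real data) and runs the same which(x==sort(x)[...]) comparison, to show that the comparison matches every store at the tied distance and that apply() then falls back to returning a list.

```r
# Sample distance matrix from the question, with store IDs as dimnames
ids <- c("11111", "22222", "33333", "44444", "55555", "66666")
storeDistances <- matrix(
  c(    0,  6000, 32000, 36000, 28000, 28000,
     6000,     0, 37500, 40500, 32000, 32000,
    32000, 37500,     0, 11000,  6900,  6900,
    36000, 40500, 11000,     0,  8900,  8900,
    28000, 32000,  6900,  8900,     0,     0,
    28000, 32000,  6900,  8900,     0,     0),
  nrow = 6, byrow = TRUE, dimnames = list(ids, ids)
)

# Row for store 11111: the third-smallest distance (28000) occurs twice
x <- storeDistances["11111", ]
tied <- names(x)[which(x == sort(x)[3])]
print(tied)  # both tied stores are matched at once

# Same expression as whatsClosest(1), applied to every row
res <- apply(storeDistances, 1, function(x) {
  colnames(storeDistances)[which(x == sort(x)[2])]
})

# The rows return different numbers of matches, so apply() gives back a list
print(is.list(res))
print(lengths(res))
```

    Because x == value is an equality test on distances, it has no notion of "the first of the tied stores"; any approach that ranks positions (for example with order()) avoids this.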

    1 answer  |  last active 4 years ago

  • Oriol Mirosa · 7 years ago

    Here is my attempt at a solution. Everything up to colnames(storeDistances) <- ... stays as it is; the rest is replaced with:

    # For each store, build a data frame with the distance to every store and that store's ID
    whatsClosestList <- sapply(as.data.frame(storeDistances), function(x) {
      list(data.frame(distance = x, store = rownames(storeDistances), stringsAsFactors = FALSE))
    })
    
    # Get the names of the stores
    # this step is necessary because lapply doesn't allow us
    # to access the list names
    storeNames = names(whatsClosestList)
    
    # Iterate through each store's data frame using storeNames
    # and delete the distance to itself
    whatsClosestListRemoveSelf <- lapply(storeNames, function(name) {
      df <- whatsClosestList[[name]]
      df[df$store != name, ]  # drop the row for the store itself and return the rest
    })
    
    # The previous step got rid of the store names in the list,
    # so we add them again here
    names(whatsClosestListRemoveSelf) <- storeNames
    
    whatsClosestOrderedList <- lapply(whatsClosestListRemoveSelf, function(df) { df[order(df$distance),] })
    
    whatsClosestTopThree <- lapply(whatsClosestOrderedList, function(df) { df$store[1:3] })
    
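    # vapply returns plain character vectors, so the new columns are not list columns
    firstClosestSite <- vapply(whatsClosestTopThree, function(x) x[1], character(1))
    secondClosestSite <- vapply(whatsClosestTopThree, function(x) x[2], character(1))
    thirdClosestSite <- vapply(whatsClosestTopThree, function(x) x[3], character(1))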
    
    CleanedData[,firstClosestSite:=firstClosestSite] #looks for 1st closest store in list
    CleanedData[,secondClosestSite:=secondClosestSite] #looks for 2nd closest store in list 
    CleanedData[,thirdClosestSite:=thirdClosestSite] #looks for 3rd closest store in list
    

    The results end up in CleanedData. Hope it works!
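    For comparison, the same idea (drop the store itself, then rank the rest with order()) can be written more compactly. This is only a sketch, not the code from the answer above: nearestStores is a hypothetical helper name, and the matrix is the sample from the question rather than the real data. It removes the store itself by position rather than by distance, so a second store in the same zipcode (distance 0) still counts as a neighbour.

```r
# Sample distance matrix from the question, with store IDs as dimnames
ids <- c("11111", "22222", "33333", "44444", "55555", "66666")
storeDistances <- matrix(
  c(    0,  6000, 32000, 36000, 28000, 28000,
     6000,     0, 37500, 40500, 32000, 32000,
    32000, 37500,     0, 11000,  6900,  6900,
    36000, 40500, 11000,     0,  8900,  8900,
    28000, 32000,  6900,  8900,     0,     0,
    28000, 32000,  6900,  8900,     0,     0),
  nrow = 6, byrow = TRUE, dimnames = list(ids, ids)
)

# Hypothetical helper: returns a character matrix with one row per store
# and the IDs of its k nearest other stores in columns 1..k
nearestStores <- function(distances, k = 3) {
  storeIds <- colnames(distances)
  rows <- lapply(seq_len(nrow(distances)), function(i) {
    candidates <- setdiff(seq_along(storeIds), i)   # every store except store i itself
    ranked <- candidates[order(distances[i, candidates])]  # ties keep their original order
    storeIds[ranked][seq_len(k)]                    # NA if fewer than k other stores exist
  })
  nearest <- do.call(rbind, rows)
  rownames(nearest) <- rownames(distances)
  nearest
}

nearest <- nearestStores(storeDistances, k = 3)
print(nearest)
```

    Each column of the result is a plain character vector, so it could be assigned one at a time, for example CleanedData[, firstClosestSite := nearest[, 1]], assuming the rows of CleanedData are still in the same order as the rows of storeDistances.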