which() function in R: after sorting in descending order, it matches duplicate values

zipcode r

user6156267 · Tech Community · 7 years ago

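I am trying to find the next closest stores from a matrix of store IDs, zip codes, and the long/lat coordinates of each zip code. The problem appears when a zip code has more than one store and the script does not know how to order two identical values. For example, if store x is 10 miles away and store y is also 10 miles away, it cannot order x and y and returns c(x, y) instead of x, y or y, x. I need a way for my code to list both items, in arbitrary order, since by zip code they are the same distance from the store.

Note that every other store works fine. Only the 100 or so stores that share a zip code with another store get tripped up, and I would rather not go through the csv and edit it by hand.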

library(data.table)
library(zipcode)
library(geosphere)
source<-read.csv("C:\\Users\\mcan\\Desktop\\Projects\\Closest Store\\Site and Zip.csv",header=TRUE, sep=",") #open
zip<-source[,2] #break apart the source zip codes 
ID<-source[,1] #break apart the IDs
zip<-clean.zipcodes(zip) #clean up the zipcodes 
CleanedData<-data.frame(ID,zip) #combine the IDs and cleaned Zip codes
CleanedData<-merge(x=CleanedData,y=zipcode,by="zip",all.x=TRUE) #dataset of store IDs, zipcodes, and their long/lat positions
setDT(CleanedData) #set data frame to data table 
storeDistances <- distm(CleanedData[,.(longitude,latitude)],CleanedData[,.(longitude,latitude)]) #matrix between long/lat points of all stores in list 
colnames(storeDistances) <- rownames(storeDistances) <- CleanedData[,ID] 
whatsClosest <- function(number=1){
    # sort() is ascending, so element number+1 is meant to skip the store's own zero distance;
    # which() then returns every store ID at that distance, so ties yield several IDs
    apply(storeDistances,1,function(x) (colnames(storeDistances)[which(x==sort(x)[number+1])]))
}
CleanedData[,firstClosestSite:=whatsClosest(1)] #looks for 1st closest store
CleanedData[,secondClosestSite:=whatsClosest(2)] #looks for 2nd closest store
CleanedData[,thirdClosestSite:=whatsClosest(3)] #looks for 3rd closest store

Dataset format:

Classes 'data.table' and 'data.frame': 1206 obs. of  9 variables:
 $ zip              : Factor w/ 1182 levels "01234","02345",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ ID               : int  11111 12222 13333 10528 ...
 $ city             : chr  "Boston" "Somerville" "Cambridge" "Weston" ...
 $ state            : chr  "MA" "MA" "MA" "MA" ...
 $ latitude         : num  40.0 41.0 42.0 43.0 ...
 $ longitude        : num  -70.0 -70.1 -70.2 -70.3 -70.4 ...
 $ firstClosestSite :List of 1206
  ..$ : chr "12345"
 $ secondClosestSite:List of 1206
  ..$ : chr "12344"
 $ thirdClosestSite :List of 1206
  ..$ : chr "12343"

StoreID  Zip    City    State  Longitude  Latitude  FirstClosestSite     SecondClosestSite    ThirdClosestSite
11222    11000  Boston  MA     40.0       -70.0     c("11111", "12222")  c("11111", "12222")  13333

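Example of how the distance matrix is laid out (the first row and first column hold store IDs; the matrix values are the distances between stores):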

        11111   22222   33333   44444   55555   66666
11111       0    6000   32000   36000   28000   28000
22222    6000       0   37500   40500   32000   32000
33333   32000   37500       0   11000    6900    6900
44444   36000   40500   11000       0    8900    8900
55555   28000   32000    6900    8900       0       0
66666   28000   32000    6900    8900       0       0

The problem is the duplicate values in each row: which() cannot tell which store is closer to 11111 (55555 or 66666).
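The tie can be reproduced on the sample matrix alone. The sketch below rebuilds that matrix by hand and compares the value-matching lookup from the question with a rank-based lookup built on order(). The helper name nthClosest is hypothetical and not part of the original script, and it assumes store IDs are unique.

```r
# Sample distance matrix copied from the question
ids <- c("11111", "22222", "33333", "44444", "55555", "66666")
storeDistances <- matrix(
  c(    0,  6000, 32000, 36000, 28000, 28000,
     6000,     0, 37500, 40500, 32000, 32000,
    32000, 37500,     0, 11000,  6900,  6900,
    36000, 40500, 11000,     0,  8900,  8900,
    28000, 32000,  6900,  8900,     0,     0,
    28000, 32000,  6900,  8900,     0,     0),
  nrow = 6, byrow = TRUE, dimnames = list(ids, ids)
)

# Approach from the question: match on the distance value.
# Two stores sit at 28000 from 11111, so which() returns both of them.
x <- storeDistances["11111", ]
tied <- colnames(storeDistances)[which(x == sort(x)[2 + 1])]
print(tied)

# Rank-based lookup (hypothetical helper): order() gives every column a
# distinct position, so tied stores come back one at a time, in column order.
nthClosest <- function(id, n) {
  # Drop the store itself by name; relying on "position 1 is self" breaks
  # when another store is also at distance 0
  d <- storeDistances[id, colnames(storeDistances) != id]
  names(d)[order(d)][n]
}
second <- nthClosest("11111", 2)
third  <- nthClosest("11111", 3)
print(c(second, third))
```

Dropping the store itself by name, rather than skipping the first sorted element, matters for rows such as 66666, where another store is also at distance 0 and would otherwise push the store into its own results.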

1 answer | last activity 4 years ago

Oriol Mirosa · 7 years ago

Here is my attempt at a solution. Everything up to the line with colnames(storeDistances) <- ... stays the same.

# Build one data frame per store: the distance to every store plus that store's ID
whatsClosestList <- sapply(as.data.frame(storeDistances), function(x) list(data.frame(distance = x, store = rownames(storeDistances), stringsAsFactors = FALSE)))

# Get the names of the stores
# this step is necessary because lapply doesn't allow us
# to access the list names
storeNames = names(whatsClosestList)

# Iterate through each store's data frame using storeNames
# and delete the distance to itself
whatsClosestListRemoveSelf <- lapply(storeNames, function(name) {
  df <- whatsClosestList[[name]]
  df <- df[!df$store == name,]
})

# The previous step got rid of the store names in the list,
# so we add them again here
names(whatsClosestListRemoveSelf) <- storeNames

whatsClosestOrderedList <- lapply(whatsClosestListRemoveSelf, function(df) { df[order(df$distance),] })

whatsClosestTopThree <- lapply(whatsClosestOrderedList, function(df) { df$store[1:3] })

firstClosestSite <- lapply(whatsClosestTopThree, function(x) { x[1]} )
secondClosestSite <- lapply(whatsClosestTopThree, function(x) { x[2]} )
thirdClosestSite <- lapply(whatsClosestTopThree, function(x) { x[3]} )

CleanedData[,firstClosestSite:=firstClosestSite] #looks for 1st closest store in list
CleanedData[,secondClosestSite:=secondClosestSite] #looks for 2nd closest store in list 
CleanedData[,thirdClosestSite:=thirdClosestSite] #looks for 3rd closest store in list

The new columns are added to CleanedData. Hope it works!
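For comparison, the same steps (drop the store itself, order by distance, keep the first few) can be folded into a single function that returns a plain character matrix instead of lists. This is only a sketch under stated assumptions: closestN is a hypothetical name, and the four-store matrix is made-up illustration data in which stores C and D share a location.

```r
# Hypothetical helper: for each row, return the IDs of the n nearest other stores
closestN <- function(distances, n = 3) {
  ids <- colnames(distances)
  rows <- lapply(seq_len(nrow(distances)), function(i) {
    d <- distances[i, ]
    d[i] <- Inf                 # exclude the store itself by position
    ids[order(d)][seq_len(n)]   # ties keep their column order
  })
  out <- do.call(rbind, rows)   # one row per store, one column per rank
  rownames(out) <- rownames(distances)
  out
}

# Made-up example data: C and D are at the same location (distance 0)
demoIds <- c("A", "B", "C", "D")
demoDistances <- matrix(
  c(0, 5, 9, 9,
    5, 0, 7, 7,
    9, 7, 0, 0,
    9, 7, 0, 0),
  nrow = 4, byrow = TRUE, dimnames = list(demoIds, demoIds)
)

topTwo <- closestN(demoDistances, n = 2)
print(topTwo)
```

Each column of the result is an ordinary character vector, so with the real data it could be assigned directly, for example CleanedData[, firstClosestSite := top[, 1]] where top holds the result of closestN(storeDistances). That would give character columns rather than the list columns shown in the question.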