
Find the 10 closest points in descending order

scipy-spatial nearest-neighbor distance scipy python

Jerry George · 7 years ago

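Each point is a 300-dimensional vector.

I can find the single closest point. How do I find the 10 closest points, in descending order?

Closest-point function: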

import pandas as pd
from scipy.spatial import distance

def closest_node(node, df):
    # Position of the row in df whose feature vector is nearest to node
    closest_index = distance.cdist([node], df.feature.tolist()).argmin()
    return pd.Series([df.title.tolist()[closest_index], df.id.tolist()[closest_index]])

df3[["closest_title", "closest_id"]] = df3.feature.apply(lambda row: closest_node(row, df2))

df2 is a pandas DataFrame of 40,000 points (each 300-dimensional).

Thanks

2 answers | as of 7 years ago

chase · 7 years ago

Just sort the distances and slice off the 10 nearest nodes. Something like this:

import pandas as pd
from scipy.spatial import distance

# Find the query node
query_node = df.iloc[10]  # Not sure what you're looking for

# Find the distance between this node and everyone else
euclidean_distances = df.apply(lambda row: distance.euclidean(row, query_node), axis=1)

# Create a new dataframe with distances.
distance_frame = pd.DataFrame(data={"dist": euclidean_distances, "idx": euclidean_distances.index})
distance_frame.sort_values("dist", inplace=True)

# Skip position 0 (the query node itself, at distance 0) and take the next 10 nodes
smallest_dist_ixs = distance_frame.iloc[1:11]["idx"]
# idx holds index labels, so look the rows up with .loc rather than .iloc
most_similar_nodes = df.loc[smallest_dist_ixs]
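
The row-by-row `apply` above can be slow on 40,000 points. A vectorized variant, offered as a minimal sketch rather than a definitive implementation, computes all distances with one `cdist` call and sorts them with `numpy.argsort`. The helper name `k_nearest`, its `descending` flag, and the random data standing in for `df2.feature` are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial import distance


def k_nearest(query, points, k=10, descending=False):
    """Return (indices, distances) of the k points nearest to `query`."""
    # cdist expects 2-D inputs, so wrap the single query vector in a list.
    dists = distance.cdist([query], points)[0]
    # argsort orders ascending, so the nearest point comes first.
    order = np.argsort(dists)[:k]
    if descending:
        # Reverse so the farthest of the k nearest comes first.
        order = order[::-1]
    return order, dists[order]


# Hypothetical stand-in for df2.feature: 1,000 random 300-dimensional points.
rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 300))
query = rng.normal(size=300)

idx, d = k_nearest(query, points, k=10)
idx_desc, d_desc = k_nearest(query, points, k=10, descending=True)
```

If the query vector is itself one of the rows in `points`, it will come back as its own nearest neighbor at distance 0, so ask for `k + 1` results and drop the first one, as the slice in the answer does.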

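Based on your use of the word "title" here and the choice of 300-dimensional vectors, my assumption is that these are word or phrase vectors.
Gensim actually has a method that returns the top-N most similar words along these lines, and it is quite fast.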

https://tedboy.github.io/nlps/generated/generated/gensim.models.Word2Vec.most_similar.html

>>> trained_model.most_similar(positive=['woman', 'king'], negative=['man'])
[('queen', 0.50882536), ...]
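
As far as I know, `most_similar` ranks candidates by cosine similarity rather than Euclidean distance. The same idea for a single query vector can be sketched in plain NumPy, without Gensim. The function name `top_n_cosine` and the toy labels and vectors below are hypothetical, and the sketch does not reproduce Gensim's handling of `positive` and `negative` word lists.

```python
import numpy as np


def top_n_cosine(query, vectors, labels, n=10):
    """Return the n (label, similarity) pairs most similar to `query`."""
    # Normalise to unit length so a dot product equals cosine similarity.
    unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_query = query / np.linalg.norm(query)
    sims = unit_vectors @ unit_query
    # Negate before sorting so the most similar entries come first.
    order = np.argsort(-sims)[:n]
    return [(labels[i], float(sims[i])) for i in order]


# Hypothetical toy data: 4 labelled 3-dimensional vectors.
labels = ["a", "b", "c", "d"]
vectors = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
])
result = top_n_cosine(np.array([1.0, 0.0, 0.0]), vectors, labels, n=2)
```

Here `result` lists the two labels whose vectors point most nearly in the same direction as the query, most similar first.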

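On a slightly different note, this also bears some resemblance to the traveling salesman problem (TSP): if you want the shortest path between the points, you could solve that and then simply slice off the first 10 "cities".

Google has a very simple and fast Python implementation in OR-Tools: https://developers.google.com/optimization/routing/tsp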

ttreis · 7 years ago

Since I don't know your full code and don't have a data sample, my suggestion is as follows:
