代码之家 › 专栏 › 技术社区 › oconnor0

是否有最近的键映射数据结构?

data-structures algorithm

oconnor0 · 技术社区 · 15 年前

在这种情况下,我需要使用最接近我请求的键来查找值。这有点像一张最近的地图,它定义了键之间的距离。

例如,如果我在地图中有a、c、m、z键,那么对d的请求将返回c的值。

有什么想法吗?

7 回复 | 直到 10 年前

Guss 15 年前

大多数树数据结构使用某种排序算法来存储和查找键。许多这样的实现都可以找到一个接近您使用的密钥的密匙(通常是最接近的下面或最接近的上面)。例如Java的 TreeMap 实现这样的数据结构,您可以告诉它在查找键下面获取最近的键,或者在查找键上面获取最近的键。( higherKey 和 lowerKey )

如果你能计算距离(它并不总是容易的),Java的接口只需要你知道任何给定的键是“低于”或“以上”任何其他给定的键,那么你可以要求最上面和最下面的两个,然后计算自己哪一个更近。

Eamon Nerbonne 15 年前

你的数据的维度是什么?如果它只是一维的,一个排序的数组会做到这一点——二进制搜索会找到精确的匹配和/或揭示你的搜索键位于哪两个键之间——一个简单的测试会告诉你哪个更接近。

如果您不仅需要查找最近的键,还需要查找关联的值,请维护一个排序相同的值数组-键数组中检索到的键的索引是值数组中值的索引。

当然,有 许多的 其他方法-使用哪种方法取决于许多其他因素,例如内存消耗、是否需要插入值、如果控制插入顺序、删除、线程问题等…

Daniel C. Sobral 15 年前

BK-trees 做你想做的。这里有一个 good article 关于实现它们。

下面是一个scala实现:

class BKTree[T](computeDistance: (T, T) => Int, node: T) {
  val subnodes = scala.collection.mutable.HashMap.empty[Int,BKTree[T]]

  def query(what: T, distance: Int): List[T] = {
    val currentDistance = computeDistance(node, what)
    val minDistance = currentDistance - distance
    val maxDistance = currentDistance + distance
    val elegibleNodes = (
      subnodes.keys.toList 
      filter (key => minDistance to maxDistance contains key) 
      map subnodes
    )
    val partialResult = elegibleNodes flatMap (_.query(what, distance))
    if (currentDistance <= distance) node :: partialResult else partialResult
  }

  def insert(what: T): Boolean = if (node == what) false else (
    subnodes.get(computeDistance(node, what)) 
    map (_.insert(what)) 
    getOrElse {
      subnodes(computeDistance(node, what)) = new BKTree(computeDistance, what)
      true
    }
  )

  override def toString = node.toString+"("+subnodes.toString+")"
}

object Test {
  def main(args: Array[String]) {
    val root = new BKTree(distance, 'A')
    root.insert('C')
    root.insert('M')
    root.insert('Z')
    println(findClosest(root, 'D'))
  }
  def charDistance(a: Char, b: Char) = a - b abs
  def findClosest[T](root: BKTree[T], what: T): List[T] = {
    var distance = 0
    var closest = root.query(what, distance)
    while(closest.isEmpty) {
      distance += 1
      closest = root.query(what, distance)
    }
    closest
  }
}

我承认这件事有些肮脏和丑陋,而且插入算法太聪明了。而且,它只能在很短的距离内工作,否则您将重复搜索树。下面是一个更好的替代实现:

class BKTree[T](computeDistance: (T, T) => Int, node: T) {
  val subnodes = scala.collection.mutable.HashMap.empty[Int,BKTree[T]]

  def query(what: T, distance: Int): List[T] = {
    val currentDistance = computeDistance(node, what)
    val minDistance = currentDistance - distance
    val maxDistance = currentDistance + distance
    val elegibleNodes = (
      subnodes.keys.toList 
      filter (key => minDistance to maxDistance contains key) 
      map subnodes
    )
    val partialResult = elegibleNodes flatMap (_.query(what, distance))
    if (currentDistance <= distance) node :: partialResult else partialResult
  }

  private def find(what: T, bestDistance: Int): (Int,List[T]) = {
    val currentDistance = computeDistance(node, what)
    val presentSolution = if (currentDistance <= bestDistance) List(node) else Nil
    val best = currentDistance min bestDistance
    subnodes.keys.foldLeft((best, presentSolution))(
      (acc, key) => {
        val (currentBest, currentSolution) = acc
        val (possibleBest, possibleSolution) = 
          if (key <= currentDistance + currentBest)
            subnodes(key).find(what, currentBest)
          else
            (0, Nil)
        (possibleBest, possibleSolution) match {
          case (_, Nil) => acc
          case (better, solution) if better < currentBest => (better, solution)
          case (_, solution) => (currentBest, currentSolution ::: solution)
        }
      }
    )
  }

  def findClosest(what: T): List[T] = find(what, computeDistance(node, what))._2

  def insert(what: T): Boolean = if (node == what) false else (
    subnodes.get(computeDistance(node, what)) 
    map (_.insert(what)) 
    getOrElse {
      subnodes(computeDistance(node, what)) = new BKTree(computeDistance, what)
      true
    }
  )

  override def toString = node.toString+"("+subnodes.toString+")"
}

object Test {
  def main(args: Array[String]) {
    val root = new BKTree(distance, 'A')
    root.insert('C')
    root.insert('E')
    root.insert('M')
    root.insert('Z')
    println(root.findClosest('D'))
  }
  def charDistance(a: Char, b: Char) = a - b abs
}

John_West 10 年前

用C++和STL容器( std::map )您可以使用以下模板功能:

#include <iostream>
#include <map>

//!This function returns nearest by metric specified in "operator -" of type T
//!If two items in map are equidistant from item_to_find, the earlier occured by key will be returned

template <class T,class U> typename std::map<T,U>::iterator find_nearest(std::map<T,U> map_for_search,const T& item_to_find)
{
  typename std::map<T,U>::iterator itlow,itprev;
  itlow=map_for_search.lower_bound(item_to_find);
  itprev=itlow;
  itprev--;
//for cases when we have "item_to_find" element in our map
//or "item_to_find" occures before the first element of map
  if ((itlow->first==item_to_find) || (itprev==map_for_search.begin()))
    return itlow;
//if "item"to_find" is besides the last element of map
  if (itlow==map_for_search.end())
    return itprev;

  return (itlow->first-item_to_find < item_to_find-itprev->first)?itlow:itprev; // C will be returned
//note that "operator -" is used here as a function for distance metric
}

int main ()
{
  std::map<char,int> mymap;
  std::map<char,int>::iterator nearest;
  //fill map with some information
  mymap['B']=20;
  mymap['C']=40;
  mymap['M']=60;
  mymap['Z']=80;
  char ch='D'; //C should be returned
  nearest=find_nearest<char,int>(mymap,ch);
  std::cout << nearest->first << " => " << nearest->second << '\n';
  ch='Z'; //Z should be returned
  nearest=find_nearest<char,int>(mymap,ch);
  std::cout << nearest->first << " => " << nearest->second << '\n';
  ch='A'; //B should be returned
  nearest=find_nearest<char,int>(mymap,ch);
  std::cout << nearest->first << " => " << nearest->second << '\n';
  ch='H'; // equidistant to C and M -> C is returned
  nearest=find_nearest<char,int>(mymap,ch);
  std::cout << nearest->first << " => " << nearest->second << '\n';
  return 0;
}

输出:

C => 40
Z => 80
B => 20
C => 40

假设 operator - 用作计算距离的函数。如果 class T 是您自己的类,其中的对象用作映射中的键。您也可以将代码更改为使用特殊的 T类 静态成员函数(比如, distance ),而不是 操作员- ,而不是:

return (T::distance(itlow->first,item_to_find) < T::distance(item_to_find,itprev->first))?itlow:itprev;

哪里 距离 应该是smth。喜欢

static distance_type some_type::distance()(const some_type& first, const some_type& second){//...}

和 distance_type 应支持比较 operator <

ire_and_curses 15 年前

您可以将类似这样的东西实现为树。一种简单的方法是为树中的每个节点分配一个位串。树的每个级别都存储为一个位。所有父信息都编码在节点的位字符串中。然后,您可以轻松地定位任意节点,并查找父节点和子节点。这就是为什么 Morton ordering 例如,工作。它还有一个额外的优点,即您可以通过简单的二进制减法来计算节点之间的距离。

如果数据值之间有多个链接,那么数据结构是一个图而不是树。在这种情况下,您需要一个稍微复杂一点的索引系统。 Distributed hash tables 做这种事。它们通常有一种方法来计算索引空间中任意两个节点之间的距离。例如, Kademlia 算法(由BitTorrent使用)使用应用于位串ID的异或距离。这允许BitTorrent客户机在一个链中查找ID,并在未知的目标位置聚合。可以使用类似的方法查找最接近目标节点的节点。

Frank 15 年前

如果您的键是字符串,并且您的相似性函数是 Levenshtein distance ,然后您可以使用 finite-state machines :

你的地图是 trie 构建为有限状态机(通过联合所有键/值对并确定)。然后,用一个简单的有限状态传感器编写输入查询,该传感器对Levenshtein距离进行编码,然后用trie编写输入查询。然后,使用 Viterbi algorithm 提取最短路径。

您可以使用 finite-state toolkit .

Andrew Cassidy 10 年前

在scala中,这是一种我用来查找与您要查找的键最近的int<=的技术。

val sMap = SortedMap(1 -> "A", 2 -> "B", 3 -> "C")
sMap.to(4).lastOption.get // Returns 3
sMap.to(-1) // Returns an empty Map