代码之家  ›  专栏  ›  技术社区  ›  waterlooalex

如何在Scala中进行快速前缀字符串匹配

  •  4
  • waterlooalex  · 技术社区  · 15 年前

    我正在使用一些Java代码进行快速前缀查找,使用Java.util.TreeSet,是否可以改用scala的TreeSet?还是另一种解决方案?

    /** A class that uses a TreeSet to do fast prefix matching
     */
    class PrefixMatcher {
      private val _set = new java.util.TreeSet[String]
    
      def add(s: String) = _set.add(s)
    
      def findMatches(prefix: String): List[String] = {
        val matches = new ListBuffer[String]
        val tailSet = _set.tailSet(prefix)
        for ( tail <- tailSet.toArray ) {
          val tailString = tail.asInstanceOf[String]
          if ( tailString.startsWith(prefix) ) 
            matches += tailString
          else
            return matches.toList    
        }
    
        matches.toList
      }
    }
    
    5 回复  |  直到 15 年前
        1
  •  6
  •   Thomas Jung    15 年前

    这个 来自&花时间

    class PrefixMatcher {
        private val _set = new TreeSet[String]
        def add(s: String) = _set.add(s)
        def findMatches(prefix: String): Iterable[String] =
            _set from prefix takeWhile(_ startsWith prefix)
    }
    

    另一种选择是 选择一个子集

    在increment方法中仍有一些工作(溢出和空字符串将不起作用),但目的应该是明确的。

    class PrefixMatcher {
      private val _set = new java.util.TreeSet[String]
    
      def add(s: String) = _set.add(s)
    
      def findMatches(prefix: String) : Set[String] = {
        def inc(x : String) = { //ignores overflow
           assert(x.length > 0) 
           val last = x.length - 1
           (x take last) + (x(last) + 1).asInstanceOf[Char]
        }
       _set.subSet(prefix, inc(prefix))
      }
    }
    

    scala也是如此 jcl.TreeSet#range

        2
  •  14
  •   Ken Bloom    15 年前

    使用Trie。尽管有些人发布了被错误命名为tries的排序树映射数据结构,但实际上还没有人在这里发布Trie。 Here 是Java中Trie实现的一个相当有代表性的示例。我不知道斯卡拉有什么。另请参见对尝试的解释 Wikipedia .

        3
  •  3
  •   Ken Bloom    10 年前

    据我所知,Scala树集是由Java树集支持的,但是使用Scala变体将允许您使用序列理解来缩短代码( http://www.scala-lang.org/node/111 )为您提供一个类似(针对Scala 2.7)的实现:

    import scala.collection.jcl.TreeSet;
    
    class PrefixMatcher 
    {
        private val _set = new TreeSet[String]
    
        def add(s: String) = _set.add(s)
    
        def findMatches(prefix: String): Iterable[String] =
            for (s <- _set.from(prefix) if s.startsWith(prefix)) yield s
    }
    
    object Main
    {
        def main(args: Array[String]): Unit =
        {
            val pm = new PrefixMatcher()
    
            pm.add("fooBar")
            pm.add("fooCow")
            pm.add("barFoo")
    
            pm.findMatches("foo").foreach(println)
        }
    }
    

    我为自己糟糕的Scala风格道歉,我只是习惯了这种语言。

        4
  •  2
  •   Daniel C. Sobral    15 年前

    blogged 关于查找前段时间前缀组合的匹配项。这是一个更难的问题,因为你不知道一个前缀何时结束,另一个前缀何时开始。你可能会感兴趣。我甚至会在下面发布我没有在博客上写过的代码(希望是:),尽管它删除了所有评论,没有一条是用英语写的:

    package com.blogspot.dcsobral.matcher.DFA
    
    object DFA {
      type Matched = List[(String, String)]
      def words(s : String) = s.split("\\W").filter(! _.isEmpty).toList
    }
    
    import DFA._
    import scala.runtime.RichString
    
    class DFA {
      private val initialState : State = new State(None, "")
      private var currState : State = initialState
      private var _input : RichString = ""
      private var _badInput : RichString = ""
      private var _accepted : Boolean = true
    
      def accepted : Boolean = _accepted
      def input : String = _input.reverse + _badInput.reverse 
    
      def transition(c : Char) : List[(String, Matched)] = {
        if (c == '\b') backtrack
        else {
          if (accepted) {
            val newState = currState(c)
            newState match {
              case Some(s) => _input = c + _input; currState = s
              case None => _badInput = c + _badInput; _accepted = false
            }
          } else {
            _badInput = c + _badInput
          }
          optionList
        }
      }
    
      def transition(s : String) : List[(String, Matched)] = {
        s foreach (c => transition(c))
        optionList
      }
    
      def apply(c : Char) : List[(String, Matched)] = transition(c)
      def apply(s : String) : List[(String,Matched)] = transition(s)
    
      def backtrack : List[(String, Matched)] = {
        if(_badInput isEmpty) {
          _input = _input drop 1
          currState.backtrack match {
            case Some(s) => currState = s
            case None =>
          }
        } else {
          _badInput = _badInput drop 1
          if (_badInput isEmpty) _accepted = true
        }
        optionList
      }
    
      def optionList : List[(String, Matched)] = if (accepted) currState.optionList else Nil
    
      def possibleTransitions : Set[Char] = if (accepted) (currState possibleTransitions) else Set.empty
    
      def reset : Unit = {
        currState = initialState
        _input = ""
        _badInput = ""
        _accepted = true
      }
    
      def addOption(s : String) : Unit = {
        initialState addOption s
        val saveInput = input
        reset
        transition(saveInput)
      }
      def removeOption(s : String) : Unit = {
        initialState removeOption s
        val saveInput = input
        reset
        transition(saveInput)
      }
    }
    
    class State (val backtrack : Option[State],
                 val input : String) {
      private var _options : List[PossibleMatch] = Nil
      private val transitions : scala.collection.mutable.Map[Char, State] = scala.collection.mutable.Map.empty
      private var _possibleTransitions : Set[Char] = Set.empty
    
      private def computePossibleTransitions = {
        if (! options.isEmpty)
          _possibleTransitions = options map (_.possibleTransitions) reduceLeft (_++_)
        else
          _possibleTransitions = Set.empty
      }
    
      private def computeTransition(c : Char) : State = {
        val newState = new State(Some(this), input + c)
        options foreach (o => if (o.possibleTransitions contains c) (o computeTransition (newState, c)))
        newState
      }
    
      def options : List[PossibleMatch] = _options
      def optionList : List[(String, Matched)] = options map (pm => (pm.option, pm.bestMatch))
      def possibleTransitions : Set[Char] = _possibleTransitions
    
      def transition(c : Char) : Option[State] = {
        val t = c.toLowerCase
        if (possibleTransitions contains t) Some(transitions getOrElseUpdate (t, computeTransition(t))) else None
      }
    
      def apply(c : Char) : Option[State] = transition(c)
    
      def addOption(option : String) : Unit = {
        val w = words(option)
        addOption(option, w.size, List(("", w.head)), w)
      }
    
      def addOption(option : String, priority : Int, matched : Matched, remaining : List[String]) : Unit = {
        options find (_.option == option) match {
          case Some(pM) => 
            if (!pM.hasMatchOption(matched)) {
              pM.addMatchOption(priority, matched, remaining)
              if (priority < pM.priority) {
                val (before, _ :: after) = options span (_ != pM)
                val (highPriority, lowPriority) = before span (p => p.priority < priority || 
                  (p.priority == priority && p.option < option))
                _options = highPriority ::: (pM :: lowPriority) ::: after
              }
              transitions foreach (t => pM computeTransition (t._2, t._1))
            }
          case None => 
            val (highPriority, lowPriority) = options span (p => p.priority < priority || 
                  (p.priority == priority && p.option < option))
            val newPM = new PossibleMatch(option, priority, matched, remaining)
            _options = highPriority ::: (newPM :: lowPriority)
            transitions foreach (t => newPM computeTransition (t._2, t._1))
        }
        computePossibleTransitions
      }
    
      def removeOption(option : String) : Unit = {
        options find (_.option == option) match {
          case Some(possibleMatch) =>
            val (before, _ :: after) = options span (_ != possibleMatch)
            (possibleMatch.possibleTransitions ** Set(transitions.keys.toList : _*)).toList foreach (t => 
              transition(t).get.removeOption(option))
            _options = before ::: after
            computePossibleTransitions
          case None =>
        }
      }
    }
    
    class PossibleMatch (val option : String, 
                         thisPriority : Int, 
                         matched : Matched, 
                         remaining : List[String]) {
      private var _priority = thisPriority
      private var matchOptions = List(new MatchOption(priority, matched, remaining))
      private var _possibleTransitions = matchOptions map (_.possibleTransitions) reduceLeft (_++_)
      private def computePossibleTransitions = {
        _possibleTransitions = matchOptions map (_.possibleTransitions) reduceLeft (_++_)
      }
    
      def priority : Int = _priority
      def hasMatchOption(matched : Matched) : Boolean = matchOptions exists (_.matched == matched)
      def addMatchOption(priority : Int, matched : Matched, remaining : List[String]) : Unit = {
        if (priority < _priority) _priority = priority
        val (highPriority, lowPriority) = matchOptions span (_.priority < priority)
        val newMO = new MatchOption(priority, matched, remaining)
        matchOptions = highPriority ::: (newMO :: lowPriority)
        computePossibleTransitions
      }
      def bestMatch : Matched = matchOptions.head.matched.reverse.map(p => (p._1.reverse.toString, p._2)) ::: 
        remaining.tail.map(w => ("", w))
      def possibleTransitions : Set[Char] = _possibleTransitions
    
      def computeTransition(s: State, c : Char) : Unit = {
        def computeOptions(state : State,
                           c : Char, 
                           priority : Int, 
                           matched : Matched, 
                           remaining : List[String]) : Unit = {
          remaining match {
            case w :: ws => 
              if (!w.isEmpty && w(0).toLowerCase == c.toLowerCase) {
                val newMatched = (w(0) + matched.head._1, matched.head._2.substring(1)) :: matched.tail
                val newPriority = if (matched.head._1 isEmpty) (priority - 1) else priority
    
                if (w.drop(1) isEmpty)
                  s.addOption(option, newPriority - 1, ("", ws.head) :: newMatched , ws)
                else
                  s.addOption(option, newPriority, newMatched, w.substring(1) :: ws)
              }
              if (ws != Nil) computeOptions(s, c, priority, ("", ws.head) :: matched, ws)
            case Nil =>
          }
        }
    
        if(possibleTransitions contains c)
          matchOptions foreach (mO => computeOptions(s, c, mO.priority, mO.matched, mO.remaining))
      }
    }
    
    class MatchOption (val priority : Int,
                       val matched : Matched,
                       val remaining : List[String]) {
      lazy val possibleTransitions : Set[Char] = Set( remaining map (_(0) toLowerCase) : _* )
    }
    

        5
  •  2
  •   Daniel C. Sobral    15 年前

    好的,我刚刚意识到你想要的和我的一个朋友对另一个问题的建议差不多。所以,这是他的答案,根据你的需要简化了。

    class PrefixMatcher {
      // import scala.collection.Set // Scala 2.7 needs this -- and returns a gimped Set
      private var set = new scala.collection.immutable.TreeSet[String]()
      private def succ(s : String) = s.take(s.length - 1) + ((s.charAt(s.length - 1) + 1)).toChar
    
      def add(s: String) = set += s
    
      def findMatches(prefix: String): Set[String] = 
        if (prefix.isEmpty) set else set.range(prefix, succ(prefix))
    }