代码之家  ›  专栏  ›  技术社区  ›  Noctis Skytower

如何将记忆应用于该算法?

  •  1
  • Noctis Skytower  · 技术社区  · 15 年前

    在找到 difflib.SequenceMatcher

    会议的目的 diff 模块是计算一对序列(列表、元组、字符串、字节、字节数组等)之间的差异和相似性。最初的版本比现在的版本慢得多,速度提高了10倍。如何将记忆应用于以下代码?重写算法以进一步提高速度的最佳方法是什么?


    class Slice:
    
        __slots__ = 'prefix', 'root', 'suffix'
    
        def __init__(self, prefix, root, suffix):
            self.prefix = prefix
            self.root = root
            self.suffix = suffix
    
    ################################################################################
    
    class Match:
    
        __slots__ = 'a', 'b', 'prefix', 'suffix', 'value'
    
        def __init__(self, a, b, prefix, suffix, value):
            self.a = a
            self.b = b
            self.prefix = prefix
            self.suffix = suffix
            self.value = value
    
    ################################################################################
    
    class Tree:
    
        __slots__ = 'nodes', 'index', 'value'
    
        def __init__(self, nodes, index, value):
            self.nodes = nodes
            self.index = index
            self.value = value
    
    ################################################################################
    
    def search(a, b):
        # Initialize startup variables.
        nodes, index = [], []
        a_size, b_size = len(a), len(b)
        # Begin to slice the sequences.
        for size in range(min(a_size, b_size), 0, -1):
            for a_addr in range(a_size - size + 1):
                # Slice "a" at address and end.
                a_term = a_addr + size
                a_root = a[a_addr:a_term]
                for b_addr in range(b_size - size + 1):
                    # Slice "b" at address and end.
                    b_term = b_addr + size
                    b_root = b[b_addr:b_term]
                    # Find out if slices are equal.
                    if a_root == b_root:
                        # Create prefix tree to search.
                        a_pref, b_pref = a[:a_addr], b[:b_addr]
                        p_tree = search(a_pref, b_pref)
                        # Create suffix tree to search.
                        a_suff, b_suff = a[a_term:], b[b_term:]
                        s_tree = search(a_suff, b_suff)
                        # Make completed slice objects.
                        a_slic = Slice(a_pref, a_root, a_suff)
                        b_slic = Slice(b_pref, b_root, b_suff)
                        # Finish the match calculation.
                        value = size + p_tree.value + s_tree.value
                        match = Match(a_slic, b_slic, p_tree, s_tree, value)
                        # Append results to tree lists.
                        nodes.append(match)
                        index.append(value)
            # Return largest matches found.
            if nodes:
                return Tree(nodes, index, max(index))
        # Give caller null tree object.
        return Tree(nodes, index, 0)
    

    参考文献: How to optimize a recursive algorithm to not repeat itself?

    2 回复  |  直到 8 年前
        1
  •  1
  •   kzh    15 年前

    正如~unutbu所说,试试 memoized decorator 以及以下变化:

    @memoized
    def search(a, b):
        # Initialize startup variables.
        nodes, index = [], []
        a_size, b_size = len(a), len(b)
        # Begin to slice the sequences.
        for size in range(min(a_size, b_size), 0, -1):
            for a_addr in range(a_size - size + 1):
                # Slice "a" at address and end.
                a_term = a_addr + size
                a_root = list(a)[a_addr:a_term] #change to list
                for b_addr in range(b_size - size + 1):
                    # Slice "b" at address and end.
                    b_term = b_addr + size
                    b_root = list(b)[b_addr:b_term] #change to list
                    # Find out if slices are equal.
                    if a_root == b_root:
                        # Create prefix tree to search.
                        a_pref, b_pref = list(a)[:a_addr], list(b)[:b_addr]
                        p_tree = search(a_pref, b_pref)
                        # Create suffix tree to search.
                        a_suff, b_suff = list(a)[a_term:], list(b)[b_term:]
                        s_tree = search(a_suff, b_suff)
                        # Make completed slice objects.
                        a_slic = Slice(a_pref, a_root, a_suff)
                        b_slic = Slice(b_pref, b_root, b_suff)
                        # Finish the match calculation.
                        value = size + p_tree.value + s_tree.value
                        match = Match(a_slic, b_slic, p_tree, s_tree, value)
                        # Append results to tree lists.
                        nodes.append(match)
                        index.append(value)
            # Return largest matches found.
            if nodes:
                return Tree(nodes, index, max(index))
        # Give caller null tree object.
        return Tree(nodes, index, 0)
    

    对于回忆录,字典是最好的,但它们不能被分割,所以它们必须被更改为列表,如上面的注释所示。

        2
  •  1
  •   Community Mohan Dere    8 年前

    你可以使用 Python Decorator Library 像这样使用:

    @memoized
    def search(a, b):
    

    你第一次打电话 search 带参数 a,b 搜索 使用相同的参数调用,结果从缓存返回。

    memoized a b 是数字的元组,那么它们是可散列的。如果它们是列表,那么您可以在将它们传递给之前将它们转换为元组 搜索 . 搜索 dicts they would not be hashable 而记忆化装饰器将无法将结果保存在缓存中。

        3
  •  0
  •   Noctis Skytower    6 年前

    自从提出这个问题已经9年多了,但是内部缓存结果以加速算法的概念今天终于应用到代码中。应用结果如下:

    #! /usr/bin/env python3
    """Compute differences and similarities between a pair of sequences.
    
    After finding the "difflib.SequenceMatcher" class unsuitable, this module
    was written and re-written several times into the polished version below."""
    
    __author__ = 'Stephen "Zero" Chappell <Noctis.Skytower@gmail.com>'
    __date__ = '3 September 2019'
    __version__ = '$Revision: 4 $'
    
    
    class Slice:
        __slots__ = 'prefix', 'root', 'suffix'
    
        def __init__(self, prefix, root, suffix):
            self.prefix = prefix
            self.root = root
            self.suffix = suffix
    
    
    class Match:
        __slots__ = 'a', 'b', 'prefix', 'suffix', 'value'
    
        def __init__(self, a, b, prefix, suffix, value):
            self.a = a
            self.b = b
            self.prefix = prefix
            self.suffix = suffix
            self.value = value
    
    
    class Tree:
        __slots__ = 'nodes', 'index', 'value'
    
        def __init__(self, nodes, index, value):
            self.nodes = nodes
            self.index = index
            self.value = value
    
    
    def search(a, b):
        return _search(a, b, {})
    
    
    def _search(a, b, memo):
        # Initialize startup variables.
        nodes, index = [], []
        a_size, b_size = len(a), len(b)
        # Begin to slice the sequences.
        for size in range(min(a_size, b_size), 0, -1):
            for a_addr in range(a_size - size + 1):
                # Slice "a" at address and end.
                a_term = a_addr + size
                a_root = a[a_addr:a_term]
                for b_addr in range(b_size - size + 1):
                    # Slice "b" at address and end.
                    b_term = b_addr + size
                    b_root = b[b_addr:b_term]
                    # Find out if slices are equal.
                    if a_root == b_root:
                        # Create prefix tree to search.
                        key = a_prefix, b_prefix = a[:a_addr], b[:b_addr]
                        if key not in memo:
                            memo[key] = _search(a_prefix, b_prefix, memo)
                        p_tree = memo[key]
                        # Create suffix tree to search.
                        key = a_suffix, b_suffix = a[a_term:], b[b_term:]
                        if key not in memo:
                            memo[key] = _search(a_suffix, b_suffix, memo)
                        s_tree = memo[key]
                        # Make completed slice objects.
                        a_slice = Slice(a_prefix, a_root, a_suffix)
                        b_slice = Slice(b_prefix, b_root, b_suffix)
                        # Finish the match calculation.
                        value = size + p_tree.value + s_tree.value
                        match = Match(a_slice, b_slice, p_tree, s_tree, value)
                        # Append results to tree lists.
                        nodes.append(match)
                        index.append(value)
            # Return largest matches found.
            if nodes:
                return Tree(nodes, index, max(index))
        # Give caller null tree object.
        return Tree(nodes, index, 0)