代码之家  ›  专栏  ›  技术社区  ›  Pedro Silva

如何扩展二进制搜索迭代器以使用多个目标

  •  1
  • Pedro Silva  · 技术社区  · 15 年前

    我有一个功能, binary_range_search

    my $brs_iterator = binary_range_search(
        target => $range,                   # eg. [1, 200]
        search => $ranges                   # eg. [ {start => 1,   end => 1000},
    );                                      #       {start => 500, end => 1500} ]
    

    brs_iterator->() 将在$range重叠的所有@$range上迭代。

    我想延长 能够以多个范围为目标调用它,例如:

    target => $target_ranges # eg. [ [1, 200], [50, 300], ... ]
    search => $search_ranges # as above
    

    因此,当在$range上搜索时->[0]已用尽,它应移到$range->[1] ,等等。以下是有关函数的原始形式:

    sub binary_range_search {
        my %options = @_;
        my $range    = $options{target}  || return;
        my $ranges   = $options{search}  || return;
    
        my ( $low, $high ) = ( 0, @{$ranges} - 1 );
    
        while ( $low <= $high ) {
    
            my $try = int( ( $low + $high ) / 2 );
    
            $low  = $try + 1, next if $ranges->[$try]{end}   < $range->[0];
            $high = $try - 1, next if $ranges->[$try]{start} > $range->[1];
    
            my ( $down, $up ) = ($try) x 2;
    
            my %seen = ();
    
            my $brs_iterator = sub {
    
                if (    $ranges->[ $up + 1 ]{end}       >= $range->[0]
                        and $ranges->[ $up + 1 ]{start} <= $range->[1]
                        and !exists $seen{ $up + 1 } )
                {
                    $seen{ $up + 1 } = undef;
                    return $ranges->[ ++$up ];
                }
                elsif ( $ranges->[ $down - 1 ]{end}       >= $range->[0]
                        and $ranges->[ $down + 1 ]{start} <= $range->[1]
                        and !exists $seen{ $down - 1 }
                        and $down > 0 )
                {
                    $seen{ $down - 1 } = undef;
                    return $ranges->[ --$down ];
                }
                elsif ( !exists $seen{$try} ) {
                    $seen{$try} = undef;
                  return $ranges->[$try];
                }
                else {
                    return;
                }
    
            };
            return $brs_iterator;
        }
        return sub { };
    }
    

    这是一个标准的二进制搜索策略,直到它找到一个重叠的范围。然后它向右移动,耗尽,向左移动,耗尽,最后放弃。理想情况下,它应该是这样的 shift 下一个目标范围,并重新进行搜索,我想(可能通过递归?)。我的问题是,我不知道如何使用迭代器构造来实现这一点。

    4 回复  |  直到 15 年前
        1
  •  2
  •   daotoad    15 年前

    我刚刚将迭代器生成包装在for循环中,并构建了一个迭代器函数数组。

    use strict;
    use warnings;
    
    
    my $t = [ [1,200], [400,900] ];
    my @r = (
        { start =>   1, end =>  100 },
        { start =>   2, end =>  500 },
        { start => 204, end =>  500 },
        { start => 208, end =>  500 },
        { start => 215, end => 1000 },
        { start => 150, end => 1000 },
        { start => 500, end => 1100 },
    );
    
    # Get a master iterator that will process each iterator in turn.
    my $brs_iterator = binary_range_search(
        targets => $t,  
        search => \@r,
    );
    
    # Get an array of iterators
    my @brs_iterator = binary_range_search(
        targets => $t,  
        search => \@r,
    );
    
    
    
    sub binary_range_search {
        my %options = @_;
        my $targets = $options{targets}  || return;
        my $ranges  = $options{search}  || return;
    
    
        my @iterators;
    
        TARGET:
        for my $target ( @$targets ) {
    
            my ( $low, $high ) = ( 0, $#{$ranges} );
    
            RANGE_CHECK:
            while ( $low <= $high ) {
    
                my $try = int( ( $low + $high ) / 2 );
    
                # Remove non-overlapping ranges
                $low  = $try + 1, next RANGE_CHECK 
                    if $ranges->[$try]{end}   < $target->[0];
    
                $high = $try - 1, next RANGE_CHECK 
                    if $ranges->[$try]{start} > $target->[1];
    
                my ( $down, $up ) = ($try) x 2;
    
                my %seen = ();
    
                my $brs_iterator = sub {
    
                    if (    exists $ranges->[$up + 1]
                            and $ranges->[ $up + 1 ]{end}   >= $target->[0]
                            and $ranges->[ $up + 1 ]{start} <= $target->[1]
                            and !exists $seen{ $up + 1 } )
                    {
                        $seen{ $up + 1 } = undef;
                        return $ranges->[ ++$up ];
                    }
                    elsif ( $ranges->[ $down - 1 ]{end}       >= $target->[0]
                            and $ranges->[ $down + 1 ]{start} <= $target->[1]
                            and !exists $seen{ $down - 1 }
                            and $down > 0 )
                    {
                        $seen{ $down - 1 } = undef;
                        return $ranges->[ --$down ];
                    }
                    elsif ( !exists $seen{$try} ) {
                        $seen{$try} = undef;
                      return $ranges->[$try];
                    }
                    else {
                        return;
                    }
    
                };
                push @iterators, $brs_iterator;
                next TARGET;
            }
    
        }
    
        # In scalar context return master iterator that iterates over the list of range iterators.
        # In list context returns a list of range iterators.
        return wantarray 
             ? @iterators 
             : sub { 
                 while( @iterators ) {
                     if( my $range = $iterators[0]() ) {
                         return $range;
                     }
                     shift @iterators;
                 }
                 return;
            }; 
    }
    
        2
  •  2
  •   Greg Bacon    15 年前

    如果要迭代所有与搜索范围重叠的值,则不需要二进制搜索。

    use warnings;
    use strict;
    
    use Carp;
    

    首先,检查我们是否有 target search

    sub binary_range_search {
      my %arg = @_;
    
      my @errors;
      my $target = $arg{target} || push @errors => "no target";
      my $search = $arg{search} || push @errors => "no search";
    
      for (@$target) {
        my($start,$end) = @$_;
        push @errors => "Target start ($start) is greater than end ($end)"
          if $start > $end;
      }
    
      for (@$search) {
        my($start,$end) = @{$_}{qw/ start end /};
        push @errors => "Search start ($start) is greater than end ($end)"
          if $start > $end;
      }
    
      croak "Invalid use of binary_range_search:\n",
            map "  - $_\n", @errors
        if @errors;
    

    迭代器本身是一个保持以下状态的闭包:

      my $i;
      my($ta,$tb);
      my($sa,$sb);
      my $si = 0;
    

    • $i 如果已定义,则为当前重叠范围中的下一个值
    • $ta $tb
    • $sa $sb 与上述类似,但适用于当前搜索范围
    • $si @$search 并定义当前搜索范围

    $it

      my $it;
      $it = sub {
    

    如果没有更多的目标范围,或者没有搜索范围,我们就完成了。

        return unless @$target && @$search;
    

    什么时候 是定义的,这意味着我们发现了一个重叠,并通过递增进行迭代 $i

        if (defined $i) {
          # iterating within a target range
    
          if ($i > $tb || $i > $sb) {
            ++$si;
            undef $i;
            return $it->();
          }
          else {
            return $i++;
          }
        }
    

    否则,我们需要确定下一个目标范围是否与任何搜索范围重叠。然而,如果

        else {
          # does the next target range overlap?
    
          if ($si >= @$search) {
            shift @$target;
            $si = 0;
            return $it->();
          }
    

    @$target )和当前搜索范围(按索引) $si

          ($ta,$tb) = @{ $target->[0] };
          ($sa,$sb) = @{ $search->[$si] }{qw/ start end /};
    

    现在测试重叠是很简单的。对于不相交的搜索范围,我们忽略并继续。否则,我们将在重叠中找到最左边的点并从那里迭代。

          if ($sb < $ta || $sa > $tb) {
            # disjoint
            ++$si;
            undef $i;
            return $it->();
          }
          elsif ($sa >= $ta) {
            $i = $sa;
            return $i++;
          }
          elsif ($ta >= $sa) {
            $i = $ta;
            return $i++;
          }
        }
      };
    

      $it;
    }
    

    举一个类似于你问题中的例子

    my $it = binary_range_search(
      target => [ [1, 200], [50, 300] ],
      search => [ { start =>   1, end => 1000 },
                  { start => 500, end => 1500 },
                  { start =>  40, end =>   60 },
                  { start => 250, end =>  260 } ],
    );
    
    while (defined(my $value = $it->())) {
      print "got $value\n";
    }
    

    其带内部点的输出被省略

    got 1
    [...]
    got 200
    got 40
    [...]
    got 60
    got 50
    [...]
    got 300
    got 50
    [...]
    got 60
    got 250
    [...]
    got 260
        3
  •  0
  •   Marcelo Cantos    15 年前

    将其拆分为两个函数,一个外部函数在范围内循环,并调用一个内部函数,该函数实现传统的二进制切分。

        4
  •  0
  •   David Lehavi    15 年前

    警告:一个非常C++的偏颇答案:

    您需要做的是定义一种新类型的迭代器,它是一对普通迭代器和一对segment迭代器(如果您没有段迭代器,它是一对指向段的常量指针/ref,以及指向正确段的索引)。您必须定义随机访问迭代器的所有概念(差分、整数加法等)。请记住,至少在C++ LIGO中,这不是一个真正的随机迭代器,因为增加一个整数不是真正的常数时间;这就是生活。

    推荐文章