代码之家  ›  专栏  ›  技术社区  ›  shayan

内存中列表的完全外部联接

  •  -2
  • shayan  · 技术社区  · 5 年前

    如何在具有以下签名的列表上编写类似SQL的完全连接操作?

    fullJoin :: [a] -> [b] -> (a -> b -> Bool) -> [(Maybe a, Maybe b)]
    fullJoin xs [] _ = map (\x -> (Just x, Nothing)) xs
    fullJoin [] ys _ = map (\y -> (Nothing, Just y)) ys
    fullJoin xs@(xh:xt) ys@(yh:yt) on = 
      if on xh yh
        then (Just xh, Just yh) : fullJoin xt yt
        else ???
    

    书面条件由 a -> b -> Bool (\n i -> length n == i) ,表示来自 names 它们的长度和 nums

    例子:

    names = ["Foo","Bar", "John" , "Emily", "Connor"]
    nums = [1,2,3,4,5]
    
    fullJoin names nums (\n i -> length n == i)
      == [ (Just "Foo", Just 3)
         , (Just "Bar", Just 3)
         , (Just "John", Just 4)
         , (Just "Emily", Just 5)
         , (Just "Connor", Nothing)
         , (Nothing, Just 1)
         , (Nothing, Just 2)
         ]
    

    为了澄清上述函数的SQL等价物,下面是如何在PostgreSQL中编写它的:

    create table names as
    select * from (values 
                   ('Foo'),
                   ('Bar'),
                   ('John'),
                   ('Emily'),
                   ('Connor')
                  ) 
                  as z(name);
    
    create table nums as
    select * from (values 
                   (1),
                   (2),
                   (3),
                   (4),
                   (5)
                  ) 
                  as z(num);
    
    select * from names
    full join nums
    on char_length(name) = num
    

    name    num
    (null)  1
    (null)  2
    Foo     3
    Bar     3
    John    4
    Emily   5
    Connor  (null)
    

    小提琴链接: sqlfiddle

    0 回复  |  直到 5 年前
        1
  •  2
  •   Will Ness Derri Leahy    5 年前

    只是无聊地实现你想要它做的事情。超低效率:

    fullJoin :: [a] -> [b] -> (a -> b -> Bool) 
             -> [(Maybe a, Maybe b)]
    fullJoin xs ys p = concatMap (map (Just *** Just) . snd) a
                        ++ [(Just x, Nothing) | (x, []) <- a]
                        ++ [(Nothing, Just y) | (y, []) <- b]
      where
        a = [(x, [(x, y) | y <- ys, p x y]) | x <- xs]
        b = [(y, [(x, y) | x <- xs, p x y]) | y <- ys]
    

    如果我们被允许添加 Eq a b 类型,我们可以用 Data.List.\\ 找到不匹配的元素而不是进行第二次扫描。

    测试:

    > mapM_ print $ fullJoin names nums (\n i -> length n == i)
    (Just "Foo",Just 3)
    (Just "Bar",Just 3)
    (Just "John",Just 4)
    (Just "Emily",Just 5)
    (Just "Connor",Nothing)
    (Nothing,Just 1)
    (Nothing,Just 2)
    
        2
  •  2
  •   Fried Brice    5 年前

    表之间的完全外部联接 X Y 有条件的 rel(x, y) 可以认为是三个部分(不相交)的结合:

    • (x, y) 哪里 rel(x,y)
    • 一对 (x0, null) y in Y , not (rel(x0, y))
    • 一对 (null, y0) x in X , not (rel(x, y0))

    我们可以用类似的方式构造Haskell程序:

    fullJoin :: [a] -> [b] -> (a -> b -> Bool) -> [(Maybe a, Maybe b)]
    fullJoin xs ys rel = concat $
      [[(Just x, Just y) | y <- ys, x `rel` y] | x <- xs] -- for each x, find all the related ys
      <> [[(Just x, Nothing)] | x <- xs, all (\y -> not (x `rel` y)) ys] -- find all xs that have no related ys
      <> [[(Nothing, Just y)] | y <- ys, all (\x -> not (x `rel` y)) xs] -- find all ys that have no related xs
    

    问题的提出,你不可能比 O(n^2) . 也就是说,我的解决方案 O(n^2) xs 两次。你可以想一想你必须做些什么,这样你才能 一次。这与找到一种方法来跟踪 xs型

        3
  •  1
  •   btilly    5 年前

    这是一个简单的Python实现。它是 O(n^2)

    #! /usr/bin/env python
    
    def full_join_cond (list1, list2, condition_fn):
        answer = []
        for x in list1:
            matches = []
            for y in list2:
                if condition_fn(x, y):
                    matches.append(y)
            if len(matches):
                answer.extend([[x, y] for y in matches])
            else:
                answer.append([x, None])
    
        for y in list2:
            matched = False
            for x in list1:
                if condition_fn(x, y):
                    matched = True
                    break
            if not matched:
                answer.append([None, y])
    
        return answer
    
    
    names = ["Foo","Bar", "John" , "Emily", "Connor"]
    nums = [1,2,3,4,5]
    cond = lambda x, y: len(x) == y
    
    print(full_join_cond(names, nums, cond))
    

    这里有一个实现,它更接近于数据库执行这个连接的方式。它是 O(size_of_output) 通常是 O(n)

    def list_map_to_dict (l, m):
        d = {}
        for x in l:
            mx = m(x)
            if mx in d:
                d[mx].append(x)
            else:
                d[mx] = [x]
        return d
    
    def full_join_map (list1, map1, list2, map2):
        dict1 = list_map_to_dict(list1, map1)
        dict2 = list_map_to_dict(list2, map2)
    
        answer = []
        for mx, xs in dict1.iteritems():
            if mx in dict2:
                for y in dict2[mx]:
                    answer.extend([[x, y] for x in xs])
                dict2.pop(mx)
            else:
                answer.extend([[x, None] for x in xs])
    
        for my, ys in dict2.iteritems():
            answer.extend([[None, y] for y in ys])
        return answer
    
    print(full_join_map(names, lambda x: len(x), nums, lambda y: y))
    

    对于咯咯笑和咧嘴笑,两者可以组合成一个既通用又能快速运行的函数。(我并没有试图使函数签名合理化——只是试图展示这个想法。)

    def full_join (list1, map1, list2, map2, cond=None):
        if map1 is None:
            map1 = lambda x: None
    
        if map2 is None:
            map2 = lambda y: None
    
        if cond is None:
            cond = lambda x, y: True
    
        dict1 = list_map_to_dict(list1, map1)
        dict2 = list_map_to_dict(list2, map2)
    
        answer = []
        for mx, xs in dict1.iteritems():
            if mx in dict2:
                answer.extend(full_join_cond(xs, dict2[mx], cond))
                dict2.pop(mx)
            else:
                answer.extend([[x, None] for x in xs])
    
        for my, ys in dict2.iteritems():
            answer.extend([[None, y] for y in ys])
        return answer