
How do I do the opposite of "row_number() over (partition by [Col] order by [Col])"?

  •  9
  • Scott Chamberlain  · 12 years ago

    I am trying to group the duplicate entries in a table and give each group a new number.

    Here is a sample data set:

    declare @tmpTable table
        (ID Varchar(1), 
         First varchar(4), 
         Last varchar(5), 
         Phone varchar(13),
         NonKeyField varchar(4))
    
    insert into @tmpTable select 'A', 'John', 'Smith', '(555)555-1234', 'ASDF'
    insert into @tmpTable select 'B', 'John', 'Smith', '(555)555-1234', 'GHJK'
    insert into @tmpTable select 'C', 'Jane', 'Smith', '(555)555-1234', 'QWER'
    insert into @tmpTable select 'D', 'John', 'Smith', '(555)555-1234', 'RTYU'
    insert into @tmpTable select 'E', 'Bill', 'Blake', '(555)555-0000', 'BVNM'
    insert into @tmpTable select 'F', 'Bill', 'Blake', '(555)555-0000', '%^&*'
    insert into @tmpTable select 'G', 'John', 'Smith', '(555)555-1234', '!#RF'
    
    select row_number() over (partition by First, Last, Phone order by ID) NewIDNum, *  
    from @tmpTable order by ID
    

    Right now it gives me this result:

    NewIDNum             ID   First Last  Phone         NonKeyField
    -------------------- ---- ----- ----- ------------- -----------
    1                    A    John  Smith (555)555-1234 ASDF
    2                    B    John  Smith (555)555-1234 GHJK
    1                    C    Jane  Smith (555)555-1234 QWER
    3                    D    John  Smith (555)555-1234 RTYU
    1                    E    Bill  Blake (555)555-0000 BVNM
    2                    F    Bill  Blake (555)555-0000 %^&*
    4                    G    John  Smith (555)555-1234 !#RF
    

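    However, this is the opposite of what I want: NewIDNum resets its counter whenever it finds a new combination of the key. I want all rows with the same combination to share the same ID. So if it behaved the way I want, I would get the following result: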

    NewIDNum             ID   First Last  Phone         NonKeyField
    -------------------- ---- ----- ----- ------------- -----------
    1                    A    John  Smith (555)555-1234 ASDF
    1                    B    John  Smith (555)555-1234 GHJK
    2                    C    Jane  Smith (555)555-1234 QWER
    1                    D    John  Smith (555)555-1234 RTYU
    3                    E    Bill  Blake (555)555-0000 BVNM
    3                    F    Bill  Blake (555)555-0000 %^&*
    1                    G    John  Smith (555)555-1234 !#RF
    

    What is the correct way to get the result I want?


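    I did not include this requirement in the original post: I need NewIDNum to produce the same numbers for existing rows on subsequent runs of the query if more rows are added (assuming that all new rows will have a higher ID "value" when sorted on the ID column).

    So if the following were done on a later day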

    insert into @tmpTable select 'H', 'John', 'Smith', '(555)555-1234', '4321'
    insert into @tmpTable select 'I', 'Jake', 'Jons', '(555)555-1234', '1234'
    insert into @tmpTable select 'J', 'John', 'Smith', '(555)555-1234', '2345'
    

    running the correct query again would give

    NewIDNum             ID   First Last  Phone         NonKeyField
    -------------------- ---- ----- ----- ------------- -----------
    1                    A    John  Smith (555)555-1234 ASDF
    1                    B    John  Smith (555)555-1234 GHJK
    2                    C    Jane  Smith (555)555-1234 QWER
    1                    D    John  Smith (555)555-1234 RTYU
    3                    E    Bill  Blake (555)555-0000 BVNM
    3                    F    Bill  Blake (555)555-0000 %^&*
    1                    G    John  Smith (555)555-1234 !#RF
    1                    H    John  Smith (555)555-1234 4321
    4                    I    Jake  Jons  (555)555-1234 1234
    1                    J    John  Smith (555)555-1234 2345
    
    4 Answers
        1
  •  8
  •   Andomar    12 years ago

    You can use dense_rank():

    dense_rank() over (order by First, Last, Phone) as NewIDNum
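
To make the difference between the two window functions concrete, here is a minimal Python sketch that emulates both on the question's sample rows. The helper names (`key_of`, `row_number_per_key`, `dense_rank_by_key`) are hypothetical, and Python's default string ordering stands in for the database collation, which is an assumption.

```python
# Sample rows from the question, reduced to (ID, First, Last, Phone).
rows = [
    ("A", "John", "Smith", "(555)555-1234"),
    ("B", "John", "Smith", "(555)555-1234"),
    ("C", "Jane", "Smith", "(555)555-1234"),
    ("D", "John", "Smith", "(555)555-1234"),
    ("E", "Bill", "Blake", "(555)555-0000"),
    ("F", "Bill", "Blake", "(555)555-0000"),
    ("G", "John", "Smith", "(555)555-1234"),
]


def key_of(row):
    """Hypothetical helper: the (First, Last, Phone) key of a row."""
    return row[1:]


def row_number_per_key(rows):
    """Emulate row_number() over (partition by key order by ID)."""
    counters = {}
    numbers = {}
    for row in sorted(rows, key=lambda r: r[0]):
        k = key_of(row)
        counters[k] = counters.get(k, 0) + 1  # counter restarts for each key
        numbers[row[0]] = counters[k]
    return numbers


def dense_rank_by_key(rows):
    """Emulate dense_rank() over (order by First, Last, Phone)."""
    distinct_keys = sorted({key_of(r) for r in rows})
    rank = {k: i for i, k in enumerate(distinct_keys, start=1)}
    return {r[0]: rank[key_of(r)] for r in rows}  # equal keys share a rank


row_numbers = row_number_per_key(rows)
dense_ranks = dense_rank_by_key(rows)
for row in rows:
    print(row[0], row_numbers[row[0]], dense_ranks[row[0]])
```

In this sketch every row with the same key gets the same number, but the numbers follow the sort order of the key columns (Bill Blake is 1, John Smith is 3), so a key added later that sorts earlier would shift the existing numbers.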
    

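    In response to your comment, you could rank by the minimum of the old ID column within each group of rows that share the same (First, Last, Phone) combination: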

    select  *
    from    (
            select  dense_rank() over (order by min_id) as new_id
            ,       *
            from    (
                    select  min(id) over (
                                partition by First, Last, Phone) as min_id
                    ,       *
                    from    @tmpTable 
                    ) as sub1
            ) as sub3
    order by
            new_id
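
The reason this stays stable across runs can be sketched in Python: each key is ranked by its smallest ID, and as long as new rows always receive higher IDs (the assumption stated in the question), the minimum ID of an existing key never changes. The function name `stable_group_ids` is hypothetical and the rows are the question's sample data.

```python
def stable_group_ids(rows):
    """Emulate dense_rank() over (order by min(ID) per (First, Last, Phone))."""
    min_id = {}
    for row_id, *key in rows:
        k = tuple(key)
        if k not in min_id or row_id < min_id[k]:
            min_id[k] = row_id
    # Rank the distinct minimum IDs; every row inherits the rank of its key.
    ordered = sorted(set(min_id.values()))
    rank = {m: i for i, m in enumerate(ordered, start=1)}
    return {row_id: rank[min_id[tuple(key)]] for row_id, *key in rows}


first_run = [
    ("A", "John", "Smith", "(555)555-1234"),
    ("B", "John", "Smith", "(555)555-1234"),
    ("C", "Jane", "Smith", "(555)555-1234"),
    ("D", "John", "Smith", "(555)555-1234"),
    ("E", "Bill", "Blake", "(555)555-0000"),
    ("F", "Bill", "Blake", "(555)555-0000"),
    ("G", "John", "Smith", "(555)555-1234"),
]
# Rows inserted later, all with higher IDs than the existing ones.
later_rows = [
    ("H", "John", "Smith", "(555)555-1234"),
    ("I", "Jake", "Jons", "(555)555-1234"),
    ("J", "John", "Smith", "(555)555-1234"),
]

before = stable_group_ids(first_run)
after = stable_group_ids(first_run + later_rows)
print(before)
print(after)
```

Existing rows keep their numbers in the second run, and the new Jake Jons key receives the next free number.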
    
        2
  •  1
  •   etliens    12 years ago

    Building on @Andomar's original answer, this will work for your updated requirement (though it is unlikely to scale well):

    select
        DENSE_RANK() over (ORDER BY IdRank, First, Last, Phone) AS NewIDNum,
        ID,
        First,
        Last,
        Phone,
        NonKeyField
    from
    (
        select
            MIN(ID) OVER (PARTITION BY First, Last, Phone) as IdRank,
            *
        from
            @tmpTable
    ) as x
    order by
        ID;
    
        3
  •  0
  •   Community CDub    8 years ago

    Thanks to Andomar's answer as a starting point, I solved it myself:

    select sub1.rn, tt.*
    from @tmpTable tt
    inner join (
        select row_number() over (order by min(ID)) as rn, first, last, phone
        from @tmpTable
        group by first, last, phone
        ) as sub1 on tt.first = sub1.first and tt.last = sub1.last and tt.phone = sub1.phone
    

    This produces

    rn                   ID   First Last  Phone         NonKeyField
    -------------------- ---- ----- ----- ------------- -----------
    1                    A    John  Smith (555)555-1234 ASDF
    1                    B    John  Smith (555)555-1234 GHJK
    1                    D    John  Smith (555)555-1234 RTYU
    1                    G    John  Smith (555)555-1234 !#RF
    1                    H    John  Smith (555)555-1234 4321
    1                    J    John  Smith (555)555-1234 2345
    2                    C    Jane  Smith (555)555-1234 QWER
    3                    E    Bill  Blake (555)555-0000 BVNM
    3                    F    Bill  Blake (555)555-0000 %^&*
    4                    I    Jake  Jons  (555)555-1234 1234
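
As a rough illustration of the aggregate-then-join shape of this query, the three steps can be emulated in Python. This is only a sketch: the variable names are hypothetical, IDs are assumed to be unique, and the result is sorted purely for display because the query itself has no outer ORDER BY.

```python
rows = [
    ("A", "John", "Smith", "(555)555-1234"),
    ("B", "John", "Smith", "(555)555-1234"),
    ("C", "Jane", "Smith", "(555)555-1234"),
    ("D", "John", "Smith", "(555)555-1234"),
    ("E", "Bill", "Blake", "(555)555-0000"),
    ("F", "Bill", "Blake", "(555)555-0000"),
    ("G", "John", "Smith", "(555)555-1234"),
    ("H", "John", "Smith", "(555)555-1234"),
    ("I", "Jake", "Jons", "(555)555-1234"),
    ("J", "John", "Smith", "(555)555-1234"),
]

# Step 1: GROUP BY first, last, phone and take min(ID) per group.
min_id_per_key = {}
for row_id, first, last, phone in rows:
    key = (first, last, phone)
    min_id_per_key[key] = min(row_id, min_id_per_key.get(key, row_id))

# Step 2: row_number() over (order by min(ID)); one row per key, so no ties.
by_min_id = sorted(min_id_per_key.items(), key=lambda kv: kv[1])
rn = {key: n for n, (key, _) in enumerate(by_min_id, start=1)}

# Step 3: inner join the numbered keys back to the detail rows.
joined = sorted((rn[(first, last, phone)], row_id)
                for row_id, first, last, phone in rows)
for number, row_id in joined:
    print(number, row_id)
```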
    

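    Looking at the SQL execution plan, Andomar's answer would run faster than mine on larger data sets (53% vs. 47% of the execution time when the two queries are run side by side with "Include Actual Execution Plan" checked).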

        4
  •  -1
  •   iruvar    12 years ago

    This should work:

    select dense_rank() over (order by First, Last, Phone) NewIDNum, *  
    from @tmpTable order by ID