代码之家  ›  专栏  ›  技术社区  ›  Cenderze

从单个串联值字符串中查找拆分为行的最低值和最高值

  •  3
  • Cenderze  · 技术社区  · 8 年前

    这是我问题的后续 here :在这篇文章中,我对 乌兹 . 然而,我注意到一家新公司, Company3 还使用了单个数据点,例如account 6000,它不遵循以前公司的方式,这使得uzi的递归cte不适用。

    因此,我觉得有必要修改这个问题,但我相信,由于这个问题对解决方案的影响很大,这种复杂化会产生一个新问题,而不是对我之前的问题进行编辑。

    我需要从Excel工作簿中读取数据,其中数据以以下方式存储:

    Company       Accounts
    Company1      (#3000...#3999)
    Company2      (#4000..#4019)+(#4021..#4024)
    Company3      (#5000..#5001)+#6000+(#6005..#6010)
    

    我认为,由于一些公司 公司3 具有单一账户价值,如 #6000 在此步骤中,我需要创建以下外观的结果集:

    Company       FirstAcc LastAcc
    Company1      3000     3999
    Company2      4000     4019
    Company2      4021     4024
    Company3      5000     5001
    Company3      6000     NULL
    Company3      6005     6010
    

    然后,我将使用此表并将其与仅包含整数的表联接,以获得最终表的外观,例如我链接的问题中的表。

    有人有什么想法吗?

    4 回复  |  直到 8 年前
        1
  •  1
  •   Alan Burstein    8 年前

    一个好的t-sql拆分器函数使这变得非常简单;我建议 delimitedSplit8k . 这也将比递归CTE的性能好得多。首先是样本数据:

    -- your sample data
    if object_id('tempdb..#yourtable') is not null drop table #yourtable;
    create table #yourtable (company varchar(100), accounts varchar(8000));
    insert #yourtable values ('Company1','(#3000...#3999)'),
    ('Company2','(#4000..#4019)+(#4021..#4024)'),('Company3','(#5000..#5001)+#6000+(#6005..#6010)');
    

    以及解决方案:

    select 
      company, 
      firstAcc = max(case when split2.item not like '%)' then clean.Item end),
      lastAcc  = max(case when split2.item     like '%)' then clean.Item end)
    from #yourtable t
    cross apply dbo.delimitedSplit8K(accounts, '+') split1
    cross apply dbo.delimitedSplit8K(split1.Item, '.') split2
    cross apply (values (replace(replace(split2.Item,')',''),'(',''))) clean(item)
    where split2.item > ''
    group by split1.Item, company;
    

    结果:

    company   firstAcc   lastAcc
    --------- ---------- --------------
    Company1  #3000      #3999
    Company2  #4000      #4019
    Company2  #4021      #4024
    Company3  #6000      NULL
    Company3  #5000      #5001
    Company3  #6005      #6010
    
        2
  •  1
  •   uzi    8 年前

    我相信列表(#6005..6010)在Excel文件中的表示方式类似于#6005#6006#6007#6008#6009#6010。如果这是真的并且没有差距,请尝试此查询

    with cte as (
    select 
        company, replace(replace(replace(accounts,'(',''),')',''),'+','')+'#' accounts
    from 
        (values ('company 1','#3000#3001#3002#3003'),('company 2','(#4000#4001)+(#4021#4022)'),('company 3','(#5000#5001)+#6000+(#6005#6006)')) data(company, accounts)
    )
    
    , rcte as (
        select 
            company, stuff(accounts, ind1, ind2 - ind1, '') acc, substring(accounts, ind1 + 1, ind2 - ind1 - 1) accounts
        from 
            cte
            cross apply (select charindex('#', accounts) ind1) ca
            cross apply (select charindex('#', accounts, ind1 + 1) ind2) cb
        union all
        select
            company, stuff(acc, ind1, ind2 - ind1, ''), substring(acc, ind1 + 1, ind2 - ind1 - 1)
        from
            rcte
            cross apply (select charindex('#', acc) ind1) ca
            cross apply (select charindex('#', acc, ind1 + 1) ind2) cb
        where
            len(acc)>1
    )
    
    select
        company, min(accounts) FirstAcc, case when max(accounts)  =min(accounts) then null else max(accounts) end LastAcc
    from (
        select
            company, accounts, accounts - row_number() over (partition by company order by accounts) group_
        from 
            rcte
        ) t
    group by company, group_
    
    option (maxrecursion 0)
    
        3
  •  1
  •   Hadi    8 年前

    我对另一个问题的@uzi解决方案进行了一些编辑,在其中我添加了三个其他CTE,并使用了windows功能,如 LEAD() ROW_NUMBER() 解决问题。我不知道是否有更简单的解决方案,但我认为这很有效。

    with cte as (
    select 
        company, replace(replace(replace(accounts,'(',''),')',''),'+','')+'#' accounts 
    from 
        (values ('company 1','#3000..#3999'),('company 2','(#4000..#4019)+(#4021..#4024)'),('company 3','(#5000..#5001)+#6000+(#6005..#6010)')) data(company, accounts)
    )
    , rcte as (
        select 
            company, stuff(accounts, ind1, ind2 - ind1, '') acc, substring(accounts, ind1 + 1, ind2 - ind1 - 1) accounts
        from 
            cte
            cross apply (select charindex('#', accounts) ind1) ca
            cross apply (select charindex('#', accounts, ind1 + 1) ind2) cb
        union all
        select
            company, stuff(acc, ind1, ind2 - ind1, ''), substring(acc, ind1 + 1, ind2 - ind1 - 1)
        from
            rcte
            cross apply (select charindex('#', acc) ind1) ca
            cross apply (select charindex('#', acc, ind1 + 1) ind2) cb
        where
            len(acc)>1
    ) ,cte2 as (
    
        select company, accounts as  accounts_raw, Replace( accounts,'..','') as accounts,
            LEAD(accounts) OVER(Partition by company ORDER BY accounts) ld,
            ROW_NUMBER() OVER(ORDER BY accounts) rn 
        from rcte
    ) , cte3 as (
    
        Select company,accounts,ld ,rn 
        from cte2 
        WHERE ld not like '%..' 
    ) , cte4 as (
        select * from cte3 where accounts not in (select ld from cte3 t1 where t1.rn < cte3.rn)
    )
    
    SELECT company,accounts,ld from cte4
    UNION
    SELECT DISTINCT company,ld,NULL from cte3 where accounts not in (select accounts from cte4 t1)
    
    option (maxrecursion 0)
    

    结果:

    enter image description here

        4
  •  1
  •   Hadi    8 年前

    看起来您标记了SSI,因此我将使用脚本任务提供解决方案。所有其他示例都需要加载到临时表。

    1. 使用普通阅读器(可能是Excel)并加载
    2. 添加脚本转换组件
    3. 编辑组件
    4. 输入列-检查公司和帐户
    5. 输入和输出-添加新的输出并将其称为CompFirstLast
    6. 向it中添加三列-公司字符串、第一个int和最后一个int
    7. 打开脚本并粘贴以下代码

      public override void Input0_ProcessInputRow(Input0Buffer Row)
      {
      
      //Create an array for each group to create rows out of by splitting on '+'
      
      string[] SplitForRows = Row.Accounts.Split('+'); //Note single quotes denoting char 
      
      //Deal with each group and create the new Output
      for (int i = 0; i < SplitForRows.Length; i++) //Loop each split column
          {
              CompFirstLastBuffer.AddRow();
              CompFirstLastBuffer.Company = Row.Company; //This is static for each incoming row
      
              //Clean up the string getting rid of (). and leaving a delimited list of #
              string accts = SplitForRows[i].Replace("(", String.Empty).Replace(")", String.Empty).Replace(".", String.Empty).Substring(1);
      
              //Split into Array
              string[] accounts = accts.Split('#');
      
              // Write out first and last and handle null
              CompFirstLastBuffer.First = int.Parse(accounts[0]);
      
              if (accounts.Length == 1)
                  CompFirstLastBuffer.Last_IsNull = true;
              else
                  CompFirstLastBuffer.Last = int.Parse(accounts[1]);
      
          }
      }
      
    8. 确保使用正确的输出。