代码之家 › 专栏 › 技术社区 › Cenderze

从单个串联值字符串中查找拆分为行的最低值和最高值

etl ssis tsql excel sql-server

Cenderze · 技术社区 · 8 年前

这是我问题的后续 here :在这篇文章中,我对乌兹 . 然而,我注意到一家新公司, Company3 还使用了单个数据点,例如account 6000,它不遵循以前公司的方式,这使得uzi的递归cte不适用。

因此,我觉得有必要修改这个问题,但我相信,由于这个问题对解决方案的影响很大,这种复杂化会产生一个新问题,而不是对我之前的问题进行编辑。

我需要从Excel工作簿中读取数据,其中数据以以下方式存储:

Company       Accounts
Company1      (#3000...#3999)
Company2      (#4000..#4019)+(#4021..#4024)
Company3      (#5000..#5001)+#6000+(#6005..#6010)

我认为,由于一些公司 公司3 具有单一账户价值,如 #6000 在此步骤中,我需要创建以下外观的结果集:

Company       FirstAcc LastAcc
Company1      3000     3999
Company2      4000     4019
Company2      4021     4024
Company3      5000     5001
Company3      6000     NULL
Company3      6005     6010

然后,我将使用此表并将其与仅包含整数的表联接,以获得最终表的外观,例如我链接的问题中的表。

有人有什么想法吗?

4 回复 | 直到 8 年前

Alan Burstein 8 年前

一个好的t-sql拆分器函数使这变得非常简单;我建议 delimitedSplit8k . 这也将比递归CTE的性能好得多。首先是样本数据:

-- your sample data
if object_id('tempdb..#yourtable') is not null drop table #yourtable;
create table #yourtable (company varchar(100), accounts varchar(8000));
insert #yourtable values ('Company1','(#3000...#3999)'),
('Company2','(#4000..#4019)+(#4021..#4024)'),('Company3','(#5000..#5001)+#6000+(#6005..#6010)');

以及解决方案:

select 
  company, 
  firstAcc = max(case when split2.item not like '%)' then clean.Item end),
  lastAcc  = max(case when split2.item     like '%)' then clean.Item end)
from #yourtable t
cross apply dbo.delimitedSplit8K(accounts, '+') split1
cross apply dbo.delimitedSplit8K(split1.Item, '.') split2
cross apply (values (replace(replace(split2.Item,')',''),'(',''))) clean(item)
where split2.item > ''
group by split1.Item, company;

结果:

company   firstAcc   lastAcc
--------- ---------- --------------
Company1  #3000      #3999
Company2  #4000      #4019
Company2  #4021      #4024
Company3  #6000      NULL
Company3  #5000      #5001
Company3  #6005      #6010

uzi 8 年前

我相信列表(#6005..6010)在Excel文件中的表示方式类似于#6005#6006#6007#6008#6009#6010。如果这是真的并且没有差距,请尝试此查询

with cte as (
select 
    company, replace(replace(replace(accounts,'(',''),')',''),'+','')+'#' accounts
from 
    (values ('company 1','#3000#3001#3002#3003'),('company 2','(#4000#4001)+(#4021#4022)'),('company 3','(#5000#5001)+#6000+(#6005#6006)')) data(company, accounts)
)

, rcte as (
    select 
        company, stuff(accounts, ind1, ind2 - ind1, '') acc, substring(accounts, ind1 + 1, ind2 - ind1 - 1) accounts
    from 
        cte
        cross apply (select charindex('#', accounts) ind1) ca
        cross apply (select charindex('#', accounts, ind1 + 1) ind2) cb
    union all
    select
        company, stuff(acc, ind1, ind2 - ind1, ''), substring(acc, ind1 + 1, ind2 - ind1 - 1)
    from
        rcte
        cross apply (select charindex('#', acc) ind1) ca
        cross apply (select charindex('#', acc, ind1 + 1) ind2) cb
    where
        len(acc)>1
)

select
    company, min(accounts) FirstAcc, case when max(accounts)  =min(accounts) then null else max(accounts) end LastAcc
from (
    select
        company, accounts, accounts - row_number() over (partition by company order by accounts) group_
    from 
        rcte
    ) t
group by company, group_

option (maxrecursion 0)

Hadi 8 年前

我对另一个问题的@uzi解决方案进行了一些编辑,在其中我添加了三个其他CTE,并使用了windows功能,如 LEAD() 和 ROW_NUMBER() 解决问题。我不知道是否有更简单的解决方案,但我认为这很有效。

with cte as (
select 
    company, replace(replace(replace(accounts,'(',''),')',''),'+','')+'#' accounts 
from 
    (values ('company 1','#3000..#3999'),('company 2','(#4000..#4019)+(#4021..#4024)'),('company 3','(#5000..#5001)+#6000+(#6005..#6010)')) data(company, accounts)
)
, rcte as (
    select 
        company, stuff(accounts, ind1, ind2 - ind1, '') acc, substring(accounts, ind1 + 1, ind2 - ind1 - 1) accounts
    from 
        cte
        cross apply (select charindex('#', accounts) ind1) ca
        cross apply (select charindex('#', accounts, ind1 + 1) ind2) cb
    union all
    select
        company, stuff(acc, ind1, ind2 - ind1, ''), substring(acc, ind1 + 1, ind2 - ind1 - 1)
    from
        rcte
        cross apply (select charindex('#', acc) ind1) ca
        cross apply (select charindex('#', acc, ind1 + 1) ind2) cb
    where
        len(acc)>1
) ,cte2 as (

    select company, accounts as  accounts_raw, Replace( accounts,'..','') as accounts,
        LEAD(accounts) OVER(Partition by company ORDER BY accounts) ld,
        ROW_NUMBER() OVER(ORDER BY accounts) rn 
    from rcte
) , cte3 as (

    Select company,accounts,ld ,rn 
    from cte2 
    WHERE ld not like '%..' 
) , cte4 as (
    select * from cte3 where accounts not in (select ld from cte3 t1 where t1.rn < cte3.rn)
)

SELECT company,accounts,ld from cte4
UNION
SELECT DISTINCT company,ld,NULL from cte3 where accounts not in (select accounts from cte4 t1)

option (maxrecursion 0)

结果:

Hadi 8 年前

看起来您标记了SSI,因此我将使用脚本任务提供解决方案。所有其他示例都需要加载到临时表。

使用普通阅读器(可能是Excel)并加载
添加脚本转换组件
编辑组件
输入列-检查公司和帐户
输入和输出-添加新的输出并将其称为CompFirstLast
向it中添加三列-公司字符串、第一个int和最后一个int

打开脚本并粘贴以下代码

public override void Input0_ProcessInputRow(Input0Buffer Row)
{

//Create an array for each group to create rows out of by splitting on '+'

string[] SplitForRows = Row.Accounts.Split('+'); //Note single quotes denoting char 

//Deal with each group and create the new Output
for (int i = 0; i < SplitForRows.Length; i++) //Loop each split column
    {
        CompFirstLastBuffer.AddRow();
        CompFirstLastBuffer.Company = Row.Company; //This is static for each incoming row

        //Clean up the string getting rid of (). and leaving a delimited list of #
        string accts = SplitForRows[i].Replace("(", String.Empty).Replace(")", String.Empty).Replace(".", String.Empty).Substring(1);

        //Split into Array
        string[] accounts = accts.Split('#');

        // Write out first and last and handle null
        CompFirstLastBuffer.First = int.Parse(accounts[0]);

        if (accounts.Length == 1)
            CompFirstLastBuffer.Last_IsNull = true;
        else
            CompFirstLastBuffer.Last = int.Parse(accounts[1]);

    }
}

确保使用正确的输出。