代码之家  ›  专栏  ›  技术社区  ›  VVS

展平相交时间跨度

  •  15
  • VVS  · 技术社区  · 15 年前

    对于一个给定的ID,我有很多具有开始和停止时间的数据,我需要将所有相交和相邻的时间跨度展平为一个组合的时间跨度。下面发布的示例数据都是同一ID的,所以我没有列出它。

    为了让事情更清楚一点,请看一下2009年6月3日的样本数据:

    以下时间跨度重叠或连续,需要合并为一个时间跨度

    • 05:54:48-10:00:13
    • 09:26:45-09:59:40

    得出的时间跨度为05:54:48到10:00:13。由于10:00:13和10:12:50之间存在差距,我们也有以下时间间隔:

    • 10:12:50-10:27:25
    • 10:13:12-11:14:56
    • 10:27:25-10:27:31
    • 10:27:39-13:53:38
    • 11:14:56-11:15:03
    • 11:15:30-14:02:14
    • 13:53:38-13:53:43
    • 14:02:14-14:02:31

    这会导致一个合并的时间跨度从10:12:50到14:02:31,因为它们重叠或相邻。

    下面您将找到我需要的示例数据和扁平数据。“持续时间”列只是提供信息。

    任何解决方案——不管是不是SQL——都会受到赞赏。


    编辑 :由于有许多不同且有趣的解决方案,我正在通过添加约束来改进原始问题,以查看“最佳”(如果有)解决方案冒泡:

    • 我正在从另一个系统通过ODBC获取数据。无法为我更改表布局或添加索引
    • 数据只按日期列编制索引(时间部分不是)
    • 每天大约有2.5公里的路程
    • 数据的估计使用模式大致如下:
      • 大多数时候(假设90%)用户只查询一到两天(2.5K-5K行)
      • 有时(9%)范围最长可达一个月(~75K行)
      • 很少(1%)的范围将长达一年(约90万行)
    • 对于典型的情况,查询应该很快,而对于罕见的情况,查询不应该“永远持续”。
    • 查询一年的数据大约需要5分钟(无连接的纯选择)

    在这些限制条件下,最佳解决方案是什么?恐怕大多数解决方案都会非常缓慢,因为它们结合了日期和时间,在我的例子中这不是一个索引字段。

    您是在客户端还是在服务器端进行所有合并?您是否会首先创建一个优化的临时表,并在该表中使用一个建议的解决方案?到目前为止,我还没有时间测试这些解决方案,但我会告诉你什么对我最有效。


    样本数据:

    Date       | Start    | Stop
    -----------+----------+---------
    02.06.2009 | 05:55:28 | 09:58:27
    02.06.2009 | 10:15:19 | 13:58:24
    02.06.2009 | 13:58:24 | 13:58:43
    03.06.2009 | 05:54:48 | 10:00:13
    03.06.2009 | 09:26:45 | 09:59:40
    03.06.2009 | 10:12:50 | 10:27:25
    03.06.2009 | 10:13:12 | 11:14:56
    03.06.2009 | 10:27:25 | 10:27:31
    03.06.2009 | 10:27:39 | 13:53:38
    03.06.2009 | 11:14:56 | 11:15:03
    03.06.2009 | 11:15:30 | 14:02:14
    03.06.2009 | 13:53:38 | 13:53:43
    03.06.2009 | 14:02:14 | 14:02:31
    04.06.2009 | 05:48:27 | 09:58:59
    04.06.2009 | 06:00:00 | 09:59:07
    04.06.2009 | 10:15:52 | 13:54:52
    04.06.2009 | 10:16:01 | 13:24:20
    04.06.2009 | 13:24:20 | 13:24:24
    04.06.2009 | 13:24:32 | 14:00:39
    04.06.2009 | 13:54:52 | 13:54:58
    04.06.2009 | 14:00:39 | 14:00:49
    05.06.2009 | 05:53:58 | 09:59:12
    05.06.2009 | 10:16:05 | 13:59:08
    05.06.2009 | 13:59:08 | 13:59:16
    06.06.2009 | 06:04:00 | 10:00:00
    06.06.2009 | 10:16:54 | 10:18:40
    06.06.2009 | 10:18:40 | 10:18:45
    06.06.2009 | 10:23:00 | 13:57:00
    06.06.2009 | 10:23:48 | 13:57:54
    06.06.2009 | 13:57:21 | 13:57:38
    06.06.2009 | 13:57:54 | 13:57:58
    07.06.2009 | 21:59:30 | 01:58:49
    07.06.2009 | 22:12:16 | 01:58:39
    07.06.2009 | 22:12:25 | 01:58:28
    08.06.2009 | 02:10:33 | 05:56:11
    08.06.2009 | 02:10:43 | 05:56:23
    08.06.2009 | 02:10:49 | 05:55:59
    08.06.2009 | 05:55:59 | 05:56:01
    08.06.2009 | 05:56:11 | 05:56:14
    08.06.2009 | 05:56:23 | 05:56:27
    

    展平结果:

    Date       | Start    | Stop     | Duration
    -----------+----------+----------+---------
    02.06.2009 | 05:55:28 | 09:58:27 | 04:02:59
    02.06.2009 | 10:15:19 | 13:58:43 | 03:43:24
    03.06.2009 | 05:54:48 | 10:00:13 | 04:05:25
    03.06.2009 | 10:12:50 | 14:02:31 | 03:49:41
    04.06.2009 | 05:48:27 | 09:59:07 | 04:10:40
    04.06.2009 | 10:15:52 | 14:00:49 | 03:44:58
    05.06.2009 | 05:53:58 | 09:59:12 | 04:05:14
    05.06.2009 | 10:16:05 | 13:59:16 | 03:43:11
    06.06.2009 | 06:04:00 | 10:00:00 | 03:56:00
    06.06.2009 | 10:16:54 | 10:18:45 | 00:01:51
    06.06.2009 | 10:23:00 | 13:57:58 | 03:34:58
    07.06.2009 | 21:59:30 | 01:58:49 | 03:59:19
    08.06.2009 | 02:10:33 | 05:56:27 | 03:45:54
    
    7 回复  |  直到 6 年前
        1
  •  7
  •   Tom H zenazn    15 年前

    这里有一个只支持SQL的解决方案。我在列中使用了日期时间。在我看来,把时间分开储存是一个错误,因为当时间过了午夜,你就会遇到问题。不过,如果需要的话,你可以调整它来处理这种情况。解决方案还假定开始和结束时间不为空。同样,如果情况并非如此,您可以根据需要进行调整。

    解决方案的一般要点是获取所有不与任何其他跨度重叠的开始时间,获取所有不与任何跨度重叠的结束时间,然后将两者匹配在一起。

    结果与您期望的结果相匹配,但有一种情况除外,这种情况下,手工检查似乎在您的预期输出中有错误。在第6天应该有一个跨度在2009-06-06 10:18:45.000结束。

    SELECT
         ST.start_time,
         ET.end_time
    FROM
    (
         SELECT
              T1.start_time
         FROM
              dbo.Test_Time_Spans T1
         LEFT OUTER JOIN dbo.Test_Time_Spans T2 ON
              T2.start_time < T1.start_time AND
              T2.end_time >= T1.start_time
         WHERE
              T2.start_time IS NULL
    ) AS ST
    INNER JOIN
    (
         SELECT
              T3.end_time
         FROM
              dbo.Test_Time_Spans T3
         LEFT OUTER JOIN dbo.Test_Time_Spans T4 ON
              T4.end_time > T3.end_time AND
              T4.start_time <= T3.end_time
         WHERE
              T4.start_time IS NULL
    ) AS ET ON
         ET.end_time > ST.start_time
    LEFT OUTER JOIN
    (
         SELECT
              T5.end_time
         FROM
              dbo.Test_Time_Spans T5
         LEFT OUTER JOIN dbo.Test_Time_Spans T6 ON
              T6.end_time > T5.end_time AND
              T6.start_time <= T5.end_time
         WHERE
              T6.start_time IS NULL
    ) AS ET2 ON
         ET2.end_time > ST.start_time AND
         ET2.end_time < ET.end_time
    WHERE
         ET2.end_time IS NULL
    
        2
  •  4
  •   Quassnoi    15 年前

    MySQL :

    SELECT  grouper, MIN(start) AS group_start, MAX(end) AS group_end
    FROM    (
            SELECT  start,
                    end,
                    @r := @r + (@edate < start) AS grouper,
                    @edate := GREATEST(end, CAST(@edate AS DATETIME))
            FROM    (
                    SELECT  @r := 0,
                            @edate := CAST('0000-01-01' AS DATETIME)
                    ) vars,
                    (
                    SELECT  rn_date + INTERVAL TIME_TO_SEC(rn_start) SECOND AS start,
                            rn_date + INTERVAL TIME_TO_SEC(rn_end) SECOND + INTERVAL (rn_start > rn_end) DAY AS end
                    FROM    t_ranges
                    ) q
            ORDER BY
                    start
            ) q
    GROUP BY
            grouper
    ORDER BY
            group_start
    

    同样的决定 SQL Server 以下文章在我的博客中介绍:

    执行此操作的功能如下:

    DROP FUNCTION fn_spans
    GO
    CREATE FUNCTION fn_spans(@p_from DATETIME, @p_till DATETIME)
    RETURNS @t TABLE
            (
            q_start DATETIME NOT NULL,
            q_end DATETIME NOT NULL
            )
    AS
    BEGIN
            DECLARE @qs DATETIME
            DECLARE @qe DATETIME
            DECLARE @ms DATETIME
            DECLARE @me DATETIME
            DECLARE cr_span CURSOR FAST_FORWARD
            FOR
            SELECT  s_date + s_start AS q_start,
                    s_date + s_stop + CASE WHEN s_start < s_stop THEN 0 ELSE 1 END AS q_end
            FROM    t_span
            WHERE   s_date BETWEEN @p_from - 1 AND @p_till
                    AND s_date + s_start >= @p_from
                    AND s_date + s_stop <= @p_till
            ORDER BY
                    q_start
            OPEN    cr_span
            FETCH   NEXT
            FROM    cr_span
            INTO    @qs, @qe
            SET @ms = @qs
            SET @me = @qe
            WHILE @@FETCH_STATUS = 0
            BEGIN
                    FETCH   NEXT
                    FROM    cr_span
                    INTO    @qs, @qe
                    IF @qs > @me
                    BEGIN
                            INSERT
                            INTO    @t
                            VALUES (@ms, @me)
                            SET @ms = @qs
                    END
                    SET @me = CASE WHEN @qe > @me THEN @qe ELSE @me END
            END
            IF @ms IS NOT NULL 
            BEGIN
                    INSERT
                    INTO    @t
                    VALUES  (@ms, @me)
            END
            CLOSE   cr_span
            RETURN
    END
    

    自从 SQL Server 缺少一种简单的方法来引用结果集中以前选定的行,这是在 SQL Server 比基于设置的决策更快地工作。

    在测试 1,440,000 行,用于 24 全盘秒数,一天或两天内几乎是瞬间。

    注意中的附加条件 SELECT 查询:

    s_date BETWEEN @p_from - 1 AND @p_till
    

    这似乎是多余的,但它实际上是一个粗糙的过滤器,用于创建索引 s_date 可用的。

        3
  •  3
  •   Community CDub    7 年前

    类似问题如下:

    Min effective and termdate for contiguous dates

    fwiw i up投票推荐了Joe Celko的SQL for Smarties,第三版——重复:第三版(2005)——讨论了各种方法、设置基础和过程。

        4
  •  2
  •   MahlerFive    15 年前

    假设你:

    • 有一些简单的自定义日期对象,它存储开始日期/时间和结束日期/时间
    • 将行按排序顺序(按开始日期/时间)返回为列表, L ,日期
    • 要创建日期的扁平列表, f

    执行以下操作:

    first = first row in L
    flat_date.start = first.start, flat_date.end = first.end
    For each row in L:
        if row.start < flat_date.end and row.end > flat_date.end: // adding on to a timespan
            flat_date.end = row.end
        else: // ending a timespan and starting a new one
            add flat_date to F
            flat_date.start = row.start, flat_date.end = row.end
    add flat_date to F // adding the last timespan to the flattened list
    
        5
  •  1
  •   VVS    15 年前

    这里有一个递归的CTE解决方案,但是我可以自由地为每一列指定日期和时间,而不是单独提取日期。有助于避免一些混乱的特殊情况代码。如果必须单独存储日期,我将使用CTE视图使其看起来像两个日期时间列,并使用此方法。

    创建测试数据:

    create table t1 (d1 datetime, d2 datetime)
    
    insert t1 (d1,d2)
        select           '2009-06-03 10:00:00', '2009-06-03 14:00:00'
        union all select '2009-06-03 13:55:00', '2009-06-03 18:00:00'
        union all select '2009-06-03 17:55:00', '2009-06-03 23:00:00'
        union all select '2009-06-03 22:55:00', '2009-06-04 03:00:00'
    
        union all select '2009-06-04 03:05:00', '2009-06-04 07:00:00'
    
        union all select '2009-06-04 07:05:00', '2009-06-04 10:00:00'
        union all select '2009-06-04 09:55:00', '2009-06-04 14:00:00'
    

    递归CTE:

    ;with dateRanges (ancestorD1, parentD1, d2, iter) as
    (
    --anchor is first level of collapse
        select
            d1 as ancestorD1,
            d1 as parentD1,
            d2,
            cast(0 as int) as iter
        from t1
    
    --recurse as long as there is another range to fold in
        union all select
            tLeft.ancestorD1,
            tRight.d1 as parentD1,
            tRight.d2,
            iter + 1  as iter
        from dateRanges as tLeft join t1 as tRight
            --join condition is that the t1 row can be consumed by the recursive row
            on tLeft.d2 between tRight.d1 and tRight.d2
                --exclude identical rows
                and not (tLeft.parentD1 = tRight.d1 and tLeft.d2 = tRight.d2)
    )
    select
        ranges1.*
    from dateRanges as ranges1
    where not exists (
        select 1
        from dateRanges as ranges2
        where ranges1.ancestorD1 between ranges2.ancestorD1 and ranges2.d2
            and ranges1.d2 between ranges2.ancestorD1 and ranges2.d2
            and ranges2.iter > ranges1.iter
    )
    

    给出输出:

    ancestorD1              parentD1                d2                      iter
    ----------------------- ----------------------- ----------------------- -----------
    2009-06-04 03:05:00.000 2009-06-04 03:05:00.000 2009-06-04 07:00:00.000 0
    2009-06-04 07:05:00.000 2009-06-04 09:55:00.000 2009-06-04 14:00:00.000 1
    2009-06-03 10:00:00.000 2009-06-03 22:55:00.000 2009-06-04 03:00:00.000 3
    
        6
  •  0
  •   Bernhard Hofmann    15 年前

    为了帮助回答问题,以下是问题中给定的示例数据,这些示例数据位于类似hainstech的表变量中:

    declare @T1 table (d1 datetime, d2 datetime)
    
    insert @T1 (d1,d2)
    select           '02 June 2009 05:55:28','02 June 2009 09:58:27'
    union all select '02 June 2009 10:15:19','02 June 2009 13:58:24'
    union all select '02 June 2009 13:58:24','02 June 2009 13:58:43'
    union all select '03 June 2009 05:54:48','03 June 2009 10:00:13'
    union all select '03 June 2009 09:26:45','03 June 2009 09:59:40'
    union all select '03 June 2009 10:12:50','03 June 2009 10:27:25'
    union all select '03 June 2009 10:13:12','03 June 2009 11:14:56'
    union all select '03 June 2009 10:27:25','03 June 2009 10:27:31'
    union all select '03 June 2009 10:27:39','03 June 2009 13:53:38'
    union all select '03 June 2009 11:14:56','03 June 2009 11:15:03'
    union all select '03 June 2009 11:15:30','03 June 2009 14:02:14'
    union all select '03 June 2009 13:53:38','03 June 2009 13:53:43'
    union all select '03 June 2009 14:02:14','03 June 2009 14:02:31'
    union all select '04 June 2009 05:48:27','04 June 2009 09:58:59'
    union all select '04 June 2009 06:00:00','04 June 2009 09:59:07'
    union all select '04 June 2009 10:15:52','04 June 2009 13:54:52'
    union all select '04 June 2009 10:16:01','04 June 2009 13:24:20'
    union all select '04 June 2009 13:24:20','04 June 2009 13:24:24'
    union all select '04 June 2009 13:24:32','04 June 2009 14:00:39'
    union all select '04 June 2009 13:54:52','04 June 2009 13:54:58'
    union all select '04 June 2009 14:00:39','04 June 2009 14:00:49'
    union all select '05 June 2009 05:53:58','05 June 2009 09:59:12'
    union all select '05 June 2009 10:16:05','05 June 2009 13:59:08'
    union all select '05 June 2009 13:59:08','05 June 2009 13:59:16'
    union all select '06 June 2009 06:04:00','06 June 2009 10:00:00'
    union all select '06 June 2009 10:16:54','06 June 2009 10:18:40'
    union all select '06 June 2009 10:18:40','06 June 2009 10:18:45'
    union all select '06 June 2009 10:23:00','06 June 2009 13:57:00'
    union all select '06 June 2009 10:23:48','06 June 2009 13:57:54'
    union all select '06 June 2009 13:57:21','06 June 2009 13:57:38'
    union all select '06 June 2009 13:57:54','06 June 2009 13:57:58'
    union all select '07 June 2009 21:59:30','07 June 2009 01:58:49'
    union all select '07 June 2009 22:12:16','07 June 2009 01:58:39'
    union all select '07 June 2009 22:12:25','07 June 2009 01:58:28'
    union all select '08 June 2009 02:10:33','08 June 2009 05:56:11'
    union all select '08 June 2009 02:10:43','08 June 2009 05:56:23'
    union all select '08 June 2009 02:10:49','08 June 2009 05:55:59'
    union all select '08 June 2009 05:55:59','08 June 2009 05:56:01'
    union all select '08 June 2009 05:56:11','08 June 2009 05:56:14'
    union all select '08 June 2009 05:56:23','08 June 2009 05:56:27'
    
        7
  •  0
  •   Chris    9 年前

    在Mahler5的回答上,我写了一个对DateTools的快速扩展。到目前为止,它已经通过了我所有的测试。

    extension DTTimePeriodCollection {
    
        func flatten() {
    
            self.sortByStartAscending()
    
            guard let periods = self.periods() else { return }
            if periods.count < 1 { return }
    
            var flattenedPeriods = [DTTimePeriod]()
            let flatdate = DTTimePeriod()
    
            for period in periods {
    
                guard let periodStart = period.StartDate, let periodEnd = period.EndDate else { continue }
    
                if !flatdate.hasStartDate() { flatdate.StartDate = periodStart }
                if !flatdate.hasEndDate() { flatdate.EndDate = periodEnd }
    
                if periodStart.isEarlierThanOrEqualTo(flatdate.EndDate) && periodEnd.isGreaterThanOrEqualTo(flatdate.EndDate) {
    
                    flatdate.EndDate = periodEnd
    
                } else {
    
                    flattenedPeriods.append(flatdate.copy())
                    flatdate.StartDate = periodStart
                    flatdate.EndDate = periodEnd
                }
            }
    
            flattenedPeriods.append(flatdate.copy())
    
            // delete all periods
            for var i = 0 ; i < periods.count ; i++ { self.removeTimePeriodAtIndex(0) }
    
            // add flattened periods to self
            for flat in flattenedPeriods { self.addTimePeriod(flat) }
        }