
Oracle: grouping records between pairs of records

  •  0
  • FrustratedWithFormsDesigner  · Tech Community  · 16 yr ago

    I have a logging table that contains data like this:

    ID     MSG                DATE
    ---------------------------------------------
    1      TEst               2010-01-01 09:00:00
    2      Job Start          2010-01-01 09:03:00
    3      Do something       2010-01-01 09:03:10
    4      Do something else  2010-01-01 09:03:12
    5      Do something       2010-01-01 09:04:19
    6      Job End            2010-01-01 09:06:30
    7      Job Start          2010-01-01 09:18:03
    8      Do something       2010-01-01 09:18:17
    9      Do other thing     2010-01-01 09:19:48
    10     Job End            2010-01-01 09:20:27
    

    Among other things, it contains messages written by an application. I'd like to generate a report that, for every pair of "Job Start" and "Job End" records, lists everything written between them. Ideally the report would look like this:

    BATCH_NUM   ID     MSG                DATE
    ---------------------------------------------------------         
    1           3      Do something       2010-01-01 09:03:10
    1           4      Do something else  2010-01-01 09:03:12
    1           5      Do something       2010-01-01 09:04:19
    
    2           8      Do something       2010-01-01 09:18:17
    2           9      Do other thing     2010-01-01 09:19:48
    

    The output report ignores the "Job Start" and "Job End" messages themselves, as well as the "TEst" message (which falls outside any "Job Start"/"Job End" pair).

    (The Oracle version is 10g.)

    3 Answers  |  latest 16 yr ago
        1
  •  1
  •   Adam Musch    16 yr ago

    I'm sure there's a more analytic way to do this, but I built the windows with scalar subqueries.

    SQL> select * from logging_table;
    
            ID MSG                            LOG_DT
    ---------- ------------------------------ -------------------
             1 TEst                           2010-01-01 09:00:00
             2 Job Start                      2010-01-01 09:03:00
             3 Do something                   2010-01-01 09:03:10
             4 Do something else              2010-01-01 09:03:12
             5 Do something                   2010-01-01 09:04:19
             6 Job End                        2010-01-01 09:06:30
             7 Job Start                      2010-01-01 09:18:03
             8 Do something                   2010-01-01 09:18:17
             9 Do other thing                 2010-01-01 09:19:48
            10 Job End                        2010-01-01 09:20:27
    
    SQL> select dense_rank() over (order by job_start_id) as batch,
                -- ^-- this part gets the batch
                job_step_id, msg, log_dt
                -- nested select to filter out rows outside of the boundaries
           from (select *
                   from (select id as job_step_id, msg, log_dt,
                                -- scalar subquery to get the start of the "window"
                                (select max(id)
                                   from logging_table
                                  where msg = 'Job Start'
                                    and id < log.id) as job_start_id,
                                -- scalar subquery to get the end of the "window"
                                (select min(id)
                                   from logging_table
                                  where msg = 'Job End'
                                    and id > log.id) as job_end_id
                           from logging_table log
                          -- filter out the "window" rows themselves
                          where msg not in ('Job Start', 'Job End'))
                  -- filter out the "unbounded" records
                  where job_start_id is not null
                    and job_end_id is not null)
          order by job_step_id;
    
     BATCH JOB_STEP_ID MSG                            LOG_DT
    ------ ----------- ------------------------------ -------------------
         1           3 Do something                   2010-01-01 09:03:10
         1           4 Do something else              2010-01-01 09:03:12
         1           5 Do something                   2010-01-01 09:04:19
         2           8 Do something                   2010-01-01 09:18:17
         2           9 Do other thing                 2010-01-01 09:19:48
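
    The "more analytic way" alluded to above can be done with running counts of the marker rows: a row is inside a batch exactly when the number of 'Job Start' rows at or before it exceeds the number of 'Job End' rows at or before it, and the start-count doubles as the batch number. A sketch, tested here against SQLite (whose `SUM() OVER (ORDER BY ...)` behaves like Oracle's analytic version; the same SQL should also run on 10g, though that is untested):

    ```python
    import sqlite3

    # Sample data from the question: (id, msg, log_dt).
    rows = [
        (1, "TEst", "2010-01-01 09:00:00"),
        (2, "Job Start", "2010-01-01 09:03:00"),
        (3, "Do something", "2010-01-01 09:03:10"),
        (4, "Do something else", "2010-01-01 09:03:12"),
        (5, "Do something", "2010-01-01 09:04:19"),
        (6, "Job End", "2010-01-01 09:06:30"),
        (7, "Job Start", "2010-01-01 09:18:03"),
        (8, "Do something", "2010-01-01 09:18:17"),
        (9, "Do other thing", "2010-01-01 09:19:48"),
        (10, "Job End", "2010-01-01 09:20:27"),
    ]

    conn = sqlite3.connect(":memory:")
    conn.execute("create table logging_table (id integer, msg text, log_dt text)")
    conn.executemany("insert into logging_table values (?, ?, ?)", rows)

    # Running counts of the marker rows; keep rows where the start-count
    # is exactly one ahead of the end-count (i.e. inside an open window),
    # and drop the markers themselves.
    query = """
    select starts as batch_num, id, msg, log_dt
      from (select id, msg, log_dt,
                   sum(case when msg = 'Job Start' then 1 else 0 end)
                       over (order by id) as starts,
                   sum(case when msg = 'Job End' then 1 else 0 end)
                       over (order by id) as ends
              from logging_table)
     where starts = ends + 1
       and msg not in ('Job Start', 'Job End')
     order by id
    """
    result = conn.execute(query).fetchall()
    for row in result:
        print(row)
    ```

    This avoids the per-row scalar subqueries, so the table is scanned once instead of once per row.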
    
        2
  •  0
  •   Allan    16 yr ago

    I don't think you can use DENSE_RANK() directly, since there is no column to partition on.

    Also, either of these solutions assumes that the logging sets never overlap. If a second set starts before the first one ends, that's a whole new problem...

    WITH logging_sets AS
         (SELECT DENSE_RANK () OVER (ORDER BY start_date) AS set_rank, start_date, end_date
            FROM (SELECT CASE msg
                            WHEN 'Job End'
                               THEN NULL
                            ELSE LEAD (log_dt, 1, NULL) OVER (ORDER BY log_dt)
                         END AS end_date, log_dt AS start_date, msg
                    FROM logging_table lt
                   WHERE msg IN ('Job Start', 'Job End') )
           WHERE msg = 'Job Start')
    SELECT ls.set_rank, lt.ID, lt.msg, lt.log_dt
      FROM logging_table lt, logging_sets ls
     WHERE lt.log_dt > ls.start_date AND lt.log_dt < ls.end_date 
       AND msg NOT IN ('Job Start', 'Job End')
    ORDER BY ls.set_rank, lt.log_dt;
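
    The query above happens to use only constructs that SQLite also supports (LEAD with a default argument, DENSE_RANK, comma joins), so it can be sanity-checked against the sample data without an Oracle instance. A hypothetical harness:

    ```python
    import sqlite3

    # Sample data from the question: (id, msg, log_dt).
    rows = [
        (1, "TEst", "2010-01-01 09:00:00"),
        (2, "Job Start", "2010-01-01 09:03:00"),
        (3, "Do something", "2010-01-01 09:03:10"),
        (4, "Do something else", "2010-01-01 09:03:12"),
        (5, "Do something", "2010-01-01 09:04:19"),
        (6, "Job End", "2010-01-01 09:06:30"),
        (7, "Job Start", "2010-01-01 09:18:03"),
        (8, "Do something", "2010-01-01 09:18:17"),
        (9, "Do other thing", "2010-01-01 09:19:48"),
        (10, "Job End", "2010-01-01 09:20:27"),
    ]

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logging_table (id INTEGER, msg TEXT, log_dt TEXT)")
    conn.executemany("INSERT INTO logging_table VALUES (?, ?, ?)", rows)

    # The answer's query, unchanged apart from qualifying msg in the final
    # filter; dates are ISO-8601 strings, so string comparison orders them
    # correctly.
    query = """
    WITH logging_sets AS
         (SELECT DENSE_RANK() OVER (ORDER BY start_date) AS set_rank,
                 start_date, end_date
            FROM (SELECT CASE msg
                            WHEN 'Job End' THEN NULL
                            ELSE LEAD(log_dt, 1, NULL) OVER (ORDER BY log_dt)
                         END AS end_date,
                         log_dt AS start_date, msg
                    FROM logging_table lt
                   WHERE msg IN ('Job Start', 'Job End'))
           WHERE msg = 'Job Start')
    SELECT ls.set_rank, lt.id, lt.msg, lt.log_dt
      FROM logging_table lt, logging_sets ls
     WHERE lt.log_dt > ls.start_date AND lt.log_dt < ls.end_date
       AND lt.msg NOT IN ('Job Start', 'Job End')
    ORDER BY ls.set_rank, lt.log_dt
    """
    result = conn.execute(query).fetchall()
    ```

    Note that unlike the scalar-subquery answer, this one pairs markers by timestamp rather than by id, so it depends on the dates being distinct and well ordered.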
    
        3
  •  0
  •   Dmitry Grekov    10 yr ago

    Just in case, here's a way to do it without window functions.

    with logging_table as (
        select 1 id, 'job start' msg from dual union
        select 2, 'do something in batch 1' from dual union
        select 3, 'do something else in batch 1' from dual union
        select 4, 'job end' from dual union
        select 5, 'job start' from dual union
        select 6, 'do something in batch 2' from dual union
        select 7, 'do something else in batch 2' from dual union    
        select 8, 'job end' from dual 
    ),
    jobs as (
        select  lt_start.id id_start,
               (select min(id) 
                from logging_table lt_end 
                where lt_end.id > lt_start.id 
                and msg = 'job end') id_end,
                rownum as batch_no
        from    logging_table lt_start
        where   msg = 'job start'
    )
    select  *
    from    logging_table join jobs 
            on id > id_start and id < id_end
    order   by batch_no, id
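
    The pairing logic here (each 'job start' matched with the smallest later 'job end' id, batches numbered in the order the start rows are seen, which is what `rownum` does in the `jobs` CTE) can be restated without SQL at all. A minimal Python sketch over the same inlined data:

    ```python
    # Sample rows mirroring the inline logging_table above: (id, msg).
    rows = [
        (1, "job start"),
        (2, "do something in batch 1"),
        (3, "do something else in batch 1"),
        (4, "job end"),
        (5, "job start"),
        (6, "do something in batch 2"),
        (7, "do something else in batch 2"),
        (8, "job end"),
    ]

    # Pair each 'job start' id with the first 'job end' id after it,
    # numbering batches in id order (the rownum in the CTE assumes the
    # start rows come back in that order).
    starts = [i for i, msg in rows if msg == "job start"]
    jobs = []
    for batch_no, id_start in enumerate(starts, start=1):
        id_end = min(i for i, msg in rows if msg == "job end" and i > id_start)
        jobs.append((batch_no, id_start, id_end))

    # Equivalent of the final join: keep rows strictly inside a window.
    report = [(batch_no, i, msg)
              for batch_no, id_start, id_end in jobs
              for i, msg in rows
              if id_start < i < id_end]
    ```

    Seeing it spelled out this way also makes the caveat explicit: the batch numbering relies on the start markers being processed in id order, which `rownum` does not strictly guarantee without an ordered subquery.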