
How do I convert a simple boolean statement to SQL?

  •  6
  • Jay Askren  · 15 years ago

    I have the following database tables with information about people, diseases, and drugs:

    PERSON_T              DISEASE_T               DRUG_T
    =========             ==========              ========
    PERSON_ID             DISEASE_ID              DRUG_ID
    GENDER                PERSON_ID               PERSON_ID
    NAME                  DISEASE_START_DATE      DRUG_START_DATE
                          DISEASE_END_DATE        DRUG_END_DATE
    

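    […] which diseases. From that I can work out which patterns are interesting enough for me to dig into further. For instance, here is a simplified example of a boolean pattern that I might find for disease 52: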

    ( (Drug 234 = false AND Drug 474 = true AND Drug 26 = false) OR 
      (Drug 395 = false AND Drug 791 = false AND Drug 371 = true) )
    

    Edit: Here is another example:

    ( (Drug 234 = true AND Drug 474 = true AND Drug 26 = false) OR 
          (Drug 395 = false AND Drug 791 = false AND Drug 371 = true) )
    

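    Now I want to convert this pattern into a SQL query and find all the people who match it.
    For example, I want to find all people in PERSON_T who had the disease and ((did not take drugs 234 and 26 before exhibiting symptoms, but did take drug 474 before exhibiting symptoms) or (took drug 371 before exhibiting symptoms, but did not take drugs 791 and 395 before exhibiting symptoms)).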

    Here is my first attempt, but I got stuck on the first term:

    SELECT * FROM PERSON_T, DRUG_T, DISEASE_T 
      WHERE DISEASE_ID = 52 AND 
        PERSON_T.PERSON_ID = DISEASE_T.PERSON_ID AND 
        PERSON_T.PERSON_ID = DRUG_T.PERSON_ID  AND 
        (DRUG_T.DRUG_ID=234 AND (DRUG_T.DRUG_START_DATE>DISEASE_T.END_DATE || ???)
    

    I need this to work in PostgreSQL, but I assume that any answer given can be translated from its database to PostgreSQL.

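    Responses to comments

    1. I fixed the formatting of the database tables. Thank you.
    2. The disease start date is the date the person got the disease, most likely when symptoms started to appear. The disease end date is when the person recovered, most likely when symptoms disappeared.

    Edit: I added my own answer. Can anyone come up with a simpler one?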

    10 answers  |  up to 15 years ago
        1
  •  4
  •   Justin K    15 years ago

    To me, the straightforward (if ugly) solution is to use EXISTS and NOT EXISTS clauses:

    SELECT *
    FROM PERSON_T INNER JOIN DISEASE_T
         USING (PERSON_ID)
    WHERE DISEASE_ID = 52
      AND EXISTS (SELECT 1 FROM DRUG_T
                  WHERE DRUG_T.PERSON_ID = PERSON_T.PERSON_ID
                    AND DRUG_ID = 474
                    AND [time condition])
      AND NOT EXISTS (SELECT 1 FROM DRUG_T
                  WHERE DRUG_T.PERSON_ID = PERSON_T.PERSON_ID
                    AND DRUG_ID = 234
                    AND [time condition])
    

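    ...and so on. In this example we are asking for people who took drug 474 but did not take drug 234. Obviously you can group the clauses with ANDs and ORs as you need.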
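To make the `[time condition]` placeholder concrete, here is a minimal sketch of the same EXISTS / NOT EXISTS idea, run from Python against an in-memory SQLite database so it can be executed as-is. The sample rows, the names Ann/Bob/Cat, and the reading of the time condition as "the drug was started before the disease started" are assumptions for illustration only. The query uses only standard SQL and should carry over to PostgreSQL, where the date columns would normally be of type DATE rather than text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dates are stored as ISO-8601 text so that plain '<' compares them correctly.
CREATE TABLE person_t  (person_id INTEGER PRIMARY KEY, gender TEXT, name TEXT);
CREATE TABLE disease_t (disease_id INTEGER, person_id INTEGER,
                        disease_start_date TEXT, disease_end_date TEXT);
CREATE TABLE drug_t    (drug_id INTEGER, person_id INTEGER,
                        drug_start_date TEXT, drug_end_date TEXT);

INSERT INTO person_t VALUES (1, 'F', 'Ann'), (2, 'M', 'Bob'), (3, 'F', 'Cat');
INSERT INTO disease_t VALUES (52, 1, '2010-06-01', NULL),
                             (52, 2, '2010-06-01', NULL),
                             (52, 3, '2010-06-01', NULL);
-- Ann: drug 474 before onset, nothing else          -> should match
-- Bob: drugs 474 and 234 both before onset          -> should not match
-- Cat: drug 474 before onset, 234 only after onset  -> should match
INSERT INTO drug_t VALUES (474, 1, '2010-01-01', NULL),
                          (474, 2, '2010-01-01', NULL),
                          (234, 2, '2010-02-01', NULL),
                          (474, 3, '2010-01-01', NULL),
                          (234, 3, '2010-09-01', NULL);
""")

QUERY = """
SELECT p.person_id, p.name
FROM person_t p
JOIN disease_t ds ON ds.person_id = p.person_id
WHERE ds.disease_id = 52
  AND EXISTS (SELECT 1 FROM drug_t dr            -- "Drug 474 = true"
              WHERE dr.person_id = p.person_id
                AND dr.drug_id = 474
                AND dr.drug_start_date < ds.disease_start_date)
  AND NOT EXISTS (SELECT 1 FROM drug_t dr        -- "Drug 234 = false"
              WHERE dr.person_id = p.person_id
                AND dr.drug_id = 234
                AND dr.drug_start_date < ds.disease_start_date)
ORDER BY p.person_id
"""

matches = conn.execute(QUERY).fetchall()
print(matches)
```

Each further drug in a pattern becomes one more EXISTS (for true) or NOT EXISTS (for false) clause, and the OR between the two halves of the pattern becomes an OR between two parenthesised groups of such clauses.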

        2
  •  1
  •   sgriffinusa    15 years ago

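    I don't know how this will perform on large tables (I imagine quite poorly, since date comparisons are usually fairly expensive), but here is an approach that should work. It is relatively verbose, but easy to modify for different boolean cases.

    Example 1: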

    SELECT dis.*
    FROM disease_t dis
    LEFT JOIN drug_t d1 ON d1.person_id = dis.person_id AND d1.drug_id = 234
    LEFT JOIN drug_t d2 ON d2.person_id = dis.person_id AND d2.drug_id = 474
    LEFT JOIN drug_t d3 ON d3.person_id = dis.person_id AND d3.drug_id = 26
    LEFT JOIN drug_t d4 ON d4.person_id = dis.person_id AND d4.drug_id = 395
    LEFT JOIN drug_t d5 ON d5.person_id = dis.person_id AND d5.drug_id = 791
    LEFT JOIN drug_t d6 ON d6.person_id = dis.person_id AND d6.drug_id = 371
    WHERE dis.disease_id = 52
    AND (((d1.person_id IS NULL OR dis.disease_start_date < d1.drug_start_date) AND
          (d2.person_id IS NOT NULL AND d2.drug_start_date < dis.disease_start_date) AND
          (d3.person_id IS NULL OR dis.disease_start_date < d3.drug_start_date)) 
         OR
         ((d4.person_id IS NULL OR dis.disease_start_date < d4.drug_start_date) AND
          (d5.person_id IS NULL OR dis.disease_start_date < d5.drug_start_date) AND
          (d6.person_id IS NOT NULL AND d6.drug_start_date < dis.disease_start_date)))
    

    Example 2:

    SELECT dis.*
    FROM disease_t dis
    LEFT JOIN drug_t d1 ON d1.person_id = dis.person_id AND d1.drug_id = 234
    LEFT JOIN drug_t d2 ON d2.person_id = dis.person_id AND d2.drug_id = 474
    LEFT JOIN drug_t d3 ON d3.person_id = dis.person_id AND d3.drug_id = 26
    LEFT JOIN drug_t d4 ON d4.person_id = dis.person_id AND d4.drug_id = 395
    LEFT JOIN drug_t d5 ON d5.person_id = dis.person_id AND d5.drug_id = 791
    LEFT JOIN drug_t d6 ON d6.person_id = dis.person_id AND d6.drug_id = 371
    WHERE dis.disease_id = 52
    AND (((d1.person_id IS NOT NULL AND d1.drug_start_date < dis.disease_start_date) AND
          (d2.person_id IS NOT NULL AND d2.drug_start_date < dis.disease_start_date) AND
          (d3.person_id IS NULL OR dis.disease_start_date < d3.drug_start_date)) 
         OR
         ((d4.person_id IS NULL OR dis.disease_start_date < d4.drug_start_date) AND
          (d5.person_id IS NULL OR dis.disease_start_date < d5.drug_start_date) AND
          (d6.person_id IS NOT NULL AND d6.drug_start_date < dis.disease_start_date)))
    
        3
  •  1
  •   pferate    15 years ago

    ( (Drug 234 = true AND Drug 474 = true AND Drug 26 = false) OR (Drug 395 = false AND Drug 791 = false AND Drug 371 = true) ), as you posted.

    /*
    -- AS DEFINED BY JOINS
    -- All "person_id"'s match
    -- Drug 1 is not Drug 2
    -- Drug 1 is not Drug 3
    -- Drug 2 is not Drug 3
    -- All Drugs are optional as far as the SELECT statement is concerned (left join)
       -- Drug IDs will be defined in the WHERE clause
    -- All Diseases for "person_id"
    
    -- AS DEFINED IN WHERE STATEMENT
    -- Disease IS 52
    -- AND ONE OF THE FOLLOWING:
    --   1) Disease started AFTER Drug 1
    --      Disease started AFTER Drug 2
    --      Drug 1 IS 234
    --      Drug 2 IS 474
    --      Drug 3 IS NOT 26 (AND NOT 234 or 474, as defined in JOINs)
    --   2) Disease started AFTER Drug 3
    --      Drug 1 IS NOT 395
    --      Drug 2 IS NOT 791
    --      Drug 3 IS 371
    */
    
    SELECT p.person_id, p.gender FROM person_t as p
    LEFT JOIN drug_t    AS dr1 ON (p.person_id = dr1.person_id)
    LEFT JOIN drug_t    AS dr2 ON (p.person_id = dr2.person_id AND dr1.drug_id != dr2.drug_id)
    LEFT JOIN drug_t    AS dr3 ON (p.person_id = dr3.person_id AND dr1.drug_id != dr3.drug_id AND dr2.drug_id != dr3.drug_id)
    JOIN      disease_t AS ds  ON (p.person_id = ds.person_id)
    WHERE ds.disease_id = 52
    AND (   (    (dr1.drug_start_date < ds.disease_start_date AND dr2.drug_start_date < ds.disease_start_date)
            AND (dr1.drug_id = 234 AND dr2.drug_id = 474 AND dr3.drug_id != 26)
            )
        OR
            (    (dr3.drug_start_date < ds.disease_start_date)
            AND (dr1.drug_id != 395 AND dr2.drug_id != 791 AND dr3.drug_id = 371)
            )
        )
    
        4
  •  0
  •   ajdams    15 years ago

    Forgive any mistakes, but I think something like this would work (in T-SQL):

    SELECT col1, col2, col3...
    FROM PERSON_T AS P, DRUG_T AS DR, DISEASE_T AS DI
    WHERE disease_id = 52
    AND P.person_id = DI.person_id
    AND P.person_id = DR.person_id
    AND drug_id NOT IN(234, 26)
    AND drug_id = 474
    AND disease_start_date < drug_start_date
    UNION
    SELECT col1, col2, col3...
    FROM PERSON_T AS P, DRUG_T AS DR, DISEASE_T AS DI
    WHERE disease_id = 52
    AND P.person_id = DI.person_id
    AND P.person_id = DR.person_id
    AND drug_id NOT IN(791, 395)
    AND drug_id = 371
    AND disease_start_date < drug_start_date

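    It doesn't have to be done with a UNION, but for readability I think this is the simplest form given your conditions. Maybe this will point you in the right direction.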

        5
  •  0
  •   Peter Hanneman    15 years ago
    SELECT per.person_id, per.name, per.gender
    FROM person_t per
    INNER JOIN disease_t dis
    USING (person_id)
    INNER JOIN drug_t drug
    USING (person_id)
    WHERE dis.disease_id = 52 AND drug.drug_start_date < dis.disease_start_date AND ((drug.drug_id IN (234, 474) AND drug.drug_id NOT IN (26)) OR (drug.drug_id IN (371) AND drug.drug_id NOT IN (395, 791)));
    

    This should do what you asked. The IN statements at the end are fairly self-explanatory.

        6
  •  0
  •   joelt    15 years ago

    I don't have test data handy to try this out, but I think you could do something like this:

    SELECT *
    FROM DISEASE_T D
    INNER JOIN DRUG_T DR ON D.PERSON_ID = DR.PERSON_ID AND D.DISEASE_ID=52
    INNER JOIN PERSON_T P ON P.PERSON_ID = D.PERSON_ID
    GROUP BY PERSON_ID
    HAVING SUM(
        CASE WHEN DRUG_ID=234 AND DRUG_START_DATE<DISEASE_START_DATE THEN -1 
        WHEN DRUG_ID=474 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 
        WHEN DRUG_ID=26 AND DRUG_START_DATE<DISEASE_START_DATE THEN -1 
        ELSE 0 END) = 1
        OR
        SUM(
        CASE WHEN DRUG_ID=395 AND DRUG_START_DATE<DISEASE_START_DATE THEN -1 
        WHEN DRUG_ID=791 AND DRUG_START_DATE<DISEASE_START_DATE THEN -1 
        WHEN DRUG_ID=371 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 
        ELSE 0 END) = 1
    

    (SUM(CASE WHEN DRUG_ID=234 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 ELSE 0 END) = 0
    AND SUM(CASE WHEN DRUG_ID=474 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 ELSE 0 END) > 0
    AND SUM(CASE WHEN DRUG_ID=26 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 ELSE 0 END) = 0)
    OR
    (SUM(CASE WHEN DRUG_ID=395 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 ELSE 0 END) = 0
    AND SUM(CASE WHEN DRUG_ID=791 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 ELSE 0 END) = 0
    AND SUM(CASE WHEN DRUG_ID=371 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 ELSE 0 END) > 0)
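The second set of HAVING conditions above (one SUM per drug, compared with `= 0` or `> 0`) can be tried out with a small sketch like the following, run from Python against an in-memory SQLite database. The sample data, the helper `taken_before`, and the choice to group by person and disease start date are my own assumptions, not part of the answer; the pattern encoded is the first one from the question (234 false, 474 true, 26 false, OR 395 false, 791 false, 371 true).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced versions of the question's tables; dates are ISO-8601 text.
CREATE TABLE disease_t (disease_id INTEGER, person_id INTEGER, disease_start_date TEXT);
CREATE TABLE drug_t    (drug_id INTEGER, person_id INTEGER, drug_start_date TEXT);
INSERT INTO disease_t VALUES (52, 1, '2010-06-01'), (52, 2, '2010-06-01'),
                             (52, 3, '2010-06-01'), (52, 4, '2010-06-01');
INSERT INTO drug_t VALUES
  (474, 1, '2010-01-01'),                          -- person 1: first branch holds
  (474, 2, '2010-01-01'), (26, 2, '2010-03-01'),   -- person 2: took excluded drug 26
  (371, 3, '2010-02-01'), (395, 3, '2010-08-01'),  -- person 3: 395 only after onset
  (371, 4, '2010-07-01');                          -- person 4: 371 only after onset
""")

def taken_before(drug_id):
    """Hypothetical helper: SQL counting rows of drug_id started before onset."""
    return (f"SUM(CASE WHEN dr.drug_id = {int(drug_id)} "
            "AND dr.drug_start_date < d.disease_start_date THEN 1 ELSE 0 END)")

QUERY = f"""
SELECT d.person_id
FROM disease_t d
JOIN drug_t dr ON dr.person_id = d.person_id
WHERE d.disease_id = 52
GROUP BY d.person_id, d.disease_start_date
HAVING ({taken_before(234)} = 0 AND {taken_before(474)} > 0 AND {taken_before(26)} = 0)
    OR ({taken_before(395)} = 0 AND {taken_before(791)} = 0 AND {taken_before(371)} > 0)
ORDER BY d.person_id
"""

matched_ids = [row[0] for row in conn.execute(QUERY)]
print(matched_ids)
```

The inner join drops people who have no drug rows at all, which is harmless here only because each branch of this pattern requires at least one drug to have been taken. A pattern made up entirely of "false" terms would need a LEFT JOIN instead.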
    
        7
  •  0
  •   Jeffrey L Whitledge    15 years ago

    I would probably approach this from a direction like the following. It is quite flexible.

    DRUG_DISEASE_CORRELATION_QUERY
    ===============================
    DRUG_DISEASE_CORRELATION_QUERY_ID
    DISEASE_ID
    DESCRIPTION
    
    (1, 52, 'What this query does.')
    (2, 52, 'Add some more results.')
    
    DRUG_DISEASE_CORRELATION_QUERY_INCLUDE_DRUG
    ===========================================
    DRUG_DISEASE_CORRELATION_QUERY_ID
    DRUG_ID
    
    (1, 234)
    (1, 474)
    (2, 371)
    
    DRUG_DISEASE_CORRELATION_QUERY_EXCLUDE_DRUG
    ===========================================
    DRUG_DISEASE_CORRELATION_QUERY_ID
    DRUG_ID
    
    (1, 26)
    (2, 395)
    (2, 791)
    
    
    
    CREATE VIEW DRUG_DISEASE_CORRELATION
    AS
    SELECT 
        p.*,
        q.DRUG_DISEASE_CORRELATION_QUERY_ID
    FROM 
        DRUG_DISEASE_CORRELATION_QUERY q
        INNER JOIN DISEASE_T ds on ds.DISEASE_ID = q.DISEASE_ID
        INNER JOIN PERSON_T p ON p.PERSON_ID = ds.PERSON_ID
      WHERE 
        EXISTS (SELECT * FROM DRUG_T dr WHERE dr.PERSON_ID = p.PERSON_ID AND dr.DRUG_ID IN
            (SELECT qid.DRUG_ID FROM DRUG_DISEASE_CORRELATION_QUERY_INCLUDE_DRUG qid WHERE 
            qid.DRUG_DISEASE_CORRELATION_QUERY_ID = q.DRUG_DISEASE_CORRELATION_QUERY_ID)
            AND DRUG_START_DATE < ds.DISEASE_START_DATE)
       AND NOT EXISTS (SELECT * FROM DRUG_T dr WHERE dr.PERSON_ID = p.PERSON_ID AND dr.DRUG_ID IN
            (SELECT qed.DRUG_ID FROM DRUG_DISEASE_CORRELATION_QUERY_EXCLUDE_DRUG qed WHERE 
            qed.DRUG_DISEASE_CORRELATION_QUERY_ID = q.DRUG_DISEASE_CORRELATION_QUERY_ID)
            AND DRUG_START_DATE < ds.DISEASE_START_DATE)
    GO
    
    
    SELECT * FROM DRUG_DISEASE_CORRELATION WHERE DRUG_DISEASE_CORRELATION_QUERY_ID = 1
    UNION
    SELECT * FROM DRUG_DISEASE_CORRELATION WHERE DRUG_DISEASE_CORRELATION_QUERY_ID = 2
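One thing to watch with the data-driven approach: as I read the view above, its EXISTS ... IN test is satisfied as soon as the person took any one of the included drugs, whereas the pattern in the question needs all of them. Below is a rough sketch of one way to require every rule to hold, run from Python against in-memory SQLite. For brevity it folds the include and exclude tables into a single hypothetical `query_drug` table with a `required` flag; that table, its column names, and the sample rows are assumptions of mine, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE disease_t (disease_id INTEGER, person_id INTEGER, disease_start_date TEXT);
CREATE TABLE drug_t    (drug_id INTEGER, person_id INTEGER, drug_start_date TEXT);
-- Hypothetical rule table: required = 1 means "must have been taken before
-- onset", required = 0 means "must not have been taken before onset".
CREATE TABLE query_drug (query_id INTEGER, drug_id INTEGER, required INTEGER);
INSERT INTO query_drug VALUES (1, 234, 1), (1, 474, 1), (1, 26, 0),
                              (2, 371, 1), (2, 395, 0), (2, 791, 0);
INSERT INTO disease_t VALUES (52, 1, '2010-06-01'), (52, 2, '2010-06-01'),
                             (52, 3, '2010-06-01');
INSERT INTO drug_t VALUES
  (234, 1, '2010-01-01'), (474, 1, '2010-02-01'),  -- person 1 satisfies query 1
  (234, 2, '2010-01-01'),                          -- person 2 lacks drug 474
  (371, 3, '2010-03-01');                          -- person 3 satisfies query 2
""")

QUERY = """
SELECT DISTINCT ds.person_id
FROM disease_t ds
CROSS JOIN (SELECT DISTINCT query_id FROM query_drug) q
WHERE ds.disease_id = 52
  AND NOT EXISTS (                 -- no rule of this query is violated
        SELECT 1 FROM query_drug r
        WHERE r.query_id = q.query_id
          AND (r.required = 1) <> EXISTS (
                SELECT 1 FROM drug_t dr
                WHERE dr.person_id = ds.person_id
                  AND dr.drug_id = r.drug_id
                  AND dr.drug_start_date < ds.disease_start_date))
ORDER BY ds.person_id
"""

person_ids = [row[0] for row in conn.execute(QUERY)]
print(person_ids)
```

A rule is violated when "required" and "actually taken before onset" disagree, so a person matches a query exactly when no violated rule exists. Taking DISTINCT over all queries plays the role of the UNION over query ids 1 and 2.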
    
        8
  •  0
  •   Philip Kelley    15 years ago

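    • Select those persons
    • who have contracted one (1) specific disease
    • who have been treated with one or more specific drugs

    This can be simplified by converting the "drug requirements" into some form of temporary table. That allows querying with any number of "good" and "bad" drugs. What I have below could be implemented as a stored procedure, but if that is not an option there are a number of convoluted alternatives available.

    Breaking down the steps:

    First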

    SELECT [PersonData]
     from DISEASE_T di
      inner join PERSON_T pe
       on pe.Person_Id = di.Person_Id
     where di.Disease_Id = [TargetDisease]
      and [TimeConstraints]
    

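    Second, for every set of "target" drugs that have been ANDed together, set up a temp table like this (this is SQL Server syntax; Postgres should have something similar):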

    CREATE TABLE #DrugSet
     (
       Drug_Id  [KeyDataType]
      ,Include  int   not null
     )
    

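    • Drug_Id = the drug you are checking

    Calculate two values:

    @GoodDrugs, the number of drugs you want the patient to have taken
    @BadDrugs, the number of drugs you want the patient not to have taken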

    SELECT pe.[PersonData]  --  All the desired columns from PERSON_T and elsewhere
     from DRUG_T dr
      --  Filter to only include "persons of interest"
      inner join (select [PersonData]
                   from DISEASE_T di
                    inner join PERSON_T pe
                     on pe.Person_Id = di.Person_Id
                   where di.Disease_Id = [TargetDisease]
                    and [TimeConstraints]) pe
       on pe.Person_Id = dr.Person_ID
     --  Join with any of the drugs we are interested in
     left outer join #DrugSet ta  
      on ta.Drug_Id = dr.Drug_Id
     group by pe.[PersonData]  --  Same as in the SELECT clause
     having sum(case ta.Include
                  when 1 then 1  --  This patient has been given a drug that we're looking to match
                  else 0         --  This patient has not been given this drug (catches NULLs, too)
                end) = @GoodDrugs
      and  sum(case ta.Include
                  when 0 then 1  --  This patient has been given this drug that we're NOT looking to match
                  else 0         --  This patient has not been given this drug (catches NULLs, too)
                end) = @BadDrugs
    

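    I deliberately left out the time criteria, since you did not go into detail on them, but they should be easy to add (though I hope those aren't famous last words). Further optimization may be possible, but it depends heavily on the data and on any other criteria.

    […] for each pass. You might be able to extend the "drug set" to cover every set of drugs you are checking, but I would be reluctant to try to code that without some serious data to test it against.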


        9
  •  0
  •   Jay Askren    15 years ago

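    None of the answers given seem to work. The pattern I want to implement is: ((Drug 234 = true AND Drug 474 = true AND Drug 26 = false) OR (Drug 395 = false AND Drug 791 = false AND Drug 371 = true))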

    SELECT  p.person_id, p.gender FROM person_t as p 
        join drug_t as dr on dr.person_id = p.person_id 
        join disease_t as ds on ds.person_id=p.person_id 
        WHERE dr.drug_start_date < ds.disease_start_date AND disease_id = 52 AND dr.drug_id=234
    INTERSECT
    SELECT  p.person_id, p.gender FROM person_t as p 
        join drug_t as dr on dr.person_id = p.person_id 
        join disease_t as ds on ds.person_id=p.person_id 
        WHERE dr.drug_start_date < ds.disease_start_date AND disease_id = 52 AND dr.drug_id=474
    INTERSECT (
    SELECT p.person_id, p.gender
        FROM person_t as p 
        JOIN disease_t as ds on ds.person_id = p.person_id 
        LEFT JOIN drug_t as dr ON dr.person_id = p.person_id  AND dr.drug_id = 26
        WHERE disease_id = 52 AND dr.person_id is null 
    UNION 
    SELECT p.person_id, p.gender
        FROM person_t as p 
        JOIN disease_t as ds on ds.person_id = p.person_id 
        JOIN drug_t as dr ON dr.person_id = p.person_id  AND dr.drug_id = 26
        WHERE disease_id = 52 AND dr.drug_start_date > ds.disease_start_date)
    

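    This query works, but it is quite ugly. I also suspect it will be very slow once I have a production database with 100 million people. Is there anything I can do to simplify or optimize this query?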

        10
  •  0
  •   Community CDub    8 years ago

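    First, the three tables (person, drug, disease) are as shown in Figure 1.0:

    A person can have multiple drugs and multiple diseases. Each drug and each disease has a start date and an end date.

    So I would first denormalize the three tables into one table (Table_dn), like this: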

    dnId | PersonId | DrugId | DiseaseId | DgSt | DgEn | DiSt | DiEn
    ----   --------   ------   ---------   ----   ----   ----   ----
    

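    This denormalized table can be a temp table if need be. Either way, the table now contains the whole global data set shown in Figure 2.0 (denoted G).

    From my understanding of your description, I can basically see a two-layer filter.

    Filter 1

    This filter is simply a set of boolean combinations, as you stated in your question description. For example: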

    (drug a = 1 & drug b = 0 & etc) OR (.....
    

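    Filter 2

    This filter is a little more complex than the first one; it is the date range criteria. Figure 3.0 shows this date range. Yellow represents the record dates, which can relate to it in several ways:

    • before the red period
    • after the red period
    • within the red period
    • starting after the red period begins

    The yellow date period can be the drug period, the disease period, or a combination of both.

    Of course, depending on your exact problem, the two filters may need to be applied the other way round (e.g., f2 first, then f1).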

    Select sub.*
    From    
          (select    * 
           from      Table_dn 
           where     [Filter 1]
          ) as sub
    
    where [Filter 2]
    
