我必须找到一组带有共享密钥的结果,这些结果在过去90天内发布到本地数据存储的API,但在过去7天内发布。
这将是一个有效的组,我需要为其捕获1234:
account_id,company_id,posted_date
1234,A,2018-02-28
1234,B,2018-03-13
1234,C.2018-04-23
1234,D,2018-05-15
这将是一个无效的组。如果单个日期不在查询的上或下段,则应将帐户ID从最终结果中排除:
account_id,company_id,posted_date
5678,Z,2018-02-01
5678,Y,2018-03-13
5678,X.2018-04-23
5678,W,2018-05-21
这是使用子查询的查询的第一个草稿:
SELECT DISTINCT account_id, company_id FROM local_data_store.result_api
WHERE account_id NOT IN (
SELECT account_id FROM local_data_store.result_api
GROUP BY account_id
HAVING posted_date > DATE_SUB(NOW(), INTERVAL 7 DAY)
)
AND account_did IN (
SELECT account_did FROM local_data_store.result_api
GROUP BY account_did
HAVING posted_date > DATE_SUB(NOW(), INTERVAL 90 DAY)
)
GROUP BY account_id, company_id
LIMIT 100000;
这是我现在正在处理的没有子查询的查询(我尝试了联接,但它们确实不起作用):
SELECT DISTINCT account_id, company_id,
COUNT(ra1.posted_date > DATE_SUB(NOW(), INTERVAL 90 DAY)) AS day90,
COUNT(ra1.posted_date > DATE_SUB(NOW(), INTERVAL 7 DAY)) as day7
FROM local_data_store.result_api ra1
GROUP BY posted_date, account_id;
但它的运行时间太长,数据库连接超时。这只在一个包含375000行的数据库表上。