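Code
The code below computes the slope and intercept of a linear regression over a large amount of data. It then applies the equation y = mx + b to the same result set to compute the value of the regression line for each row.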
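For reference, the SLOPE, INTERCEPT and CORRELATION expressions in the query below correspond to the closed-form least-squares formulas. As a sketch, write x for YEAR, y for the per-year average AMOUNT, and n for the number of years (count(1)); these symbols are notation introduced here, not names from the query. MySQL's STDDEV is the population standard deviation, so the last expression is the Pearson correlation coefficient.

```latex
\begin{aligned}
m &= \frac{\sum x \sum y - n \sum xy}{\left(\sum x\right)^2 - n \sum x^2} \\
b &= \frac{\sum x \sum xy - \sum y \sum x^2}{\left(\sum x\right)^2 - n \sum x^2} \\
r &= \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sigma_x \, \sigma_y}
\end{aligned}
```

Both the numerator and the denominator of m and b are negated relative to the more common textbook form, so the resulting values are the same.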
How can the two queries be joined so that the WHERE clause is not executed twice?
The general form of the problem is:
SELECT a.group, func(a.group, avg_avg)
FROM a,
  (SELECT AVG(field1_avg) as avg_avg
   FROM (SELECT a.group, AVG(field1) as field1_avg
         FROM a
         WHERE (SOME_CONDITION)
         GROUP BY a.group) as several_lines -- potentially
  ) as one_line -- always
WHERE (SOME_CONDITION)
GROUP BY a.group -- again, potentially several lines
As written, SOME_CONDITION is executed twice. The full query follows (updated with the STRAIGHT_JOIN optimization):
SELECT STRAIGHT_JOIN
  AVG(D.AMOUNT) as AMOUNT,
  Y.YEAR * ymxb.SLOPE + ymxb.INTERCEPT as REGRESSION_LINE,
  Y.YEAR as YEAR,
  MAKEDATE(Y.YEAR,1) as AMOUNT_DATE,
  ymxb.SLOPE,
  ymxb.INTERCEPT,
  ymxb.CORRELATION,
  ymxb.MEASUREMENTS
FROM
  CITY C,
  STATION S,
  STATION_DISTRICT SD,
  YEAR_REF Y,
  MONTH_REF M,
  DAILY D,
  (SELECT
    SUM(MEASUREMENTS) as MEASUREMENTS,
    ((sum(t.YEAR) * sum(t.AMOUNT)) - (count(1) * sum(t.YEAR * t.AMOUNT))) /
    (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as SLOPE,
    ((sum( t.YEAR ) * sum( t.YEAR * t.AMOUNT )) -
    (sum( t.AMOUNT ) * sum(power(t.YEAR, 2)))) /
    (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as INTERCEPT,
    ((avg(t.AMOUNT * t.YEAR)) - avg(t.AMOUNT) * avg(t.YEAR)) /
    (stddev( t.AMOUNT ) * stddev( t.YEAR )) as CORRELATION
  FROM (
    SELECT STRAIGHT_JOIN
      COUNT(1) as MEASUREMENTS,
      AVG(D.AMOUNT) as AMOUNT,
      Y.YEAR as YEAR
    FROM
      CITY C,
      STATION S,
      STATION_DISTRICT SD,
      YEAR_REF Y,
      MONTH_REF M,
      DAILY D
    WHERE
      -- For a specific city ...
      --
      $X{ IN, C.ID, CityCode } AND
      -- Find all the stations within a specific unit radius ...
      --
      6371.009 *
      SQRT(
        POW(RADIANS(C.LATITUDE_DECIMAL - S.LATITUDE_DECIMAL), 2) +
        (COS(RADIANS(C.LATITUDE_DECIMAL + S.LATITUDE_DECIMAL) / 2) *
        POW(RADIANS(C.LONGITUDE_DECIMAL - S.LONGITUDE_DECIMAL), 2)) ) <= $P{Radius} AND
      SD.ID = S.STATION_DISTRICT_ID AND
      -- Gather all known years for that station ...
      --
      Y.STATION_DISTRICT_ID = SD.ID AND
      -- The data before 1900 is shaky; insufficient after 2009.
      --
      Y.YEAR BETWEEN 1900 AND 2009 AND
      -- Filtered by all known months ...
      --
      M.YEAR_REF_ID = Y.ID AND
      -- Whittled down by category ...
      --
      M.CATEGORY_ID = $P{CategoryCode} AND
      -- Into the valid daily climate data.
      --
      M.ID = D.MONTH_REF_ID AND
      D.DAILY_FLAG_ID <> 'M'
    GROUP BY
      Y.YEAR
  ) t
  ) ymxb
WHERE
  -- For a specific city ...
  --
  $X{ IN, C.ID, CityCode } AND
  -- Find all the stations within a specific unit radius ...
  --
  6371.009 *
  SQRT(
    POW(RADIANS(C.LATITUDE_DECIMAL - S.LATITUDE_DECIMAL), 2) +
    (COS(RADIANS(C.LATITUDE_DECIMAL + S.LATITUDE_DECIMAL) / 2) *
    POW(RADIANS(C.LONGITUDE_DECIMAL - S.LONGITUDE_DECIMAL), 2)) ) <= $P{Radius} AND
  SD.ID = S.STATION_DISTRICT_ID AND
  -- Gather all known years for that station ...
  --
  Y.STATION_DISTRICT_ID = SD.ID AND
  -- The data before 1900 is shaky; insufficient after 2009.
  --
  Y.YEAR BETWEEN 1900 AND 2009 AND
  -- Filtered by all known months ...
  --
  M.YEAR_REF_ID = Y.ID AND
  -- Whittled down by category ...
  --
  M.CATEGORY_ID = $P{CategoryCode} AND
  -- Into the valid daily climate data.
  --
  M.ID = D.MONTH_REF_ID AND
  D.DAILY_FLAG_ID <> 'M'
GROUP BY
  Y.YEAR
Question
How can the duplicated portion be executed only once per query instead of twice? The duplicated code:
  $X{ IN, C.ID, CityCode } AND
  6371.009 *
  SQRT(
    POW(RADIANS(C.LATITUDE_DECIMAL - S.LATITUDE_DECIMAL), 2) +
    (COS(RADIANS(C.LATITUDE_DECIMAL + S.LATITUDE_DECIMAL) / 2) *
    POW(RADIANS(C.LONGITUDE_DECIMAL - S.LONGITUDE_DECIMAL), 2)) ) <= $P{Radius} AND
  SD.ID = S.STATION_DISTRICT_ID AND
  Y.STATION_DISTRICT_ID = SD.ID AND
  Y.YEAR BETWEEN 1900 AND 2009 AND
  M.YEAR_REF_ID = Y.ID AND
  M.CATEGORY_ID = $P{CategoryCode} AND
  M.ID = D.MONTH_REF_ID AND
  D.DAILY_FLAG_ID <> 'M'
GROUP BY
  Y.YEAR
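Update 1
Using variables and splitting the query seems to let the cache kick in: it now runs in 3.5 seconds, whereas it used to take 7 seconds. Still, if there is any way to remove the duplicated code, I would be grateful for any help.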
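Update 2
The split-query version does not run in JasperReports. A view, while a possible fix, would probably be very inefficient (because the WHERE clause is parameterized).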
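Update 3
The distance is checked using the Pythagorean formula with converging meridians, as suggested by Unreason:
6371.009 *
SQRT(
  POW(RADIANS(C.LATITUDE_DECIMAL - S.LATITUDE_DECIMAL), 2) +
  (COS(RADIANS(C.LATITUDE_DECIMAL + S.LATITUDE_DECIMAL) / 2) *
  POW(RADIANS(C.LONGITUDE_DECIMAL - S.LONGITUDE_DECIMAL), 2)) )
(This is unrelated to the question, but in case anyone was wondering...)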
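For readers comparing against a reference, the distance expression above appears to be the equirectangular (flat-earth) approximation. In its usual textbook form it can be sketched as follows, where φ is latitude and λ is longitude in radians, the subscripts C and S stand for the city and the station, φ_m is their mean latitude, and R = 6371.009 km is the mean Earth radius. The symbols are notation introduced here, not names from the query.

```latex
d \approx R \sqrt{(\Delta\phi)^2 + \left(\cos\phi_m \, \Delta\lambda\right)^2},
\qquad
\Delta\phi = \phi_C - \phi_S,\quad
\Delta\lambda = \lambda_C - \lambda_S,\quad
\phi_m = \frac{\phi_C + \phi_S}{2}
```

In this form the cosine is squared together with Δλ, whereas the SQL as written multiplies (Δλ)² by cos φ_m only once. It may be worth checking which of the two was intended.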
Update 4
The code as shown works in JasperReports, running against a MySQL database. JasperReports does not allow variables or multiple queries.
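Update 5
I am looking for a solution that executes cleanly. ;-) I have written a number of partially working solutions, but unfortunately MySQL does not understand "partially correct". See the discussion with Unreason for an answer that almost works.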
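Update 6
I might be able to reuse the variables from the first WHERE clause and compare them against the second one (thereby eliminating some of the duplication, namely the checks against the $P{} values), but I would really like to see the duplication eliminated altogether.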
Update 7
Comparing the YEAR clause, as hypothesized in the previous update, to eliminate the duplicated BETWEEN does not work.
Related
How to eliminate duplicate calculation in SQL?
Thank you!