代码之家 › 专栏 › 技术社区 › Gregg Lind

一个简单查询PostgreSQL的算法改进

complexity-theory postgresql algorithm sql

Gregg Lind · 技术社区 · 15 年前

高水平 我能做这个吗 order by , group by 基于 sum 快些吗?(第8.4页,FWIW,在一张不小的桌子上…。想想O(百万行)

假设我有一张这样的桌子:

                                 Table "public.summary"
   Column    |       Type        |                      Modifiers
-------------+-------------------+------------------------------------------------------
 ts          | integer           | not null default nextval('summary_ts_seq'::regclass)
 field1      | character varying | not null
 otherfield  | character varying | not null
 country     | character varying | not null
 lookups     | integer           | not null


Indexes:
    "summary_pk" PRIMARY KEY, btree (ts, field1, otherfield, country)
    "ix_summary_country" btree (country)
    "ix_summary_field1" btree (field1)
    "ix_summary_otherfield" btree (otherfield)
    "ix_summary_ts" btree (ts)

我想要的是:

select summary.field1,
    summary.country,
    summary.ts,
    sum(summary.lookups) as lookups,
from summary
where summary.country = 'za' and
    summary.ts = 1275177600
group by summary.field1, summary.country, summary.ts
order by summary.ts, lookups desc, summary.field1
limit 100;

(英语:某个特定国家(TS,国家)的前100个字段1,其中“Topness”是总和查找任何匹配行的次数,而不考虑其他字段的值)

有 任何东西 我真的能加快速度吗?算法上的这似乎是一种全桌扫描的东西,但我可能遗漏了一些东西。

3 回复 | 直到 15 年前

Nick Johnson 15 年前

此查询的任何查询计划都必须扫描与where条件匹配的每一行,并按分组条件进行汇总,即工作量与group by的输入行数成比例,而不是结果行数。

对于这样的查询,最有效的查询计划可能是单索引扫描。如果按此顺序在(country,ts)上构建索引,则应该可以这样做;使用该索引,此表单的每个可能查询都解析为索引上的连续范围。不过,这仍然需要一个内存中的排序——使用不同的索引可能可以避免这种情况。

不过,正如其他人所说,发布执行计划是您的最佳选择。

a_horse_with_no_name 15 年前

为了能够提出任何建议,您应该发布查询的执行计划。

而“omg-ponies”是对的:限制100会将总结果限制为100行,它不会对单个组起作用!

Postgres wiki中有一篇很好的文章解释了如何发布与慢速查询相关的问题:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

rfusca 15 年前

(国家,TS)的指数是最好的选择(就像尼克·约翰逊建议的那样),另外,你可能想提高 work_mem 如果不是很高。如果需要,可以在运行时设置(如果设置得很高,则建议设置)。它将有助于将您的分类保存在内存中,而不会溢出到磁盘(如果发生这种情况的话)。

为了获得真正的帮助,我们需要 EXPLAIN ANALYZE ,在explain.depesz.com上发布可以使其非常可读。