Is there a better way to calculate the median (not the average)?

  •  14
  • Ghislain Leveque  · 14 years ago

    CREATE TABLE x (i serial primary key, value integer not null);
    

    I want to compute the median of value.

    Here is how I currently compute the median, but I suspect there must be a better way:

    SELECT AVG(values_around_median) AS median
      FROM (
        SELECT
           DISTINCT(CASE WHEN FIRST_VALUE(above) OVER w2 THEN MIN(value) OVER w3 ELSE MAX(value) OVER w2 END)
            AS values_around_median
          FROM (
            SELECT LAST_VALUE(value) OVER w AS value,
                   SUM(COUNT(*)) OVER w > (SELECT count(*)/2 FROM x) AS above
              FROM x
              GROUP BY value
              WINDOW w AS (ORDER BY value)
              ORDER BY value
            ) AS find_if_values_are_above_or_below_median
          WINDOW w2 AS (PARTITION BY above ORDER BY value DESC),
                 w3 AS (PARTITION BY above ORDER BY value ASC)
        ) AS find_values_around_median
    

    7 answers  |  up to 13 years ago
        1
  •  16
  •   Scott Bailey    14 years ago

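    There is indeed an easier way. In Postgres you can define your own aggregate functions. A while back I posted some functions to the PostgreSQL snippets library that implement median, mode, and range.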

    http://wiki.postgresql.org/wiki/Aggregate_Median
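
    As a rough illustration of the user-defined aggregate approach, here is a minimal sketch. It is not necessarily the code on the wiki page: the names median_final and median_agg are hypothetical, and it assumes PostgreSQL 9.2 or later (named parameters in SQL functions) and the table x from the question.

```sql
-- Hypothetical final function: receives every collected value as one array.
CREATE OR REPLACE FUNCTION median_final(vals numeric[])
RETURNS numeric AS
$$
    SELECT avg(v)
    FROM (
        SELECT v,
               row_number() OVER (ORDER BY v) AS rn,  -- position in sorted order
               count(*)     OVER ()           AS ct   -- number of non-NULL values
        FROM   unnest(vals) AS u(v)
        WHERE  v IS NOT NULL
    ) AS s
    -- Odd ct: both expressions name the middle row; even ct: the two middle rows.
    WHERE rn IN ((ct + 1) / 2, (ct + 2) / 2);
$$ LANGUAGE sql IMMUTABLE;

-- Hypothetical aggregate: array_append collects the inputs,
-- median_final reduces them to a single value.
CREATE AGGREGATE median_agg(numeric) (
    SFUNC     = array_append,
    STYPE     = numeric[],
    FINALFUNC = median_final,
    INITCOND  = '{}'
);

SELECT median_agg(value) AS median FROM x;
```

    Note that this sketch keeps every input value in memory until the final function runs, so it is best suited to moderately sized groups.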

        2
  •  25
  •   Lukas Eder    10 years ago

    Yes. As of PostgreSQL 9.4 you can use the newly introduced inverse distribution function PERCENTILE_CONT(), an ordered-set aggregate function specified in the SQL standard.

    WITH t(value) AS (
      SELECT 1   UNION ALL
      SELECT 2   UNION ALL
      SELECT 100 
    )
    SELECT
      percentile_cont(0.5) WITHIN GROUP (ORDER BY value)
    FROM
      t;
    

    This emulation of MEDIAN() via PERCENTILE_CONT() is also documented here
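
    To see how the result behaves with an even number of rows, the following sketch compares percentile_cont() with its sibling percentile_disc(), which is also available from 9.4. The four sample values are made up for illustration.

```sql
WITH t(value) AS (
  SELECT 1   UNION ALL
  SELECT 2   UNION ALL
  SELECT 100 UNION ALL
  SELECT 200
)
SELECT
  -- Interpolates between the two middle rows (2 and 100): 51
  percentile_cont(0.5) WITHIN GROUP (ORDER BY value) AS median_cont,
  -- Returns an actual input value, the lower of the two middle rows: 2
  percentile_disc(0.5) WITHIN GROUP (ORDER BY value) AS median_disc
FROM
  t;
```

    Both are ordinary aggregates, so adding a GROUP BY clause yields one median per group.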

        3
  •  7
  •   Erwin Brandstetter    10 years ago

    Here is a simpler query:

    WITH y AS (
       SELECT value, row_number() OVER (ORDER BY value) AS rn
       FROM   x
       WHERE  value IS NOT NULL
       )
    , c AS (SELECT count(*) AS ct FROM y) 
    SELECT CASE WHEN c.ct%2 = 0 THEN
              round((SELECT avg(value) FROM y WHERE y.rn IN (c.ct/2, c.ct/2+1)), 3)
           ELSE
                    (SELECT     value  FROM y WHERE y.rn = (c.ct+1)/2)
           END AS median
    FROM   c;
    

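    Key points: the query ignores NULL values, numbers the rows with row_number(), and then returns either the single middle row (odd row count) or the average of the two middle rows rounded to 3 decimal places (even row count).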

    A test shows that the new version is 4x faster than the query in the question (and yields correct results):

    CREATE TEMP TABLE x (value int);
    INSERT INTO x SELECT generate_series(1,10000);
    INSERT INTO x VALUES (NULL),(NULL),(NULL),(3);
    
        4
  •  0
  •   Chris B    8 years ago

    For those arriving from Google: there is also http://pgxn.org/dist/quantile . Once this extension is installed, the median can be computed in a single line.

        5
  •  0
  •   Ghost    8 years ago

    Simple SQL using only native Postgres functions:

    select 
        case count(*)%2
            when 1 then (array_agg(num order by num))[count(*)/2+1]
            else ((array_agg(num order by num))[count(*)/2]::double precision + (array_agg(num order by num))[count(*)/2+1])/2
        end as median
    from unnest(array[5,17,83,27,28]) num;
    

        6
  •  0
  •   toha    8 years ago
    CREATE TABLE array_table (id integer, values integer[]) ;
    
    INSERT INTO array_table VALUES ( 1,'{1,2,3}');
    INSERT INTO array_table VALUES ( 2,'{4,5,6,7}');
    
    select id, values, cardinality(values) as array_length,
    (case when cardinality(values)%2=0 and cardinality(values)>1 then (values[(cardinality(values)/2)]+ values[((cardinality(values)/2)+1)])/2::float 
     else values[(cardinality(values)+1)/2]::float end) as median  
     from array_table
    

    Or you can create a function and reuse it in any later query.

    CREATE OR REPLACE FUNCTION median (a integer[]) 
    RETURNS float AS    $median$ 
    Declare     
        abc float; 
    BEGIN    
        SELECT (case when cardinality(a)%2=0 and cardinality(a)>1 then 
               (a[(cardinality(a)/2)] + a[((cardinality(a)/2)+1)])/2::float   
               else a[(cardinality(a)+1)/2]::float end) into abc;    
        RETURN abc; 
    END;    
    $median$ 
    LANGUAGE plpgsql;
    
    select id,values,median(values) from array_table
    
        7
  •  0
  •   Sowmiya Raja Radhakrishnan    8 years ago

    Use the function below to find the nth percentile of a sorted array:

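    CREATE or REPLACE FUNCTION nth_percentil(anyarray, int)
        RETURNS 
            anyelement as 
        $$
            -- Nearest-rank index into a sorted array. The original subscript
            -- ($2/100.0 * n + 1) rounded past the middle for odd lengths and
            -- ran off the end for the 100th percentile.
            -- greatest(1, ...) makes the 0th percentile return the first element.
            SELECT $1[greatest(1, ceil($2 / 100.0 * array_upper($1, 1))::int)];
        $$ 
    LANGUAGE SQL IMMUTABLE STRICT;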
    

    In your case that is the 50th percentile.

    SELECT nth_percentil(ARRAY (SELECT Field_name FROM table_name ORDER BY 1),50)
    

    This gives you the 50th percentile, which is essentially the median.
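
    On PostgreSQL 9.4 or later, the built-in percentile_disc() can also return several percentiles in one pass without a helper function, since it accepts an array of fractions. A minimal sketch follows; the five sample values are illustrative only.

```sql
SELECT
    -- One result element per requested fraction, each an actual input value.
    -- Sorted input is 5, 17, 27, 28, 83, so this returns {17,27,28}.
    percentile_disc(ARRAY[0.25, 0.5, 0.75]) WITHIN GROUP (ORDER BY num) AS quartiles
FROM unnest(ARRAY[5, 17, 83, 27, 28]) AS num;
```

    Unlike the array-based function, this needs no pre-sorted array, because the ordering is given in the WITHIN GROUP clause.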