
Very slow multi-table join in SQLite

  • Mittenchops  · 6 years ago

    SELECT count(*) FROM PanelsMeta
    INNER JOIN Publishers ON PanelsMeta.publisherid = Publishers.id
    INNER JOIN Geographies ON Geographies.geo = Publishers.geo;
    

    Using the query analyzer, I can see that the query uses indexes:

    QUERY PLAN
    |--SCAN TABLE PanelsMeta USING COVERING INDEX PanPubId
    |--SEARCH TABLE Publishers USING INTEGER PRIMARY KEY (rowid=?)
    `--SEARCH TABLE Geographies USING COVERING INDEX geos (geo=?)
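
For anyone reproducing this, a plan like the one above can be obtained by prefixing the query with `EXPLAIN QUERY PLAN`. The following is a minimal sketch using Python's standard `sqlite3` module. It assumes a cut-down version of the schema with only the join columns, an in-memory database, and the index names from the schema listing (`pp1`, `zgeo`, `g`) rather than the ones shown in the plan. The exact wording of the plan rows depends on the SQLite version.

```python
import sqlite3

# In-memory database with only the columns the join touches (assumption:
# the remaining columns do not affect which indexes the planner picks).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Publishers(id INTEGER PRIMARY KEY AUTOINCREMENT, geo TEXT);
CREATE TABLE PanelsMeta(
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  publisherid INTEGER REFERENCES Publishers(id)
);
CREATE TABLE Geographies(id INTEGER PRIMARY KEY AUTOINCREMENT, geo TEXT NOT NULL);
CREATE INDEX pp1 ON PanelsMeta (publisherid);
CREATE INDEX zgeo ON Publishers (geo);
CREATE INDEX g ON Geographies (geo);
""")

query = """
SELECT count(*) FROM PanelsMeta
INNER JOIN Publishers ON PanelsMeta.publisherid = Publishers.id
INNER JOIN Geographies ON Geographies.geo = Publishers.geo
"""

# Each returned row describes one step of the plan; the human-readable
# description is the last column.
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
details = [row[-1] for row in plan_rows]
for detail in details:
    print(detail)
```

On an empty database the planner may choose a different join order than on the real data, so the plan is only comparable once the tables are populated.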
    

    The table sizes are as follows:

    sqlite> select count(*) from Publishers;
    55
    sqlite> select count(*) from PanelsMeta;
    2948875
    sqlite> select count(*) from Geographies;
    37323
    

    What am I doing wrong?

    The variants I tried produce essentially the same kind of query plan and are just as slow, taking tens of minutes:

    SELECT count(*) FROM Geographies
    LEFT JOIN Publishers ON Publishers.geo = Geographies.geo 
    LEFT JOIN PanelsMeta ON PanelsMeta.publisherid = Publishers.id;
    
    # QUERY PLAN
    # |--SCAN TABLE Geographies USING COVERING INDEX geos
    # |--SEARCH TABLE Publishers USING COVERING INDEX PubGeo (geo=?)
    # `--SEARCH TABLE PanelsMeta USING COVERING INDEX PanPubId (publisherid=?)
    
    SELECT count(*) FROM Publishers
    LEFT JOIN PanelsMeta ON PanelsMeta.publisherid = Publishers.id
    LEFT JOIN Geographies ON Geographies.geo = Publishers.geo;
    
    # QUERY PLAN
    # |--SCAN TABLE Publishers USING COVERING INDEX PubGeo
    # |--SEARCH TABLE PanelsMeta USING COVERING INDEX PanPubId (publisherid=?)
    # `--SEARCH TABLE Geographies USING COVERING INDEX geos (geo=?)
    

    The schema is as follows:

    CREATE TABLE PanelsMeta(
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      f1 TEXT, 
      f2 TEXT, 
      f3 TEXT, 
      f4 DATETIME,
      f5 DATETIME,
      f6 TEXT, 
      f7 TEXT,
      publisherid INTEGER,
      FOREIGN KEY(publisherid) REFERENCES Publishers(id) ON DELETE CASCADE ON UPDATE CASCADE
    );
    
    CREATE INDEX ids ON PanelsMeta (id);
    CREATE INDEX pp1 ON PanelsMeta (publisherid);
    CREATE INDEX pp2 ON PanelsMeta (f1);
    CREATE INDEX pp3 ON PanelsMeta (f1,publisherid);
    

    CREATE TABLE Publishers(
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      geo TEXT,
      f3 TEXT NOT NULL, 
      f4 TEXT NOT NULL,
      f5 TEXT,
      f6 TEXT
    );
    
    CREATE INDEX zf3 ON Publishers (f3);
    CREATE INDEX zgeo ON Publishers (Geo);
    CREATE INDEX zf6 ON Publishers (f6);
    CREATE INDEX zid ON Publishers (id);
    CREATE INDEX zf3g ON Publishers (f3,geo);
    CREATE INDEX zf3gf6 ON Publishers (f3,geo,f6);
    

    CREATE TABLE Geographies(
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      geo TEXT NOT NULL,
      f3 TEXT NOT NULL,
      f4 TEXT,
      f5 DATETIME,
      f6 TEXT,
      f7 TEXT,
      f8 JSON DEFAULT '{}',
      f9 TEXT
    );
    
    CREATE INDEX g ON Geographies (geo);
    CREATE INDEX gf3 ON Geographies (f3);
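
The row counts in the question do not show how many rows the join itself produces. If `geo` is not unique in `Geographies` (an assumption; the post does not say), every matching panel row is repeated once per matching geography, and `count(*)` has to step through all of those combinations. The sketch below uses made-up toy data and a reduced schema to show that the same count can be obtained by counting per `geo` on each side first and then summing the products. It illustrates the arithmetic and is not a confirmed diagnosis of the slowdown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Publishers(id INTEGER PRIMARY KEY, geo TEXT);
CREATE TABLE PanelsMeta(
  id INTEGER PRIMARY KEY,
  publisherid INTEGER REFERENCES Publishers(id)
);
CREATE TABLE Geographies(id INTEGER PRIMARY KEY, geo TEXT NOT NULL);
CREATE INDEX pp1 ON PanelsMeta (publisherid);
CREATE INDEX zgeo ON Publishers (geo);
CREATE INDEX g ON Geographies (geo);
""")

# Toy data (hypothetical): two 'US' publishers, one 'DE' publisher,
# and several Geographies rows sharing the same geo value.
conn.executemany("INSERT INTO Publishers(id, geo) VALUES (?, ?)",
                 [(1, "US"), (2, "US"), (3, "DE")])
conn.executemany("INSERT INTO PanelsMeta(publisherid) VALUES (?)",
                 [(1,)] * 4 + [(2,)] * 3 + [(3,)] * 2)
conn.executemany("INSERT INTO Geographies(geo) VALUES (?)",
                 [("US",)] * 5 + [("DE",)] * 2 + [("FR",)] * 7)

# The original query: visits every (panel, geography) combination.
direct = conn.execute("""
SELECT count(*) FROM PanelsMeta
INNER JOIN Publishers ON PanelsMeta.publisherid = Publishers.id
INNER JOIN Geographies ON Geographies.geo = Publishers.geo
""").fetchone()[0]

# Same number, computed from per-geo counts on each side.
aggregated = conn.execute("""
SELECT COALESCE(SUM(p.n * g.n), 0)
FROM (SELECT Publishers.geo AS geo, count(*) AS n
      FROM PanelsMeta
      JOIN Publishers ON PanelsMeta.publisherid = Publishers.id
      GROUP BY Publishers.geo) AS p
JOIN (SELECT geo, count(*) AS n
      FROM Geographies
      GROUP BY geo) AS g ON g.geo = p.geo
""").fetchone()[0]

print(direct, aggregated)
```

Comparing `direct` against `SELECT count(*) FROM PanelsMeta` on the real data would show how much the join multiplies the row count.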
    
  • Tim Werner  · 4 years ago

    However, my full dataset is 18 GB with roughly 11 million rows.

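    I solved this by putting all the data in a single table and then using a WHERE ... IN clause. It is odd, but it is much faster (about 1 second instead of several minutes).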
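
The answer gives no schema or query, so the following is only a sketch of what "one table plus WHERE ... IN" could look like. The table name `PanelsFlat`, the index `pf_geo`, the sample rows, and the list of wanted `geo` values are all hypothetical. Note that an `IN` filter counts each panel row at most once, so it does not return the same number as the three-way join when a `geo` value appears several times in `Geographies`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical denormalized table: the publisher's geo is copied onto
-- each panel row so that no join is needed at query time.
CREATE TABLE PanelsFlat(
  id INTEGER PRIMARY KEY,
  publisherid INTEGER,
  geo TEXT
);
CREATE INDEX pf_geo ON PanelsFlat (geo);
""")

# Made-up sample rows: (publisherid, geo)
rows = [(1, "US")] * 4 + [(2, "US")] * 3 + [(3, "DE")] * 2 + [(4, "FR")] * 5
conn.executemany("INSERT INTO PanelsFlat(publisherid, geo) VALUES (?, ?)", rows)

# Build one '?' placeholder per value so the list is passed as bound
# parameters instead of being pasted into the SQL text.
wanted = ["US", "DE"]
placeholders = ", ".join("?" for _ in wanted)
count = conn.execute(
    f"SELECT count(*) FROM PanelsFlat WHERE geo IN ({placeholders})",
    wanted,
).fetchone()[0]

print(count)
```

The trade-off is the usual one for denormalization: the query becomes a single indexed lookup, but the copied `geo` column has to be kept in sync whenever a publisher changes.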