代码之家  ›  专栏  ›  技术社区  ›  UKDataGeek

使用谷歌大查询构建基本漏斗

  •  -1
  • UKDataGeek  · 技术社区  · 7 年前

    我注意到google bigquery有很多google analytics的用户,但是文档非常有限。是否可以帮助生成一个简单的漏斗图,显示访问/pagea然后/pageb然后/pagec的用户

    我看到过很多不同的方法——我不清楚“正确”的方法是什么。

    2 回复  |  直到 7 年前
        1
  •  1
  •   Martin Weitzmann    7 年前

    您可以先将用户点击数与数组concat_agg()连接起来,然后根据新的用户范围表进行计算。当然,这在很大程度上取决于你所选择的时间范围。

    例如,这里有来自谷歌的虚拟数据:

    #standardSQL
    WITH arrAgg AS (
      SELECT
        fullvisitorid,
        -- concatenate arrays over multiple sessions
        ARRAY_CONCAT_AGG(hits ORDER BY visitstarttime ASC) userHits
      FROM
        `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
      GROUP BY 1
    )
    , journey AS (
      SELECT 
        fullvisitorId,
        -- get a proper running index with combination of unnest and offset of aggregated hits array
        ARRAY( (SELECT AS STRUCT index+1 as hitNumber, page FROM UNNEST(userHits) WITH OFFSET AS index)) as hits
      FROM arrAgg
    )
    
    SELECT * FROM journey
    

    当你运行这个,你可以看到新的“原材料”。在第一步中,我连接点击量,在第二步中,我为页面创建一个适当的索引,并将每一个页面放回一个“点击量”数组中。

    您可以使用交叉联接和比较页面的步骤和顺序来构建用户旅程:

    #standardSQL
    WITH arrAgg AS (
      SELECT
        fullvisitorid,
        SUM(totals.visits) sessions,
        -- concatenate arrays over multiple sessions
        ARRAY_CONCAT_AGG(hits ORDER BY visitstarttime ASC) userHits
      FROM
        `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
      GROUP BY 1
    )
    , journey AS (
      SELECT 
        fullvisitorId,
        sessions,
        -- get a proper running index with combination of unnest and offset of aggregated hits array
        ARRAY( (SELECT AS STRUCT index+1 as hitNumber, page FROM UNNEST(userHits) WITH OFFSET AS index WHERE type='PAGE')) as hits
      FROM arrAgg
    )
    -- funnel: homepage: /, login: /login.html, basket: /basket.html, confirm: /confirm.html
    SELECT 
      SUM(sessions) allSessions,
      COUNT(1) allUsers,
      -- check if any page was home page
      SUM( (SELECT IF( LOGICAL_OR(page.pagePath='/'), 1, 0) FROM j.hits) ) step1_home,
      -- cross join hits array with itself: combination of all pages with all pages: any of those combinations our two pages? came home before login?: if yes for any given amount add up 1
      SUM( (SELECT IF( LOGICAL_OR(a.page.pagePath='/' AND b.page.pagePath='/login.html' AND a.hitNumber < b.hitNumber) ,1, 0 ) FROM j.hits a CROSS JOIN j.hits b) ) step2_login,
      -- extend cross join principle to a third page
      SUM( (SELECT IF( LOGICAL_OR(
          a.page.pagePath='/' AND b.page.pagePath='/login.html' AND c.page.pagePath='/basket.html' AND
          a.hitNumber < b.hitNumber AND b.hitNumber < c.hitNumber 
          ) ,1, 0 ) FROM j.hits a CROSS JOIN j.hits b CROSS JOIN j.hits c) ) step3_basket,
      -- extend cross join principle to a fourth page
      SUM( (SELECT IF( LOGICAL_OR(
          a.page.pagePath='/' AND b.page.pagePath='/login.html' AND c.page.pagePath='/basket.html' AND d.page.pagePath='/confirm.html' AND
          a.hitNumber < b.hitNumber AND b.hitNumber < c.hitNumber AND c.hitNumber < d.hitNumber
          ) ,1, 0 ) FROM j.hits a CROSS JOIN j.hits b CROSS JOIN j.hits c CROSS JOIN j.hits d) ) step4_confirm
    FROM journey j
    

    由于所有操作都使用数组上的子查询,因此由于并行化,它应该能够很好地进行扩展。 请在使用前进行测试-我没有;)但它应该指向正确的方向。

        2
  •  0
  •   citrus    7 年前

    请在此处检查: https://online-behavior.com/analytics/funnel-analysis

    或者,如果您想手工操作:

    1. 选择登录到起始页的所有会话(记录最小命中数)
    2. 使用“在下一页上选择命中数大于起始页中命中数的所有会话”(匹配会话ID)进行左连接
    3. 重复步骤2,直到漏斗完成
    4. 按页名称分组的所有会话计数为聚合