
Pyspark: How to pass parameters into a SQL script when executing it through the Hive context

  • Shankar Panda  · Tech Community  · 6 years ago

    for line in tcp.collect():
        # 'zip' is hard-coded here; it should be replaced by the loop variable `line`
        hive_context.sql("SELECT 'zip' as Variable_name, percentile(zip, 0.25) as Q1, percentile(zip, 0.75) as Q3 FROM df_tab").show()
    
    I tried something like this as well, but it didn't work:
    query = "SELECT {d_line} as Variable_name, percentile({line}, 0.25) as Q1, percentile({line}, 0.75) as Q3 FROM df_tab".format(d_line=line, line=line)  # this gives me the following output:
    

    SELECT zip as Variable_name, percentile(zip, 0.25) as Q1, percentile(zip, 0.75) as Q3 FROM df_tab  -- the first zip must appear in single quotes here

    Expected output query: SELECT 'zip' as Variable_name, percentile(zip, 0.25) as Q1, percentile(zip, 0.75) as Q3 FROM df_tab

    1 Answer  |  last activity 6 years ago
  •   Shankar Panda    6 years ago
    query="SELECT {d_name} as Variable_name, percentile({f_name}, 0.25) as Q1, percentile({f_name}, 0.75) as Q3 FROM df_tab GROUP BY {f_name}".format(f_name=line, d_name="'"+str(line)+"'")
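
The quoting trick in this answer can be checked without a Spark session. Below is a minimal sketch that only builds the query string; the helper name `build_percentile_query` and the identifier check are assumptions added for illustration, not part of the original post. It leaves out the `GROUP BY` clause so that the result matches the expected query stated in the question, and it puts the single quotes in the template itself rather than concatenating them onto the value.

```python
import re


def build_percentile_query(column, table="df_tab"):
    """Build the quartile query for one column (hypothetical helper).

    The column name is used twice: once inside single quotes as a string
    label, and once unquoted as the column passed to percentile().
    """
    # The name is spliced into SQL text, so accept plain identifiers only.
    if not re.match(r"^[A-Za-z_][A-Za-z0-9_]*$", column):
        raise ValueError("unexpected column name: %r" % (column,))
    return (
        "SELECT '{name}' as Variable_name, "
        "percentile({name}, 0.25) as Q1, "
        "percentile({name}, 0.75) as Q3 "
        "FROM {table}"
    ).format(name=column, table=table)


query = build_percentile_query("zip")
print(query)
# SELECT 'zip' as Variable_name, percentile(zip, 0.25) as Q1, percentile(zip, 0.75) as Q3 FROM df_tab
```

Inside the loop from the question, the resulting string would then be passed along as `hive_context.sql(query).show()`, assuming `line` has already been converted to a plain column-name string.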