
Spark 2.2 fails to handle map columns in aggregation expressions

distinct apache-spark-sql dictionary apache-spark


Georg Heiler · Tech Community · 7 years ago

How can I GROUP BY or use DISTINCT on a complex-type column that contains a map?

// In spark-shell; elsewhere, `import spark.implicits._` is needed for .toDF
case class Foo(id: Int, stuff: Map[String, Int])
val xx = Seq(Foo(1, Map("first" -> 1, "second" -> 2)), Foo(1, Map("first" -> 1, "second" -> 2)), Foo(3, Map("fourth" -> 4, "fifth" -> 5))).toDF
xx.distinct.show
xx.groupBy("id", "stuff").count.show

The error is

expression `stuff` cannot be used as a grouping expression because its data type map<string,int> is not an orderable data type

This seems to be related to https://mapr.com/support/s/article/Spark-SQL-queries-on-Map-column-fails-with-exception-Cannot-have-map-type-columns-in-DataFrame

Is it possibly fixed in Spark 2.4?

However, I am currently limited to 2.2. Is there a solution for 2.2?

(See also: spark dynamically create struct/json per group.)

Edit

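Manually serializing to JSON is one workaround (but rather clumsy).
I could also use an array of custom case classes instead of a map-type column: Seq[Foo] with case class Foo(column: String, column_value: String, value: String). This works, but the format does not seem very intuitive to any third party.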
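
A variation on the array idea, sketched here as a possibility rather than a confirmed fix: a UDF turns the map into an array of (key, value) structs sorted by key. An array of structs is an orderable type, so it can be used for grouping and for distinct, and a second UDF restores the map afterwards. The names `MapGroupingSketch`, `mapToSortedEntries`, `entriesToMap` and `stuff_key` are made up for illustration, and the sketch assumes the map column is never null.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object MapGroupingSketch {
  case class Foo(id: Int, stuff: Map[String, Int])

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("map-grouping-sketch")
      .getOrCreate()
    import spark.implicits._

    val xx = Seq(
      Foo(1, Map("first" -> 1, "second" -> 2)),
      Foo(1, Map("first" -> 1, "second" -> 2)),
      Foo(3, Map("fourth" -> 4, "fifth" -> 5))
    ).toDF

    // Canonical form of the map: its entries sorted by key.
    // Assumption: the map is never null (a null map would throw here).
    val mapToSortedEntries = udf((m: Map[String, Int]) => m.toSeq.sortBy(_._1))

    // Inverse conversion: array<struct<_1:string,_2:int>> back to a map.
    val entriesToMap = udf((kv: Seq[Row]) =>
      kv.map(r => r.getString(0) -> r.getInt(1)).toMap)

    // Replace the map column with its orderable stand-in.
    val keyed = xx
      .withColumn("stuff_key", mapToSortedEntries(col("stuff")))
      .drop("stuff")

    // GROUP BY on the stand-in column, then restore the map for display.
    keyed.groupBy("id", "stuff_key").count
      .withColumn("stuff", entriesToMap(col("stuff_key")))
      .drop("stuff_key")
      .show(false)

    // DISTINCT also works once no map column is left in the DataFrame.
    keyed.distinct
      .withColumn("stuff", entriesToMap(col("stuff_key")))
      .drop("stuff_key")
      .show(false)

    spark.stop()
  }
}
```

Sorting the entries by key matters: two maps with the same contents but a different internal ordering would otherwise produce different arrays and end up in different groups. The same concern applies to the JSON approach, where the serialized key order is not guaranteed to be canonical.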

0 replies | as of 7 years ago