How can I GROUP BY or use DISTINCT on a complex-type column that contains a map? For example:
case class Foo(id: Int, stuff: Map[String, Int])

val xx = Seq(
  Foo(1, Map("first" -> 1, "second" -> 2)),
  Foo(1, Map("first" -> 1, "second" -> 2)),
  Foo(3, Map("fourth" -> 4, "fifth" -> 5))
).toDF

xx.distinct.show
xx.groupBy("id", "stuff").count.show
The error is:
expression `stuff` cannot be used as a grouping expression because its data type map<string,int> is not an orderable data type
This seems related to
https://mapr.com/support/s/article/Spark-SQL-queries-on-Map-column-fails-with-exception-Cannot-have-map-type-columns-in-DataFrame
It may have been fixed in Spark 2.4?
However, I'm currently limited to 2.2. Is there a workaround for 2.2?
(See also: spark dynamically create struct/json per group.)
EDIT:

- Manually serializing the map to JSON is a workaround (but rather clunky).
- Alternatively, I could replace the map-type column with an array of a custom case class: Seq[Foo]; case class Foo(column: String, column_value: String, value: String). This works for grouping, but the format seems unintuitive to any third party.
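The first workaround above (serializing to JSON) can be sketched as a plain Scala function that produces a *canonical* string, with keys sorted so that equal maps always serialize identically. The helper name `mapToCanonicalJson` is my own, not a Spark API; in Spark 2.2 it would be wrapped in a UDF as shown in the comments.

```scala
// Sketch of the JSON-serialization workaround, assuming Spark 2.2.
// Map columns are not orderable, but strings are, so we serialize each
// map into a canonical string (keys sorted) and group on that instead.
// `mapToCanonicalJson` is a hypothetical helper, not part of Spark.
def mapToCanonicalJson(m: Map[String, Int]): String =
  m.toSeq
    .sortBy(_._1)                               // sort keys so equal maps serialize identically
    .map { case (k, v) => s""""$k":$v""" }      // render each entry as "key":value
    .mkString("{", ",", "}")

// In a Spark job it would be registered as a UDF and used for grouping, e.g.:
//   import org.apache.spark.sql.functions.udf
//   val toJson = udf(mapToCanonicalJson _)
//   xx.withColumn("stuff_json", toJson($"stuff"))
//     .groupBy("id", "stuff_json").count.show
```

Note that `to_json` on map columns is not available until later Spark versions, which is why the serialization is done by hand here.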
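The second workaround (an array of a custom case class instead of a map) might look like the sketch below. Arrays of structs are orderable in Spark SQL, so distinct/groupBy work on them; sorting the entries by key keeps equal maps equal after conversion. The names `KV` and `mapToEntries` are illustrative, not from the original post.

```scala
// Sketch of the case-class-array workaround: convert the map into a
// sorted array of key/value structs, which Spark SQL treats as orderable.
// `KV` and `mapToEntries` are hypothetical names for illustration.
case class KV(column: String, value: Int)

def mapToEntries(m: Map[String, Int]): Seq[KV] =
  m.toSeq
    .sortBy(_._1)                        // deterministic order: equal maps -> equal arrays
    .map { case (k, v) => KV(k, v) }

// With a Dataset this would look something like:
//   case class Bar(id: Int, stuff: Seq[KV])
//   val yy = Seq(Bar(1, mapToEntries(Map("first" -> 1, "second" -> 2)))).toDS
//   yy.distinct.show                        // works: array<struct<...>> is orderable
//   yy.groupBy("id", "stuff").count.show    // works for the same reason
```

The cost, as noted above, is that the resulting schema is less intuitive to consumers expecting a plain map.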