df.show()
root
|-- context_id: long (nullable = true)
|-- data1: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- k: struct (nullable = false)
| | | |-- v: string (nullable = true)
| | | |-- t: string (nullable = false)
| | |-- resourcename: string (nullable = true)
| | |-- criticity: string (nullable = true)
| | |-- v: string (nullable = true)
| | |-- vn: double (nullable = true)
|-- data2: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- k: struct (nullable = false)
| | | |-- v: string (nullable = true)
| | | |-- t: string (nullable = false)
| | |-- resourcename: string (nullable = true)
| | |-- criticity: string (nullable = true)
| | |-- v: string (nullable = true)
| | |-- vn: double (nullable = true)
我创造
udf
concat tow数组和我提供了结果的模式
val schema=df.select("data1").schema
val concatArray = udf ({ (x: Seq[Row], y: Seq[Row]) => x ++ y}, schema)
当我应用自定义项时,我会得到这个错误
org.apache.spark.SparkException: Failed to execute user defined function($anonfun$11: (array<struct<k:struct<v:string,t:string>,resourcename:string,criticity:string,v:string,vn:double>>, array<struct<k:struct<v:string,t:string>,resourcename:string,criticity:string,v:string,vn:double>>) => struct<data1:array<struct<k:struct<v:string,t:string>,resourcename:string,criticity:string,v:string,vn:double>>>)
有什么建议吗