Scala - Spark SQL Row pattern matching on struct

  • Averell  ·  8 years ago

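    I am trying to do pattern matching inside a DataFrame map function: matching each Row against a Row pattern that contains nested case classes. The DataFrame is the result of a join, and its schema is shown below. It has some columns of primitive types and two composite (struct) columns: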

    case class MyList(values: Seq[Integer])
    case class MyItem(key1: String, key2: String, field1: Integer, group1: MyList, group2: MyList, field2: Integer)
    val myLine1 = new MyItem ("MyKey01", "MyKey02", 1, new MyList(Seq(1)), new MyList(Seq(2)), 2)
    val myLine2 = new MyItem ("YourKey01", "YourKey02", 2, new MyList(Seq(2,3)), new MyList(Seq(4,5)), 20)
    val dfRaw = Seq(myLine1, myLine2).toDF
    dfRaw.printSchema
    dfRaw.show
    val df2 = dfRaw.map(r => r match {
        case Row(key1: String, key2: String, field1: Integer, group1: MyList, group2: MyList, field2: Integer) => "Matched"
        case _ => "Un matched"
    })
    df2.show
    

    My problem is that after the map function, all I get is "Un matched":

    root
     |-- key1: string (nullable = true)
     |-- key2: string (nullable = true)
     |-- field1: integer (nullable = true)
     |-- group1: struct (nullable = true)
     |    |-- values: array (nullable = true)
     |    |    |-- element: integer (containsNull = true)
     |-- group2: struct (nullable = true)
     |    |-- values: array (nullable = true)
     |    |    |-- element: integer (containsNull = true)
     |-- field2: integer (nullable = true)
    +---------+---------+------+--------------------+--------------------+------+
    |     key1|     key2|field1|              group1|              group2|field2|
    +---------+---------+------+--------------------+--------------------+------+
    |  MyKey01|  MyKey02|     1|   [WrappedArray(1)]|   [WrappedArray(2)]|     2|
    |YourKey01|YourKey02|     2|[WrappedArray(2, 3)]|[WrappedArray(4, 5)]|    20|
    +---------+---------+------+--------------------+--------------------+------+
    df2: org.apache.spark.sql.Dataset[String] = [value: string]
    +----------+
    |     value|
    +----------+
    |Un matched|
    |Un matched|
    +----------+
    

    If I ignore the two struct columns in the case branch (replacing group1: MyList, group2: MyList with _, _), then it works:

    case Row(key1: String, key2: String, field1: Integer, _, _, field2: Integer) => "Matched"
    

    Could you help me with how to pattern match on that case class? Thanks.

    1 Answer

  • Ramesh Maharjan  ·  8 years ago

    In Spark, struct columns are represented as org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema values, not as instances of your case class.
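
    One way to check this claim is to print the runtime class of each value in a collected Row. The snippet below is a minimal sketch, assuming a local SparkSession; the object name InspectRowTypes and the app name are made up for illustration, and the case classes repeat the ones from the question.

```scala
import org.apache.spark.sql.SparkSession

// Same case classes as in the question, defined at top level so Spark can derive encoders.
case class MyList(values: Seq[Integer])
case class MyItem(key1: String, key2: String, field1: Integer,
                  group1: MyList, group2: MyList, field2: Integer)

object InspectRowTypes {  // hypothetical name, for illustration only
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for this experiment.
    val spark = SparkSession.builder().master("local[*]").appName("inspect-row-types").getOrCreate()
    import spark.implicits._

    val dfRaw = Seq(
      MyItem("MyKey01", "MyKey02", 1, MyList(Seq(1)), MyList(Seq(2)), 2)
    ).toDF

    // Print the runtime class of every column value in the first row.
    // None of the sample values is null, so getClass is safe here.
    val first = dfRaw.head()
    first.schema.fieldNames.zipWithIndex.foreach { case (name, i) =>
      println(s"$name -> ${first.get(i).getClass.getName}")
    }

    spark.stop()
  }
}
```

    If the answer is right, group1 and group2 should report a Row implementation rather than MyList, which is why a type pattern on MyList never matches.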

    So you have to define the match case as:

    import org.apache.spark.sql.catalyst.expressions._
    val df2 = dfRaw.map(r => r match {
        case Row(key1: String, key2: String, field1: Integer, group1: GenericRowWithSchema, group2: GenericRowWithSchema, field2: Integer) => "Matched"
        case _ => "Un matched"
    })
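
    Since GenericRowWithSchema lives in Spark's internal catalyst package, a variation that avoids importing it is to match the struct columns with a nested Row pattern, which also destructures the inner values field. This is a sketch under assumptions, not part of the original answer: it assumes a local SparkSession, uses a made-up object name, and matches the array as scala.collection.Seq because the concrete sequence class depends on the Spark and Scala versions.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Same case classes as in the question, defined at top level so Spark can derive encoders.
case class MyList(values: Seq[Integer])
case class MyItem(key1: String, key2: String, field1: Integer,
                  group1: MyList, group2: MyList, field2: Integer)

object NestedRowMatch {  // hypothetical name, for illustration only
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("nested-row-match").getOrCreate()
    import spark.implicits._

    val dfRaw = Seq(
      MyItem("MyKey01", "MyKey02", 1, MyList(Seq(1)), MyList(Seq(2)), 2),
      MyItem("YourKey01", "YourKey02", 2, MyList(Seq(2, 3)), MyList(Seq(4, 5)), 20)
    ).toDF

    val df2 = dfRaw.map(r => r match {
      // Each struct column is itself a Row, so a nested Row(...) pattern can unpack it.
      case Row(key1: String, key2: String, field1: Integer,
               Row(values1: scala.collection.Seq[_]),
               Row(values2: scala.collection.Seq[_]),
               field2: Integer) =>
        s"Matched $key1 with ${values1.size + values2.size} nested values"
      case _ => "Un matched"
    })

    df2.show(false)
    spark.stop()
  }
}
```

    The nested pattern relies only on the public Row extractor, so it should keep working if the internal row class changes.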
    

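    Defining the match case with wildcards (_) works because a wildcard matches a value of any type, including the org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema values that Spark uses for the struct columns.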

    The following case also works, just like the wildcard version, because an untyped variable pattern accepts any value:

    case Row(key1: String, key2: String, field1: Integer, group1, group2, field2: Integer) => "Matched"
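
    If the goal is to match on the case classes themselves, a different technique from the one in this answer is to convert the DataFrame to a typed Dataset with as[MyItem] and match on MyItem directly. The following is a minimal sketch, assuming a local SparkSession and a made-up object name; it is an alternative, not what the answer above proposes.

```scala
import org.apache.spark.sql.SparkSession

// Same case classes as in the question, defined at top level so Spark can derive encoders.
case class MyList(values: Seq[Integer])
case class MyItem(key1: String, key2: String, field1: Integer,
                  group1: MyList, group2: MyList, field2: Integer)

object TypedDatasetMatch {  // hypothetical name, for illustration only
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("typed-dataset-match").getOrCreate()
    import spark.implicits._

    val dfRaw = Seq(
      MyItem("MyKey01", "MyKey02", 1, MyList(Seq(1)), MyList(Seq(2)), 2),
      MyItem("YourKey01", "YourKey02", 2, MyList(Seq(2, 3)), MyList(Seq(4, 5)), 20)
    ).toDF

    // as[MyItem] decodes each Row back into the case class, so the nested
    // MyList values can be destructured with ordinary case class patterns.
    val df2 = dfRaw.as[MyItem].map(item => item match {
      case MyItem(key1, _, _, MyList(values1), MyList(values2), _) =>
        s"Matched $key1 with ${values1.size + values2.size} nested values"
      case _ => "Un matched"
    })

    df2.show(false)
    spark.stop()
  }
}
```

    This requires the DataFrame's column names and types to line up with the case class fields, which holds here because dfRaw was built from MyItem values.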
    