
How do I generate map rows from a comma-separated string?

tree em  ·  6 years ago
        // bufferedSource is a scala.io.BufferedSource over the input file
        val rdd = bufferedSource.getLines().map { line =>
          // normalise the line ending, then split on commas and keep the first two fields
          val clearedLine = if (!line.endsWith(", ")) line + ", " else line.trim
          val fields = clearedLine.split(",")
          fields(0).trim -> fields(1).trim
        }

        for ((k, v) <- rdd) printf("key: %s, value: %s\n", k, v)
    

    Output:

    key: EQU EB.AR.DESCRIPT TO 1, value: EB.AR.ASSET.CLASS TO 2
    key: EB.AR.CURRENCY TO 3, value: EB.AR.ORIGINAL.VALUE TO 4
    

    I want to split on "TO" and then build a single dict of key -> value instead, please help. The expected output (and a rough sketch of what I mean) is below:

       key: 1,  value: EQU EB.AR.DESCRIPT 
       key: 2,  value: EB.AR.ASSET.CLASS
       key: 3,  value: EB.AR.CURRENCY
       key: 4,  value: EB.AR.ORIGINAL.VALUE
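
    Something like this plain-Scala sketch is the shape I'm after (untested; it assumes every entry has the form NAME TO n):

    val lines = List(
      "EQU EB.AR.DESCRIPT TO 1,EB.AR.ASSET.CLASS TO 2",
      "EB.AR.CURRENCY TO 3, EB.AR.ORIGINAL.VALUE TO 4")

    // split each line on commas, each entry on " TO ",
    // then flip it into (number -> name) pairs
    val dict: Map[String, String] = lines
      .flatMap(_.split(","))
      .map(_.trim.split(" TO "))
      .collect { case Array(name, num) => num.trim -> name.trim }
      .toMap
    // Map(1 -> EQU EB.AR.DESCRIPT, 2 -> EB.AR.ASSET.CLASS, ...)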
    
    1 Answer  |  6 years ago

    stack0114106  ·  6 years ago

    Assuming your input is lines like the ones below,

    EQU EB.AR.DESCRIPT TO 1,EB.AR.ASSET.CLASS TO 2
    EB.AR.CURRENCY TO 3, EB.AR.ORIGINAL.VALUE TO 4
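
    The demo below builds the input from a Seq; if your lines actually sit in a file, an equivalent starting point (the path here is hypothetical) would be:

    scala> val df = spark.read.textFile("/path/to/input.txt").toDF("a")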
    

    scala> val df = Seq(("EQU EB.AR.DESCRIPT TO 1,EB.AR.ASSET.CLASS TO 2"),("EB.AR.CURRENCY TO 3, EB.AR.ORIGINAL.VALUE TO 4")).toDF("a")
    df: org.apache.spark.sql.DataFrame = [a: string]
    
    scala> df.show(false)
    +----------------------------------------------+
    |a                                             |
    +----------------------------------------------+
    |EQU EB.AR.DESCRIPT TO 1,EB.AR.ASSET.CLASS TO 2|
    |EB.AR.CURRENCY TO 3, EB.AR.ORIGINAL.VALUE TO 4|
    +----------------------------------------------+
    
    
    scala> val df2 = df.select(split($"a",",").getItem(0).as("a1"),split($"a",",").getItem(1).as("a2"))
    df2: org.apache.spark.sql.DataFrame = [a1: string, a2: string]
    
    scala> df2.show(false)
    +-----------------------+--------------------------+
    |a1                     |a2                        |
    +-----------------------+--------------------------+
    |EQU EB.AR.DESCRIPT TO 1|EB.AR.ASSET.CLASS TO 2    |
    |EB.AR.CURRENCY TO 3    | EB.AR.ORIGINAL.VALUE TO 4|
    +-----------------------+--------------------------+
    
    
    scala> val df3 = df2.flatMap( r => { (0 until r.size).map( i=> r.getString(i) ) })
    df3: org.apache.spark.sql.Dataset[String] = [value: string]
    
    scala> df3.show(false)
    +--------------------------+
    |value                     |
    +--------------------------+
    |EQU EB.AR.DESCRIPT TO 1   |
    |EB.AR.ASSET.CLASS TO 2    |
    |EB.AR.CURRENCY TO 3       |
    | EB.AR.ORIGINAL.VALUE TO 4|
    +--------------------------+
    
    
    scala> df3.select(regexp_extract($"value",""" TO (\d+)\s*$""",1).as("key"),regexp_replace($"value",""" TO (\d+)\s*$""","").as("value")).show(false)
    +---+---------------------+
    |key|value                |
    +---+---------------------+
    |1  |EQU EB.AR.DESCRIPT   |
    |2  |EB.AR.ASSET.CLASS    |
    |3  |EB.AR.CURRENCY       |
    |4  | EB.AR.ORIGINAL.VALUE|
    +---+---------------------+
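
    Note the leading space that survives on the last value; wrapping the replacement in trim() would clean it up (a small tweak, not applied in the rest of this answer):

    scala> df3.select(regexp_extract($"value",""" TO (\d+)\s*$""",1).as("key"),trim(regexp_replace($"value",""" TO (\d+)\s*$""","")).as("value")).show(false)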
    

    If you want them as a "map" column, then:

    scala> val df4 = df3.select(regexp_extract($"value",""" TO (\d+)\s*$""",1).as("key"),regexp_replace($"value",""" TO (\d+)\s*$""","").as("value")).select(map($"key",$"value").as("kv"))
    df4: org.apache.spark.sql.DataFrame = [kv: map<string,string>]
    
    scala> df4.show(false)
    +----------------------------+
    |kv                          |
    +----------------------------+
    |[1 -> EQU EB.AR.DESCRIPT]   |
    |[2 -> EB.AR.ASSET.CLASS]    |
    |[3 -> EB.AR.CURRENCY]       |
    |[4 ->  EB.AR.ORIGINAL.VALUE]|
    +----------------------------+
    
    
    scala> df4.printSchema
    root
     |-- kv: map (nullable = false)
     |    |-- key: string
     |    |-- value: string (valueContainsNull = true)
    
    
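    And if what you finally need is a single Scala Map on the driver (the "dict" from the question), a sketch under the assumption that the data is small enough to collect:

    // sketch (not run here): merge the one-entry map rows into one Map
    // collect() pulls everything to the driver -- fine for small data only
    val dict: Map[String, String] =
      df4.collect().flatMap(_.getMap[String, String](0)).toMap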