
Applying a function to DataFrame columns in Scala

  •  -2
  • Masterbuilder  · Tech Community  · 6 years ago

    val df = sc.parallelize(
      Seq(("id1", "B", "c","d"), ("id2", "e", "d","k"),("id3", "e", "m","n"))).toDF("id", "dat1", "dat2","dat3")
    df.show
    
    +---+----+----+----+
    | id|dat1|dat2|dat3|
    +---+----+----+----+
    |id1|   B|   c|   d|
    |id2|   e|   d|   k|
    |id3|   e|   m|   n|
    +---+----+----+----+
    
    df.select(df.columns.slice(1,df.columns.size).map(c => upper(col(c)).alias(c)): _*).show
    
    +----+----+----+
    |dat1|dat2|dat3|
    +----+----+----+
    |   B|   C|   D|
    |   E|   D|   K|
    |   E|   M|   N|
    +----+----+----+
    

    Expected output

    +---+----+----+----+
    | id|dat1|dat2|dat3|
    +---+----+----+----+
    |id1|   B|   C|   D|
    |id2|   E|   D|   K|
    |id3|   E|   M|   N|
    +---+----+----+----+
    
    1 answer  |  as of 6 years ago
  •  3
  •   akuiper    6 years ago

    Just prepend the id column to the transformed columns:

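    // col and upper come from Spark's SQL functions (import assumed to be missing)
    import org.apache.spark.sql.functions.{col, upper}

    // Keep id unchanged, upper-case every remaining column, then select them all
    df.select(
        col("id") +: df.columns.tail.map(c => upper(col(c)).alias(c)): _*
    ).show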
    +---+----+----+----+
    | id|dat1|dat2|dat3|
    +---+----+----+----+
    |id1|   B|   C|   D|
    |id2|   E|   D|   K|
    |id3|   E|   M|   N|
    +---+----+----+----+
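
As a possible alternative to rebuilding the select list, the same result can probably be reached by folding over the column names with withColumn, which replaces each column in place and so leaves id and the column order untouched. The sketch below is only an illustration: the object name UpperCaseColumns, the helper upperExcept and its keep parameter are hypothetical names, not part of the original answer, and it assumes a local SparkSession rather than the spark-shell sc used in the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, upper}

object UpperCaseColumns {
  // Hypothetical helper: upper-cases every column whose name is not in `keep`.
  // withColumn with an existing name replaces that column in place,
  // so the original column order is preserved.
  def upperExcept(df: DataFrame, keep: Set[String]): DataFrame =
    df.columns.filterNot(keep.contains).foldLeft(df) { (acc, c) =>
      acc.withColumn(c, upper(col(c)))
    }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session stands in for the spark-shell environment.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("upper-case-columns")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as in the question.
    val df = Seq(
      ("id1", "B", "c", "d"),
      ("id2", "e", "d", "k"),
      ("id3", "e", "m", "n")
    ).toDF("id", "dat1", "dat2", "dat3")

    upperExcept(df, Set("id")).show()

    spark.stop()
  }
}
```

Compared with the select-based answer, this form does not depend on id being the first column, at the cost of one withColumn call per transformed column, which may matter for very wide DataFrames.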