I have a dataframe, yearDF, with the following columns:
name, id_number, location, source_system_name, period_year
If I want to repartition the dataframe on a single column, I would do:
yearDF.repartition(col("source_system_name"))
But the columns to partition on are given as a comma-separated string:

val partition_columns = "source_system_name,period_year"

so I tried this:
val dataDFPart = yearDF.repartition(col(${partition_columns}))
but it fails to compile with the error: cannot resolve the symbol $
Is there a way to repartition the dataframe yearDF based on the columns given in partition_columns?
The repartition function in Scala/Spark has three overloads:
def repartition(partitionExprs: Column*): Dataset[T]
def repartition(numPartitions: Int, partitionExprs: Column*): Dataset[T]
def repartition(numPartitions: Int): Dataset[T]
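For reference, a minimal sketch of how each overload looks at a call site (the column names are the ones from yearDF; the partition counts 200 and 50 are purely illustrative):

import org.apache.spark.sql.functions.col

// 1. Partition by expressions only; the number of resulting partitions
//    comes from spark.sql.shuffle.partitions
yearDF.repartition(col("source_system_name"))

// 2. Partition by expressions into an explicit number of partitions
yearDF.repartition(200, col("source_system_name"), col("period_year"))

// 3. Redistribute the rows into a fixed number of partitions,
//    without any partitioning expression
yearDF.repartition(50)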
Since partition_columns is a comma-separated string, split it, map each name to a Column, and pass the resulting array as varargs:

val columns = partition_columns.split(",").map(x => col(x))
yearDF.repartition(columns: _*)
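Putting it together, a self-contained sketch (the object name, the local master, and the tiny hand-built yearDF are assumptions made only so the example runs on its own):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object RepartitionByColumns {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("repartition-example")
      .master("local[*]")   // local master only for this sketch
      .getOrCreate()
    import spark.implicits._

    // A tiny stand-in for yearDF with the columns from the question
    val yearDF = Seq(
      ("a", "1", "NY", "sysA", 2019),
      ("b", "2", "LA", "sysB", 2020)
    ).toDF("name", "id_number", "location", "source_system_name", "period_year")

    val partition_columns = "source_system_name,period_year"

    // Split the comma-separated string into Column objects
    // and expand the array as varargs
    val columns = partition_columns.split(",").map(x => col(x))
    val dataDFPart = yearDF.repartition(columns: _*)

    println(dataDFPart.rdd.getNumPartitions)
    spark.stop()
  }
}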
Another approach is to pass each column explicitly:
yearDF.repartition(col("source_system_name"), col("period_year"))
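If you also need to control how many partitions come out of the shuffle, the second overload takes the partition count in front of the column expressions (the value 10 below is just an illustrative choice):

yearDF.repartition(10, col("source_system_name"), col("period_year"))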