
Troubleshooting the following error in Spark Streaming

  •  1
  • Srinivas  · asked 6 years ago

    I'm hitting this error while trying to write data to HDFS. The job had been working fine, and then I started getting this error, so it seems there is a data issue.

    18/09/15 04:13:43 ERROR JobScheduler: Error running job streaming job 1536977640000 ms.0
    java.util.NoSuchElementException: None.get
        at scala.None$.get(Option.scala:347)
        at scala.None$.get(Option.scala:345)
        at org.apache.spark.sql.execution.command.DataWritingCommand$class.metrics(DataWritingCommand.scala:49)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.metrics$lzycompute(InsertIntoHadoopFsRelationCommand.scala:46)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.metrics(InsertIntoHadoopFsRelationCommand.scala:46)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.metrics$lzycompute(commands.scala:100)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.metrics(commands.scala:100)
        at org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:58)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
    

    Does this mean my output DStream has nothing in it? Below is the code I use to write the DStream to HDFS:

    outputDStream.repartition(100).foreachRDD((rdd: RDD[Transaction], time: SparkTime) => {
      val df = rdd.toDF

      val dfWithTimestamp = df.withColumn("current_timestamp", current_timestamp())

      dfWithTimestamp.write
        .mode(SaveMode.Overwrite)
        .save(s"${outputPath}")
    })
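
    As a sketch of one defensive pattern (not from the original post): a micro-batch can be empty, and `toDF`/`write` also need an active `SparkSession` visible on the thread running the batch. The snippet below guards both, skipping empty RDDs and obtaining the session from the RDD's own `SparkContext`. `outputDStream`, `Transaction`, and `outputPath` are assumed to be defined as in the question.

    ```scala
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{SaveMode, SparkSession}
    import org.apache.spark.sql.functions.current_timestamp
    import org.apache.spark.streaming.Time

    outputDStream.repartition(100).foreachRDD((rdd: RDD[Transaction], time: Time) => {
      // Skip empty micro-batches entirely, so the writer never runs on no data.
      if (!rdd.isEmpty()) {
        // Build (or reuse) the SparkSession from the RDD's own SparkContext,
        // so an active session is guaranteed to exist for toDF/write.
        val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
        import spark.implicits._

        rdd.toDF
          .withColumn("current_timestamp", current_timestamp())
          .write
          .mode(SaveMode.Overwrite)
          .save(outputPath)
      }
    })
    ```

    Whether this resolves the `None.get` depends on the actual cause; the stack trace only shows that `DataWritingCommand.metrics` found no value where it expected one.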
    
    0 replies  |  asked 6 years ago