
Using Docker in a cluster: BlockManagerId; local class incompatible

  •  0
  • Jämes  · Tech Community  · 7 years ago

    I am running into a type mismatch when using Spark with Docker to distribute operations. The tutorial I followed seems quite clear. Here is my attempt at the Scala code:

    package test
    
    import com.datastax.spark.connector.cql.CassandraConnector
    import org.apache.spark.{SparkConf, SparkContext}
    import readhub.sharedkernel.config.Settings
    
    object Application extends App {
        import com.datastax.spark.connector._
    
    
        val conf = new SparkConf(true)
          .setAppName("Coordinator")
          .setMaster("spark://localhost:7077")
          .set("spark.cassandra.connection.host", "valid host")
    
        val sc = new SparkContext(conf)
    
        CassandraConnector(conf).withSessionDo { session =>
          session.execute("CREATE KEYSPACE test2 WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1 }")
          session.execute("CREATE TABLE test2.words (word text PRIMARY KEY, count int)")
          session.execute("INSERT INTO test2.words(word, count) VALUES('hey', 32)")
    
          sc.cassandraTable("test2", "words")
            .map(r => r.getString("word"))
            .foreach(process)
    
        }
    
        def process(word: String): Unit = {
          // Dummy processing
          println(word)
        }
    }
    

    The build.sbt is fairly simple:

    import sbt.project
    
    val sparkSql = "org.apache.spark" %% "spark-sql" % "2.3.0" % "provided"
    val sparkCassandraConnector = "com.datastax.spark" %% "spark-cassandra-connector" % "2.3.0" % "provided"
    
    lazy val commonSettings = Seq(
      version := "0.1",
      scalaVersion := "2.11.12",
      organization := "ch.heig-vd"
    )
    
    lazy val root = (project in file("."))
      .settings(
        commonSettings,
        name := "Root"
      )
      .aggregate(
        coordinator
      )
    
    lazy val coordinator = project
      .settings(
        commonSettings,
        name := "Coordinator",
        libraryDependencies ++= Seq(
          sparkSql,
          sparkCassandraConnector
        )
      )
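
    For the sbt coordinator/assembly command used further down to work, the sbt-assembly plugin also has to be on the build classpath via project/plugins.sbt (not shown here). A minimal sketch, with the plugin version being an assumption:

    // project/plugins.sbt (assumed content; provides the assembly task used below)
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.9")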
    

    The Dockerfile has been adapted from this image:

    FROM phusion/baseimage:0.9.22
    
    ENV SPARK_VERSION 2.3.0
    ENV SPARK_INSTALL /usr/local
    ENV SPARK_HOME $SPARK_INSTALL/spark
    ENV SPARK_ROLE master
    ENV HADOOP_VERSION 2.7
    ENV SPARK_MASTER_PORT 7077
    ENV PYSPARK_PYTHON python3
    ENV DOCKERIZE_VERSION v0.2.0
    
    RUN apt-get update && \
        apt-get install -y openjdk-8-jdk autossh python3-pip && \
        apt-get clean && \
        rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
    
    ##### INSTALL DOCKERIZE
    RUN curl -L -O https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz && \
        tar -C /usr/local/bin -xzvf dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz && \
        rm -rf dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz
    
    ##### INSTALL APACHE SPARK WITH HDFS
    RUN curl -s http://mirror.synyx.de/apache/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION.tgz | tar -xz -C $SPARK_INSTALL && \
        cd $SPARK_INSTALL && ln -s spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION spark
    
    WORKDIR $SPARK_HOME
    
    ##### ADD Scripts
    RUN mkdir /etc/service/spark
    ADD runit/spark.sh /etc/service/spark/run
    RUN chmod +x /etc/service/**/*
    
    EXPOSE 4040 6066 7077 7078 8080 8081 8888
    
    VOLUME ["$SPARK_HOME/logs"]
    
    CMD ["/sbin/my_init"]
    

    The docker-compose.yml is also quite simple:

    version: "3"
    
    services:
      master:
        build: birgerk-apache-spark
    
        ports:
          - "7077:7077"
          - "8080:8080"
    
      slave:
        build: birgerk-apache-spark
        environment:
          - SPARK_ROLE=slave
          - SPARK_MASTER=master
        depends_on:
          - master
    

    birgerk-apache-spark being the folder that contains the Dockerfile shown above.

    Finally, I glue everything together with:

    sbt coordinator/assembly
    

    to create a fat JAR, and then:

    spark-submit --class test.Application --packages com.datastax.spark:spark-cassandra-connector_2.11:2.3.0 --master spark://localhost:7077 ReadHub\ Coordinator-assembly-0.1.jar
    

    spark-submit then fails with the following error:

    ERROR TransportRequestHandler:199 - Error while invoking RpcHandler#receive() on RPC id 7068633004064450609
    java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 6155820641931972169, local class serialVersionUID = [..]
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:687)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1745)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2033)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
        [..]

    From my point of view, the Dockerfile downloads the right version of Spark, the same one that is declared in my build.sbt.

    1 Reply  |  7 years ago
        1
  •  0
  •   fracca    6 years ago

    Version mismatch between Spark 2.3.3 and Spark 2.3.0. The Spark version the job is built and submitted with has to match the version running in the Docker containers exactly; when they differ, serialized classes such as BlockManagerId carry different serialVersionUIDs, which is exactly what the InvalidClassException above is complaining about.
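
    In practice, the version compiled into the fat JAR, the version passed to --packages and the SPARK_VERSION baked into the Docker image must all point at the same Spark release. Below is a sketch of keeping the sbt side in one place; the single sparkVersion value is a suggested refactoring, not part of the original build:

    // build.sbt (sketch): declare the Spark version once, and keep it identical to
    // SPARK_VERSION in the Dockerfile and to the spark-submit installation on the host.
    val sparkVersion = "2.3.0"

    val sparkSql = "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"
    val sparkCassandraConnector = "com.datastax.spark" %% "spark-cassandra-connector" % "2.3.0" % "provided"

    If everything still looks aligned on paper, comparing the output of spark-submit --version on the machine that submits the job with the version shown on the master's web UI (port 8080) reveals which side actually runs 2.3.3.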