
streaming.StreamJob error: Job not successful

  • ParisaN  ·  6 years ago

    I have installed Hadoop 2.9.0 on 4 nodes. The namenode and resourcemanager services run on the master, and the datanodes and nodemanagers run on the slaves. Now I want to run a Python MapReduce (streaming) job, but the job does not succeed. Please tell me what I should do.
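
    For reference, mapper.py and reducer.py are plain Hadoop Streaming scripts: they read records from stdin and write tab-separated key/value pairs to stdout. A minimal word-count-style sketch of that shape (for illustration only, not my exact scripts) looks like this:

    #!/usr/bin/env python
    # mapper.py (illustrative sketch): emit "<word>\t1" for every word read from stdin
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print("%s\t%s" % (word, 1))

    #!/usr/bin/env python
    # reducer.py (illustrative sketch): sum the counts per word;
    # Hadoop Streaming delivers the mapper output to the reducer sorted by key
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print("%s\t%d" % (current_word, current_count))
            current_word, current_count = word, int(count)
    if current_word is not None:
        print("%s\t%d" % (current_word, current_count))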

    Terminal log from the job run:

    hadoop@hadoopmaster:/usr/local/hadoop$ bin/hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.9.0.jar -file mapper.py -mapper mapper.py -file reducer.py -reducer reducer.py -input /user/hadoop/* -output /user/hadoop/output
    18/06/17 04:26:28 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
    18/06/17 04:26:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    packageJobJar: [mapper.py, /tmp/hadoop-unjar3316382199020742755/] [] /tmp/streamjob4930230269569102931.jar tmpDir=null
    18/06/17 04:26:28 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.111.175:8050
    18/06/17 04:26:29 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.111.175:8050
    18/06/17 04:26:29 WARN hdfs.DataStreamer: Caught exception
    java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1249)
        at java.lang.Thread.join(Thread.java:1323)
        at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:980)
        at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:630)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:807)
    18/06/17 04:26:29 WARN hdfs.DataStreamer: Caught exception
    java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1249)
        at java.lang.Thread.join(Thread.java:1323)
        at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:980)
        at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:630)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:807)
    18/06/17 04:26:29 INFO mapred.FileInputFormat: Total input files to process : 4
    18/06/17 04:26:29 WARN hdfs.DataStreamer: Caught exception
    java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1249)
        at java.lang.Thread.join(Thread.java:1323)
        at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:980)
        at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:630)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:807)
    18/06/17 04:26:29 INFO mapreduce.JobSubmitter: number of splits:4
    18/06/17 04:26:29 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
    18/06/17 04:26:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1529233655437_0004
    18/06/17 04:26:30 INFO impl.YarnClientImpl: Submitted application application_1529233655437_0004
    18/06/17 04:26:30 INFO mapreduce.Job: The url to track the job: http://hadoopmaster.png.com:8088/proxy/application_1529233655437_0004/
    18/06/17 04:26:30 INFO mapreduce.Job: Running job: job_1529233655437_0004
    
    18/06/17 04:45:12 INFO mapreduce.Job: Job job_1529233655437_0004 running in uber mode : false
    18/06/17 04:45:12 INFO mapreduce.Job:  map 0% reduce 0%
    18/06/17 04:45:12 INFO mapreduce.Job: Job job_1529233655437_0004 failed with state FAILED due to: Application application_1529233655437_0004 failed 2 times due to Error launching appattempt_1529233655437_0004_000002. Got exception: org.apache.hadoop.net.ConnectTimeoutException: Call From hadoopmaster.png.com/192.168.111.175 to hadoopslave1.png.com:40569 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=hadoopslave1.png.com/192.168.111.173:40569]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
        at sun.reflect.GeneratedConstructorAccessor38.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:774)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
        at org.apache.hadoop.ipc.Client.call(Client.java:1439)
        at org.apache.hadoop.ipc.Client.call(Client.java:1349)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy82.startContainers(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
        at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy83.startContainers(Unknown Source)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:122)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:307)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=hadoopslave1.png.com/192.168.111.173:40569]
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:687)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:790)
        at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:411)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1554)
        at org.apache.hadoop.ipc.Client.call(Client.java:1385)
        ... 19 more
    . Failing the application.
    18/06/17 04:45:13 INFO mapreduce.Job: Counters: 0
    18/06/17 04:45:13 ERROR streaming.StreamJob: Job not successful!
    Streaming Command Failed!
    
    1 Reply  |  6 years ago
  •   ParisaN    6 years ago

    OK, I found the cause of the problem. The error that actually had to be resolved was this one:

    Call From hadoopmaster.png.com/192.168.111.175 to hadoopslave1.png.com:40569 failed on socket timeout exception

    So I did this:

    sudo ufw allow 40569
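
    After adding the rule, it is worth checking from the master that the slave's port is actually reachable. Any TCP connectivity test will do; the small generic script below (just a sanity check, not part of the fix itself) uses the host and port from the error above:

    #!/usr/bin/env python
    # check_port.py (generic helper, not part of the original fix):
    # confirm the master can now reach the NodeManager port that was timing out.
    import socket

    def port_open(host, port, timeout=20):
        # True if a TCP connection to host:port succeeds within `timeout` seconds
        try:
            socket.create_connection((host, port), timeout=timeout).close()
            return True
        except (socket.error, socket.timeout):
            return False

    print(port_open("hadoopslave1.png.com", 40569))  # expect True once the rule is in place

    Keep in mind that this NodeManager port (40569 here) may be assigned dynamically unless it is pinned with yarn.nodemanager.address, so the number can change after a restart and the firewall rule may need to be updated accordingly.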