
AirflowException: Celery command failed - The recorded hostname does not match this instance's hostname

  • Kyle Bridenstine  · 7 years ago

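    I'm running Airflow in a clustered environment on two AWS EC2 instances: one for the master and one for the worker. The worker node periodically throws this error while running "$ airflow worker":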

    [2018-08-09 16:15:43,553] {jobs.py:2574} WARNING - The recorded hostname ip-1.2.3.4 does not match this instance's hostname ip-1.2.3.4.eco.tanonprod.comanyname.io
    Traceback (most recent call last):
      File "/usr/bin/airflow", line 27, in <module>
        args.func(args)
      File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 387, in run
        run_job.run()
      File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 198, in run
        self._execute()
      File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 2527, in _execute
        self.heartbeat()
      File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 182, in heartbeat
        self.heartbeat_callback(session=session)
      File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 50, in wrapper
        result = func(*args, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 2575, in heartbeat_callback
        raise AirflowException("Hostname of job runner does not match")
    airflow.exceptions.AirflowException: Hostname of job runner does not match
    [2018-08-09 16:15:43,671] {celery_executor.py:54} ERROR - Command 'airflow run arl_source_emr_test_dag runEmrStep2WaiterTask 2018-08-07T00:00:00 --local -sd /var/lib/airflow/dags/arl_source_emr_test_dag.py' returned non-zero exit status 1.
    [2018-08-09 16:15:43,681: ERROR/ForkPoolWorker-30] Task airflow.executors.celery_executor.execute_command[875a4da9-582e-4c10-92aa-5407f3b46d5f] raised unexpected: AirflowException('Celery command failed',)
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 52, in execute_command
        subprocess.check_call(command, shell=True)
      File "/usr/lib64/python3.6/subprocess.py", line 291, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command 'airflow run arl_source_emr_test_dag runEmrStep2WaiterTask 2018-08-07T00:00:00 --local -sd /var/lib/airflow/dags/arl_source_emr_test_dag.py' returned non-zero exit status 1.
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/lib/python3.6/dist-packages/celery/app/trace.py", line 382, in trace_task
        R = retval = fun(*args, **kwargs)
      File "/usr/lib/python3.6/dist-packages/celery/app/trace.py", line 641, in __protected_call__
        return self.run(*args, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 55, in execute_command
        raise AirflowException('Celery command failed')
    airflow.exceptions.AirflowException: Celery command failed
    

    When this error occurs, the task is marked as failed in Airflow, so my DAG fails even though nothing actually went wrong in the task.

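    I'm using Redis as the queue and PostgreSQL as the metadata database. Both are external AWS services. I'm running all of this in my company's environment, which is why the server's full name is ip-1.2.3.4.eco.tanonprod.comanyname.io . It looks like Airflow wants this full name, but I don't know where I need to change the value so that it gets ip-1.2.3.4.eco.tanonprod.comanyname.io instead of just ip-1.2.3.4 .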

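    What's really odd about this problem is that it doesn't always happen. It seems to occur at random every once in a while when I run a DAG. It also shows up sporadically on all of my DAGs, so it isn't specific to one DAG. I find the sporadic behavior strange, because it means the other task runs handle the hostname, whatever it is at the time, just fine.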

    Note: For privacy reasons, I've changed the actual IP address to 1.2.3.4.

    Answer:

    https://github.com/apache/incubator-airflow/pull/2484

    This is exactly the problem I'm having, and other Airflow users on AWS EC2 instances are experiencing it as well.

    1 Answer
  •   cwurtz    7 years ago

    The hostname is set when the task instance runs, via self.hostname = socket.getfqdn() , where socket is the Python standard-library module ( import socket ).
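    To see which name Python resolves on the worker, the two standard-library calls can be compared directly. This is only a minimal diagnostic sketch; the helper name describe_hostnames is made up for illustration and is not part of Airflow. A short name such as ip-1.2.3.4 next to a longer fully qualified name would correspond to the two values shown in the warning above.

```python
import socket


def describe_hostnames():
    """Return the short hostname and the FQDN as Python resolves them right now.

    Hypothetical helper for diagnosis only; not an Airflow API.
    """
    short_name = socket.gethostname()  # name reported by the OS
    fqdn = socket.getfqdn()            # name after resolver lookup (DNS, /etc/hosts)
    return {"short": short_name, "fqdn": fqdn, "match": short_name == fqdn}


info = describe_hostnames()
print("gethostname():", info["short"])
print("getfqdn():    ", info["fqdn"])
print("identical:    ", info["match"])
```

    Running this a few times on the worker, ideally around the time a task fails, may show whether the resolved name is stable.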

    The comparison that triggers this error is:

    fqdn = socket.getfqdn()
    if fqdn != ti.hostname:
        logging.warning("The recorded hostname {ti.hostname} "
            "does not match this instance's hostname "
            "{fqdn}".format(**locals()))
        raise AirflowException("Hostname of job runner does not match")
    

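    It looks like the hostname on the EC2 instance is changing while the worker is running. Perhaps try setting the hostname manually, as described here https://forums.aws.amazon.com/thread.jspa?threadID=246906 , and see if it sticks.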
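    One way to check whether the resolved name really changes over time is to sample it repeatedly and record every change. The sketch below is an assumption-laden illustration, not something from Airflow: watch_fqdn and its resolver and sleep parameters are hypothetical names, injected so the function can also be exercised without waiting or touching the network.

```python
import socket
import time


def watch_fqdn(samples, interval_seconds=0.0, resolver=socket.getfqdn, sleep=time.sleep):
    """Call `resolver` `samples` times and return the observed changes.

    Hypothetical diagnostic helper (not an Airflow API). Each entry in the
    returned list is (sample_index, previous_name, new_name).
    """
    changes = []
    previous = resolver()  # sample 0 is the baseline
    for index in range(1, samples):
        sleep(interval_seconds)
        current = resolver()
        if current != previous:
            changes.append((index, previous, current))
            previous = current
    return changes


# Offline demonstration with a fake resolver that replays the two names
# from the warning in the question.
fake_names = iter([
    "ip-1.2.3.4",
    "ip-1.2.3.4",
    "ip-1.2.3.4.eco.tanonprod.comanyname.io",
])
demo_changes = watch_fqdn(3, resolver=lambda: next(fake_names), sleep=lambda _: None)
print(demo_changes)
```

    On the worker itself, a call such as watch_fqdn(60, interval_seconds=5) would use the real socket.getfqdn; an empty result over a long enough window would suggest the name is stable and the cause lies elsewhere.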
