代码之家  ›  专栏  ›  技术社区  ›  dark horse

气流-尝试执行一组python函数

  •  0
  • dark horse  · 技术社区  · 8 年前

    我正在尝试执行一个包含两个python函数的airlow脚本。这些函数基本上查询数据库并执行很少的任务。我尝试执行这是气流,这样我就可以单独监视这些功能。下面是我试图执行的代码,并得到以下错误

    子任务:名称错误:未定义名称“task_instance”

    ## Third party Library Imports
    
    import psycopg2
    import airflow
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator
    from datetime import datetime, timedelta
    from sqlalchemy import create_engine
    import io
    
    
    # Following are defaults which can be overridden later on
    default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2018, 1, 23, 12),
    'email': ['airflow@airflow.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    }
    
    dag = DAG('sample_dag', default_args=default_args, catchup=False, schedule_interval="@once")
    
    
    #######################
    ## Login to DB
    
    
    def db_log(**kwargs):
        global db_con
        try:
        db_con = psycopg2.connect(
    " dbname = 'name' user = 'user' password = 'pass' host = 'host' port = 'port' sslmode = 'require' ")
        except:
            print("Connection Failed.")
            print('Connected successfully')
            task_instance = kwargs['task_instance']
            task_instance.xcom_push(value="db_con", key="db_log")
            return (db_con)
    
    def insert_data(**kwargs):
        v1 = task_instance.xcom_pull(key="db_con", task_ids='db_log')
        return (v1)
        cur = db_con.cursor()
        cur.execute("""insert into tbl_1 select id,bill_no,status from tbl_2 limit 2;""")
    
    #def job_run():
    #    db_log()
    #    insert_data()
    
    
    ##########################################
    
    t1 = PythonOperator(
    task_id='Connect',
    python_callable=db_log,provide_context=True,
    dag=dag)
    
    t2 = PythonOperator(
    task_id='Query',
    python_callable=insert_data,provide_context=True,
    dag=dag)
    
    
    t1 >> t2
    

    有人能帮忙吗?谢谢。。

    更新1:

    遇到错误

    AttributeError: 'NoneType' object has no attribute 'execute'
    

    指向上面代码的最后一行

    cur.execute("""insert into tbl_1 select id,bill_no,status from tbl_2 limit 2;""")
    

    完整代码

    完整代码:

    ## Third party Library Imports
    import pandas as pd
    import psycopg2
    import airflow
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator
    from airflow.operators.bash_operator import BashOperator
    from datetime import datetime, timedelta
    from sqlalchemy import create_engine
    import io
    
    # Following are defaults which can be overridden later on
    default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2018, 5, 29, 12),
    'email': ['airflow@airflow.com']
    }
    
    dag = DAG('sample1', default_args=default_args)
    
    ## Login to DB
    
    def db_log(**kwargs):
      global db_con
      try:
        db_con = psycopg2.connect(
           " dbname = 'name' user = 'user' password = 'pass' host = 'host' port = '5439'")
      except:
        print("I am unable to connect")
        print('Connection Task Complete')
        task_instance = kwargs['task_instance']
        task_instance.xcom_push(key="dwh_connection" , value = "dwh_connection")
        return (dwh_connection)
    
    
    
    t1 = PythonOperator(
      task_id='DWH_Connect',
      python_callable=data_warehouse_login,provide_context=True,
      dag=dag)
    
    #######################
    
    def insert_data(**kwargs):
      task_instance = kwargs['task_instance']
      db_con_xcom = task_instance.xcom_pull(key="dwh_connection", task_ids='DWH_Connect')
      cur = db_con_xcom
      cur.execute("""insert into tbl_1 select limit 2 """)
    
    
    ##########################################
    
    t2 = PythonOperator(
      task_id='DWH_Connect1',
      python_callable=insert_data,provide_context=True,dag=dag)
    
    t1 >> t2
    
    2 回复  |  直到 8 年前
        1
  •  1
  •   tobi6    8 年前

    既然问题越来越大,我想应该再加一个答案。

    即使在编辑了注释“我删除了代码的缩进部分”之后,我仍然不确定这段代码:

    def db_log(**kwargs):
      global db_con
      try:
        db_con = psycopg2.connect(
           " dbname = 'name' user = 'user' password = 'pass' host = 'host' port = '5439'")
      except:
        print("I am unable to connect")
        print('Connection Task Complete')
        task_instance = kwargs['task_instance']
        task_instance.xcom_push(key="dwh_connection" , value = "dwh_connection")
        return (dwh_connection)
    

    应该是这样的:

    def db_log(**kwargs):
      global db_con
      try:
        db_con = psycopg2.connect(
           " dbname = 'name' user = 'user' password = 'pass' host = 'host' port = '5439'")
      except:
        print("I am unable to connect")
    
      print('Connection Task Complete')
      task_instance = kwargs['task_instance']
      task_instance.xcom_push(key="dwh_connection" , value = "dwh_connection")
      #return (dwh_connection)  # don't need a return here
    

    除此之外,你另一个问题的想法( Python - AttributeError: 'NoneType' object has no attribute 'execute' )使用 PostgresHook 我觉得很有趣。你可能想在另一个问题上继续这个想法。

        2
  •  4
  •   tobi6    8 年前

    这是来自python的基本错误消息。

    NameError: name 'task_instance' is not defined
    

    告诉你 task_instance 当你想使用它的时候,它就找不到了。

    任务实例是在已传递给函数的上下文中提供的。

    气流通过设置发送上下文

    provide_context=True,
    

    在任务中。该定义也接受夸尔格:

    def insert_data(**kwargs):
    

    这也是正确的。

    更正

    首先需要将任务实例从上下文中取出,如下所示:

    task_instance = kwargs['task_instance']
    

    然后 您可以使用任务实例 xcom_pull . 所以它应该是这样的(也要加上一些注释):

    def insert_data(**kwargs):
        task_instance = kwargs['task_instance']
        db_con_xcom = task_instance.xcom_pull(key="db_con", task_ids='db_log')
        #return (v1)  # wrong, why return here?
        #cur = db_con.cursor()  # wrong, db_con might not be available
        cur = db_con_xcom
        cur.execute("""insert into tbl_1 select id,bill_no,status from tbl_2 limit 2;""")