代码之家  ›  专栏  ›  技术社区  ›  Mukesh Gupta

如何在配置单元的分区数据中创建表?

  •  1
  • Mukesh Gupta  · 技术社区  · 7 年前
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/_impala_insert_staging
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:18 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI
    [mgupta@sjc-dev-binn01 ~]$ hadoop fs -ls /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI
    Found 27 items
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201601
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201602
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201603
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201604
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201605
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201606
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201607
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201608
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201609
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201610
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201611
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201612
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201701
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201702
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201703
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201704
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201705
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201706
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201707
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:18 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201708
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:18 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201709
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:18 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201710
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:18 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201711
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:18 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201712
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:18 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201801
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:18 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201802
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:18 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201803
    [mgupta@sjc-dev-binn01 ~]$ hadoop fs -ls /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201601
    Found 3 items
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201601/company_sid=0
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201601/company_sid=38527
    drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201601/company_sid=__HIVE_DEFAULT_PARTITION__
    [mgupta@sjc-dev-binn01 ~]$ hadoop fs -ls /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201601/company_sid=0
    Found 1 items
    -rw-r--r--   3 mgupta supergroup    2069014 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201601/company_sid=0/f9466a0068b906cf-6ace7f8500000049_294515768_data.0.parq
    [mgupta@sjc-dev-binn01 ~]$
    
    2 回复  |  直到 5 年前
        1
  •  8
  •   Satya Pavan    3 年前

    您可以尝试以下步骤。

    方法1

    1. 标识架构(列名和类型,包括分区列)

    2. 创建配置单元分区表(确保添加分区列和分隔符信息)

    3. 将数据加载到分区表中。(在这种情况下,加载文件将没有分区列,因为您将通过 load 命令)

       create table <table_name> (col1 data_type1, col2 data_type2..)
       partitioned by(part_col data_type3)
       row format delimited
       fields terminated by '<field_delimiter_in_your_data>'
      
       load data inpath '/hdfs/loc/file1' into table <table_name>
       partition (<part_col>='201601');
      
       load data inpath '/hdfs/loc/file2' into table <table_name>
       partition (<part_col>='201602')
      
       load data inpath '/hdfs/loc/file3' into table <table_name>
       partition (<part_col>='201603')
      

    等等


    方法2

    1. 创建一个临时表(临时表),其架构与主表相同,但没有任何分区

    2. 将整个数据加载到此表中(确保 分区列 '作为这些文件中的字段之一)

    3. 使用将数据从临时表加载到主表 动态分区插入

       create table <staging_table> (col1 data_type1, col2 data_type2..)
       row format delimited
       fields terminated by '<field_delimiter_in_your_data>'
      
       create table <main_table> (col1 data_type1, col2 data_type2..)
       partitioned by(part_col data_type3);
      
       load data inpath '/hdfs/loc/directory/' into table <staging_table>;
      
       SET hive.exec.dynamic.partition=true; 
       SET hive.exec.dynamic.partition.mode=nonstrict;
      
       insert into table <main_table> 
       partition(part_col) 
       select col1,col2,....part_col from <staging_table>;
      

    方法2的关键方面包括:

    • 制作' part\u列 '作为加载文件中的字段可用
    • 在最后的insert语句中,get' part\u列 '作为select子句的最后一个字段。
        2
  •  0
  •   user5099519 user5099519    4 年前

    让我们创建一个以年和月为分区的表,表中有一个时间戳:

    CREATE TABLE `mypart_p`(
       `id` bigint, 
       `open_ts` string 
    )
    PARTITIONED BY (YEAR INT, MONTH INT)
    

    现在我得把桌子改一下。

    ALTER TABLE mypart_p ADD PARTITION (YEAR=2020, MONTH=1)
    

    我必须每年每月都这样做,在python中循环完成。现在让我们用数据填充它,并指定该数据属于哪个分区:

    INSERT into mypart_p PARTITION (YEAR=2020, MONTH=1)
    
    select id,
    open_ts
    
    FROM some_other_table
    
    WHERE substring(open_ts,0,4) = '2020'
    AND substring(open_ts,6,2) = '01'