
Creating a view on a Hive table by defining a schema for a column that contains JSON

  • M80  · 8 years ago
    I have created an external table in Hive on top of an HDFS folder.

    Kafka stream to HDFS

    import org.apache.spark.SparkContext;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SQLContext;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.Trigger;
    import org.apache.spark.sql.types.DataTypes;

    public class EventKafkaToParquet {

        public static void main(String[] args) throws Exception {

            String brokers = "quickstart:9092";
            String topics = "simple_topic_6";
            String master = "local[*]";

            SparkSession sparkSession = SparkSession
                    .builder().appName(EventKafkaToParquet.class.getName())
                    .master(master).getOrCreate();
            SQLContext sqlContext = sparkSession.sqlContext();
            SparkContext context = sparkSession.sparkContext();
            context.setLogLevel("ERROR");

            // Read the Kafka topic as a streaming Dataset
            Dataset<Row> rawDataSet = sparkSession.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", brokers)
                    .option("subscribe", topics).load();
            rawDataSet.printSchema();

            // Cast the binary Kafka value to a string column named "employee"
            rawDataSet = rawDataSet.withColumn("employee", rawDataSet.col("value").cast(DataTypes.StringType));
            rawDataSet.createOrReplaceTempView("basicView");
            Dataset<Row> writeDataset = sqlContext.sql("select employee from basicView");

            // Write the raw JSON strings to HDFS as Parquet, triggering every 5 seconds
            writeDataset
                    .repartition(1)
                    .writeStream()
                    .option("path","/user/cloudera/employee/")
                    .option("checkpointLocation", "/user/cloudera/employee.checkpoint/")
                    .format("parquet")
                    .trigger(Trigger.ProcessingTime(5000))
                    .start()
                    .awaitTermination();
        }
    }
    

    External table in Hive

    CREATE EXTERNAL TABLE employee_raw ( employee STRING )  
    STORED AS PARQUET
    LOCATION '/user/cloudera/employee' ;
    

    The view I want to define on top of this table should expose these fields:

    firstName, lastName, street, city, state, zip
    

    hive> select * from employee_raw;
    OK
    {"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
    {"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
    {"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
    {"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
    {"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
    Time taken: 0.123 seconds, Fetched: 5 row(s)
    

    Thanks for your input.

    1 reply
  • U880D    8 years ago

    From your description, it seems to me that what you mainly want is to "Extract values from JSON string in Hive", so you may be able to find what you need in the linked thread.
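
    For the `employee_raw` table in the question, extracting values from the JSON string could look roughly like the sketch below. It relies on Hive's built-in `get_json_object` function and assumes every row has the shape shown in the sample output, with a top-level `employee` key and a nested `address` object. The view name `employee_view` is hypothetical and chosen only for illustration.

```sql
-- Sketch only: employee_view is a made-up name, and the JSON paths assume
-- the row layout shown in the question's sample output.
CREATE VIEW employee_view AS
SELECT
    get_json_object(employee, '$.employee.firstName')      AS firstName,
    get_json_object(employee, '$.employee.lastName')       AS lastName,
    get_json_object(employee, '$.employee.address.street') AS street,
    get_json_object(employee, '$.employee.address.city')   AS city,
    get_json_object(employee, '$.employee.address.state')  AS state,
    get_json_object(employee, '$.employee.address.zip')    AS zip
FROM employee_raw;
```

    `get_json_object` returns each value as a string and yields NULL when a path is missing, so a value such as the zip code "09800" keeps its leading zero. Since every call parses the JSON string again, Hive's `json_tuple` used with `LATERAL VIEW` may be worth considering if the number of extracted fields grows.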