
Writing a DataFrame to JSON as a single object

  • dreddy  · Tech Community  · 7 years ago

    df.printSchema
    root
     |-- userId: string (nullable = false)
     |-- firstName: string (nullable = false)
     |-- address: string (nullable = true)
     |-- Email: array (nullable = true)
     |    |-- element: string (containsNull = true)
     |-- UserFoodFavourites: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- foodName: string (nullable = true)
     |    |    |-- isFavFood: boolean (nullable = false)
     |-- UserGameFavourites: array (nullable = true)
     |    |-- element: array (containsNull = true)
     |    |    |-- element: struct (containsNull = true)
     |    |    |    |-- Department: string (nullable = false)
     |    |    |    |-- gameName: string (nullable = false)
    

    Writing the DataFrame to JSON:

    df.repartition(1).write.mode("append").json("s3Location")
    

    The JSON output I get:

           {"userId":111,"firstName":"first123","address":"xyz",
             "Email":["def@gmail.com","abc@gmail.com"],
             "UserFoodFavourites":[{"foodName":"food1","isFavFood":true},{"foodName":"food2","isFavFood":false}],
             "UserGameFavourites":[[{"Department":"Outdoor","gameName":"O1"}],[{"Department":"Indoor","gameName":"I1"},{"Department":"Indoor","gameName":"I2"}]]}
           {"userId":123,"firstName":"first123","address":"xyz",
             "Email":["def@gmail.com","abc@gmail.com"],
             "UserFoodFavourites":[{"foodName":"food1","isFavFood":true},{"foodName":"food2","isFavFood":false}],
             "UserGameFavourites":[[{"Department":"Outdoor","gameName":"O1"}],[{"Department":"Indoor","gameName":"I1"},{"Department":"Indoor","gameName":"I2"}]]}
    
    alias prettyjson='python -m json.tool'
    

    However, when I run prettyjson on this file to print it as pretty JSON, it does not work, because the records are written as a separate JSON object for each userId.

    I am trying to write it as a single JSON object, with the records separated by new lines, so that I can run prettyjson on this file.
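
One possible way to pretty-print such a file without changing the Spark job is sketched below. It assumes the part file holds one complete JSON object per physical line, which is how Spark's JSON writer lays out records, and wraps them in a single JSON array. The helper names `read_json_lines` and `to_pretty_array` are hypothetical and exist only for this illustration.

```python
import json
from typing import Iterable, List


def read_json_lines(lines: Iterable[str]) -> List[dict]:
    """Parse newline-delimited JSON: one object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def to_pretty_array(lines: Iterable[str]) -> str:
    """Wrap all records in one JSON array and indent the result."""
    return json.dumps(read_json_lines(lines), indent=4)


if __name__ == "__main__":
    # Two shortened records in the one-object-per-line layout.
    # These values are illustrative, not the real file contents.
    sample = [
        '{"userId":"111","firstName":"first123","Email":["def@gmail.com"]}',
        '{"userId":"123","firstName":"first123","Email":["abc@gmail.com"]}',
    ]
    print(to_pretty_array(sample))
```

To apply this to a downloaded part file, the open file handle can be passed straight to `to_pretty_array`, since iterating a text file yields its lines. On Python 3.8 or later, `python -m json.tool --json-lines` may also be enough; it pretty-prints each line as its own object instead of producing one array.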
