df.printSchema
root
 |-- userId: string (nullable = false)
 |-- firstName: string (nullable = false)
 |-- address: string (nullable = true)
 |-- Email: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- UserFoodFavourites: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- foodName: string (nullable = true)
 |    |    |-- isFavFood: boolean (nullable = false)
 |-- UserGameFavourites: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- Department: string (nullable = false)
 |    |    |    |-- gameName: string (nullable = false)
Writing the DataFrame to JSON:
df.repartition(1).write.option("mode","append").json("s3Location")
The JSON output I get:
{"userId":111,"firstName":"first123","address":"xyz",
"Email":["def@gmail.com","abc@gmail.com"],
"UserFoodFavourites":[{"foodName":"food1","isFavFood":true},{"foodName":"food2","isFavFood":false}],
"UserGameFavourites":[[{"Department":"Outdoor","gameName":"O1"}],[{"Department":"Indoor","gameName":"I1"},{"Department":"Indoor","gameName":"I2"}]]}
{"userId":123,"firstName":"first123","address":"xyz",
"Email":["def@gmail.com","abc@gmail.com"],
"UserFoodFavourites":[{"foodName":"food1","isFavFood":true},{"foodName":"food2","isFavFood":false}],
"UserGameFavourites":[[{"Department":"Outdoor","gameName":"O1"}],[{"Department":"Indoor","gameName":"I1"},{"Department":"Indoor","gameName":"I2"}]]}
alias prettyjson='python -m json.tool'
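However, when I try to pretty-print this file with prettyjson, it does not work, because the file is written as a separate JSON object for each userId.
I am trying to write it as newline-separated JSON objects so that I can run prettyjson on this file.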
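For reference, here is a minimal Python sketch of one way to pretty-print a file that holds several JSON objects. It assumes each record sits on a single line, which is how Spark's JSON writer normally emits rows, and it parses every line as its own document instead of handing the whole file to `json.tool` at once. The function name `pretty_print_json_lines` and the sample records are hypothetical.

```python
import json


def pretty_print_json_lines(lines, indent=4):
    """Pretty-print every non-empty line as an independent JSON document.

    Assumes one complete JSON object per line (JSON Lines layout).
    """
    formatted = []
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines such as a trailing newline at end of file.
            continue
        formatted.append(json.dumps(json.loads(line), indent=indent))
    return "\n".join(formatted)


# Usage with two in-memory records shaped like the output above.
sample = [
    '{"userId":111,"firstName":"first123","Email":["def@gmail.com"]}',
    '{"userId":123,"firstName":"first123","Email":["abc@gmail.com"]}',
]
print(pretty_print_json_lines(sample))
```

To run it on a real part file, pass an open file handle, for example `pretty_print_json_lines(open(path))`, where `path` is a placeholder for a local copy of the file. On Python 3.8 or newer, `python -m json.tool --json-lines` should give a similar result from the command line.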