As I understand your schema and data, the structure is deeply nested, so you can explode items.application.component and then select name and version from the resulting column.
https://docs.databricks.com/spark/latest/dataframes-datasets/complex-nested-data.html
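from pyspark.sql import functions as F
# Explode the nested array so that each component gets its own row,
# then pull name and version out of the exploded struct.
df.withColumn("items", F.explode(F.col("items.application.component")))\
  .select("accountId","resourceId","items.name","items.version").show()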
+---------+----------+----+-------+
|accountId|resourceId|name|version|
+---------+----------+----+-------+
|        1|        r1|tool|    1.0|
|        1|        r1| app|    1.0|
|        1|        r2|tool|    2.0|
|        1|        r2| app|    2.0|
|        2|        r3|tool|    3.0|
|        2|        r3| app|    3.0|
+---------+----------+----+-------+
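
To make the row multiplication concrete, here is a rough plain-Python sketch of what the explode-then-select step does. It does not use Spark. The `records` list is a hypothetical reconstruction of the input, guessed from the output above (I am assuming `items` and `application` are structs and `component` is an array of structs), and `explode_components` is just an illustrative helper name.

```python
# Hypothetical input, reconstructed from the output table above.
# Assumption: items -> application -> component is an array of {name, version}.
def make_record(account_id, resource_id, version):
    return {
        "accountId": account_id,
        "resourceId": resource_id,
        "items": {"application": {"component": [
            {"name": "tool", "version": version},
            {"name": "app", "version": version},
        ]}},
    }

records = [
    make_record(1, "r1", "1.0"),
    make_record(1, "r2", "2.0"),
    make_record(2, "r3", "3.0"),
]

def explode_components(rows):
    """Yield one flat tuple per element of items.application.component."""
    for row in rows:
        for comp in row["items"]["application"]["component"]:
            # Top-level fields are repeated for every array element.
            yield (row["accountId"], row["resourceId"],
                   comp["name"], comp["version"])

flat_rows = list(explode_components(records))
for flat in flat_rows:
    print(flat)
```

Like F.explode, this loop emits nothing for a record whose component array is empty. If you need to keep such rows in Spark, F.explode_outer returns them with nulls instead.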