Python: slicing out the partition columns from the describe results

pyspark apache-spark python-3.x python

stack0114106 · 5 years ago

I am using the following code in Spark Scala to get the partition columns.

scala> val part_cols= spark.sql(" describe extended work.quality_stat ").select("col_name").as[String].collect()
part_cols: Array[String] = Array(x_bar, p1, p5, p50, p90, p95, p99, x_id, y_id, # Partition Information, # col_name, x_id, y_id, "", # Detailed Table Information, Database, Table, Owner, Created Time, Last Access, Created By, Type, Provider, Table Properties, Location, Serde Library, InputFormat, OutputFormat, Storage Properties, Partition Provider)

scala> part_cols.takeWhile( x => x.length()!= 0 ).reverse.takeWhile( x => x != "# col_name" )
res20: Array[String] = Array(y_id, x_id)

I need to get similar output in Python. I am having trouble replicating the same array operations in Python to get [y_id, x_id].

Here is what I have tried.

>>> part_cols=spark.sql(" describe extended work.quality_stat ").select("col_name").collect()

Is it possible to do this in Python?

werner · 5 years ago

part_cols in the question is a list of Row objects, so the first step is to convert it into a list of strings.

part_cols = spark.sql(...).select("col_name").collect()
part_cols = [row['col_name'] for row in part_cols]

Now the start and end of the part of the list you are interested in can be found with

start_index = part_cols.index("# col_name") + 1
end_index = part_cols.index('', start_index)

Finally, a slice can be extracted from the list, using these two values as its start and end

part_cols[start_index:end_index]

This slice will contain the values

['x_id', 'y_id']
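The steps so far can be put together into one small helper. This is only a sketch: the function name partition_columns and the abbreviated sample list are made up for illustration, and a plain Python list stands in for the collected col_name values so that it runs without a Spark session.

```python
def partition_columns(col_names):
    """Return the entries listed after '# col_name' up to the next empty entry."""
    # list.index raises ValueError if the marker is missing,
    # e.g. for a table that has no partition section at all.
    start_index = col_names.index("# col_name") + 1
    end_index = col_names.index("", start_index)
    return col_names[start_index:end_index]


# Abbreviated, hypothetical stand-in for the col_name values in the question.
sample = [
    "x_bar", "p1", "x_id", "y_id",
    "# Partition Information", "# col_name", "x_id", "y_id",
    "", "# Detailed Table Information", "Database",
]

forward = partition_columns(sample)
print(forward)
```

Because list.index raises ValueError when a marker is not found, a caller that may see unpartitioned tables would probably want to catch that exception and return an empty list instead.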

If the output really should be reversed, the slice

part_cols[end_index-1:start_index-1:-1]

will contain the values

['y_id', 'x_id']
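If a closer, line-by-line port of the Scala chain is preferred, itertools.takewhile from the standard library plays the role of Scala's takeWhile. The sketch below assumes the same kind of list of strings as above; the function name partition_columns_reversed and the sample data are hypothetical.

```python
from itertools import takewhile


def partition_columns_reversed(col_names):
    """Mirror takeWhile(non-empty).reverse.takeWhile(not the marker) from the Scala code."""
    # Keep everything before the first empty entry.
    head = list(takewhile(lambda name: len(name) != 0, col_names))
    # Walk backwards from the end of that prefix until the '# col_name' marker.
    return list(takewhile(lambda name: name != "# col_name", reversed(head)))


# Abbreviated, hypothetical stand-in for the col_name values in the question.
sample = [
    "x_bar", "p1", "x_id", "y_id",
    "# Partition Information", "# col_name", "x_id", "y_id",
    "", "# Detailed Table Information", "Database",
]

result = partition_columns_reversed(sample)
print(result)
```

Like the reversed slice, this yields the columns in reverse order. One difference from the index-based approach: if the marker is absent, takewhile does not raise an error but returns the whole prefix reversed, so the result should be checked in that case.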
