I have a Spark DataFrame in PySpark that I'm trying to remove null values from. Earlier, while cleaning things up during parsing, I ran a convert_to_null method on the title column that basically checks whether the column's string literally says "None" and, if so, converts it to None. That way, Spark converts it to its internal null type.
Now I'm trying to drop the rows whose title column is null. Here's everything I've tried to drop the nulls:
new_df = df.na.drop(subset=['title'])
new_df = df[F.col('title').isNotNull()]
new_df = df[~F.col('title').isNull()]
But I always get the following a few rows into the output after calling new_df.show():
Py4JJavaError: An error occurred while calling o2022.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 87.0 failed 1 times, most recent failure: Lost task 1.0 in stage 87.0 (TID 314, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 230, in main
process()
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 225, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 324, in dump_stream
self.serializer.dump_stream(self._batched(iterator), stream)
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 139, in dump_stream
for obj in iterator:
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 313, in _batched
for item in iterator:
File "<string>", line 1, in <lambda>
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 75, in <lambda>
return lambda *a: f(*a)
File "/usr/local/spark/python/pyspark/util.py", line 55, in wrapper
return f(*args, **kwargs)
File "<ipython-input-16-48bc3ec1b5d9>", line 5, in replace_none_with_null
TypeError: 'in <string>' requires string as left operand, not NoneType
I think I'm going crazy. I can't figure out how to fix this. Any help would be greatly appreciated. Thanks!