Reference: https://docs.databricks.com/spark/latest/spark-sql/udf-in-python.html
Example:
>>> a = [{'a': 'a', 'b': 1}, {'a': 'aa', 'b': 2}]
>>> df = spark.createDataFrame(a)
>>> df.collect()
[Row(a=u'a', b=1), Row(a=u'aa', b=2)]
>>> def func(s):
...     return len(s) > 1
...
>>> from pyspark.sql.functions import udf
>>> from pyspark.sql.types import BooleanType
>>> func_udf = udf(func, BooleanType())
>>> df2 = df.filter(func_udf(df['a']))
>>> df2.collect()
[Row(a=u'aa', b=2)]
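One thing the session above does not cover: Spark passes a SQL NULL to a Python UDF as None, and len(None) raises a TypeError. A minimal sketch of a null-safe predicate follows; the name longer_than_one is hypothetical and not part of the original example. Because it is plain Python, it can be checked without a Spark session.

```python
# Hypothetical null-safe variant of func; the name is an assumption for illustration.
def longer_than_one(s):
    """Return True only for non-null strings with more than one character."""
    return s is not None and len(s) > 1

# The predicate is ordinary Python, so it can be exercised directly.
results = [longer_than_one(v) for v in ['a', 'aa', None]]
print(results)  # [False, True, False]
```

It should be usable the same way as func in the session above, that is, wrapped with udf(longer_than_one, BooleanType()) and passed to df.filter.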