Pandas UDF Error: AttributeError: 'NoneType' object has no attribute '

Pandas UDF Error: AttributeError: 'NoneType' object has no attribute '_jvm' [duplicate]

I have a pyspark dataframe like this:

+-------+-------+
| level | value |
+-------+-------+
|  1    |   4   |
|  1    |   5   |
|  2    |   2   |
|  2    |   6   |
|  2    |   3   |
+-------+-------+

I have to create a value for every group in level column and save this in lable column. This value for every group must be unique, so I use ObjectId Mongo function to create that. Next dataframe is like this:

+-------+--------+-------+
| level |   lable| value |
+-------+--------+-------+
|  1    |   bb76 |   4   |
|  1    |   bb76 |   5   |
|  2    |   cv86 |   2   |
|  2    |   cv86 |   6   |
|  2    |   cv86 |   3   |
+-------+--------+-------+

Then I must create a dataframe as following:

+-------+-------+
| lable | value |
+-------+-------+
|  bb76 |   9   |
|  cv86 |   11  |
+-------+-------+

To do that, first I used spark groupby:

   def create_objectid():
       a = str(ObjectId())
       return a

   def add_lable(df):
       df = df.cache()
       df.count()
       grouped_df = df.groupby('level').agg(sum(df.value).alias('temp'))
       grouped_df = grouped_df.withColumnRenamed('level', 'level_temp')
       grouped_df = grouped_df.withColumn('lable', udf_create_objectid())
       grouped_df = grouped_df.drop('temp')
       df  = df.join(grouped_df.select('level_temp','lable'), col('level') == col('level_temp'), how="left").drop(grouped_df.level_temp)
       return df

When I used the above code on spark dataframe with 2 millions records, it takes about 155 seconds to finish. I searched and found that spark window has better performance. Then, I changed the last function to this one. Because pandas_udf needs arg, so I just pass one and print it:

@f.pandas_udf("string")
def create_objectid_on_window(v: pd.Series) -> str:
    print('v:',v)
    return str(ObjectId())

def add_lable(df):  
    w = Window.partitionBy('level')
    df = df.withColumn('lable', create_objectid_on_window('level').over(w))
    return df

But after running the program, I receive this error:

AttributeError: 'NoneType' object has no attribute '_jvm'

Would you please guide me how to solve the problem?

Any help is really appreciated.

source https://stackoverflow.com/questions/76300991/pandas-udf-error-attributeerror-nonetype-object-has-no-attribute-jvm

StacksPedia

Search This Blog

Pandas UDF Error: AttributeError: 'NoneType' object has no attribute '_jvm' [duplicate]

Labels

Comments

Post a Comment

Popular posts from this blog

Where and how is this Laravel kernel constructor called? [closed]

Why is my reports service not connecting?

Confusion between commands.Bot and discord.Client | Which one should I use?