I need to pass a dynamically built list of columns to greatest (or least) in PySpark, but it doesn't seem to accept that:
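from pyspark.sql.functions import least
column_names=['c_'+str(x) for x in range(800)]
# the list itself is passed to least() here, which is the call that is not accepted
train_df=spark.read.parquet(train_path).select(column_names).withColumn('max_values',least(column_names))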
So how can I do this using only DataFrames, without falling back to RDDs?
source https://stackoverflow.com/questions/74945288/pyspark-least-or-greatest-with-dynamic-columns