PySpark least or greatest with dynamic columns

I need to pass a dynamic list of columns to `greatest` (or `least`) in PySpark, but it doesn't seem to accept a list:

from pyspark.sql.functions import least

column_names = ['c_' + str(x) for x in range(800)]
# Fails: least/greatest expect individual columns, not a single list argument
train_df = spark.read.parquet(train_path).select(column_names).withColumn('max_values', least(column_names))

What is the way to do this with DataFrames only, without using RDDs?



source https://stackoverflow.com/questions/74945288/pyspark-least-or-greatest-with-dynamic-columns
