I am trying to add a list of integers as a new column in my PySpark DataFrame. The list was built from a datetime column that I extracted the dates from:
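import datetime

date = []
for i in range(count):
    if i + 1 == count:
        # Last row: there is no next row, so measure up to today.
        date.append((datetime.date.today() - dates[i][0]).days)
    elif dates[i][0] == dates[i + 1][0] and product[i][0] == product[i + 1][0]:
        # Next row has the same date and product: days between the two rows.
        date.append((dates[i + 1][0] - dates[i][0]).days)
    else:
        # Otherwise measure up to today.
        date.append((datetime.date.today() - dates[i][0]).days)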
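To make clear what the list ends up holding, here is a minimal self-contained sketch of the same loop. The sample rows, the fixed `today` value, and the function name `day_gaps` are made up for illustration. The one-element tuples stand in for collected rows, which is an assumption about how `dates` and `product` were built.

```python
import datetime


def day_gaps(dates, product, today):
    """Mirror the loop above: one integer (a number of days) per row."""
    count = len(dates)
    gaps = []
    for i in range(count):
        has_next = i + 1 < count
        if (has_next
                and dates[i][0] == dates[i + 1][0]
                and product[i][0] == product[i + 1][0]):
            # Same date and product as the next row: days between the two rows.
            gaps.append((dates[i + 1][0] - dates[i][0]).days)
        else:
            # Last row, or the next row differs: days from this date up to today.
            gaps.append((today - dates[i][0]).days)
    return gaps


# Hypothetical sample data; each tuple stands in for one collected row.
dates = [
    (datetime.date(2021, 9, 1),),
    (datetime.date(2021, 9, 1),),
    (datetime.date(2021, 9, 20),),
]
product = [("A",), ("A",), ("B",)]
today = datetime.date(2021, 9, 27)  # fixed so the result is reproducible

gaps = day_gaps(dates, product, today)
print(gaps)  # [0, 26, 7]
```

The result is a plain Python list of integers, not a Spark column. Also note that, as written, the first branch only runs when the two dates are equal, so it always appends 0.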
I then tried to add date as a new column like this:
df = df.withColumn("Date Difference", date)
and get this error: AssertionError: col should be Column
How do I add this column of days without getting the error? Thank you!
source https://stackoverflow.com/questions/69352170/add-column-from-list-of-integers-pyspark