import pandas as pd
import numpy as np
import dask.array as dka
import dask.dataframe as dkf
# 20 x 10 block of values, wrapped as a dask array
vals = np.arange(200).reshape(20, 10)
vals = dka.from_array(vals)
columns = [f"col_{i}" for i in range(10)]
# series with values 0..19 and index labels 100..119
ser = pd.Series(range(20), index=np.arange(20)+100, name='idx')
ddf = dkf.from_dask_array(vals, columns=columns)
dks = dkf.from_pandas(ser, npartitions=ddf.npartitions)
# raises the ValueError shown below
ddf.set_index(dks, sorted=True).compute()
-----------------
ValueError: Length mismatch: Expected 19 rows, received array of length 0
I'm confused by this error. dks has 20 elements, so why does dask complain that it has length 0? And why does it expect 19 rows when ddf has 20?
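For what it's worth, the wording of the message looks like the one pandas raises from DataFrame.set_index when the frame and the series passed to it differ in length. Below is a minimal pandas-only sketch of that situation. The names frame_chunk and series_chunk are hypothetical stand-ins for a single pair of partitions, and the assumption that dask ends up pairing a 19-row piece of ddf with an empty piece of dks (possibly because the frame's index 0..19 and the series' index 100..119 do not overlap) is a guess, not something confirmed here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for one partition pair: a 19-row chunk of the frame
# and an empty chunk of the series (the two sizes named in the error message).
frame_chunk = pd.DataFrame(np.zeros((19, 2)), columns=["col_0", "col_1"])
series_chunk = pd.Series([], dtype="int64", name="idx")

try:
    # pandas checks that the series is as long as the frame before using it
    frame_chunk.set_index(series_chunk)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

If the printed text matches the traceback, the numbers 19 and 0 would describe one pair of partitions rather than the full ddf and dks, which would explain why neither of them is 20.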
source https://stackoverflow.com/questions/71877213/length-mismatch-error-when-using-set-index-using-dask-series-on-dask-dataframe