I'm trying to combine multiple columns into one column with Python Dask, but I can't find an (elegant) way to combine columns into a list.
I only need to combine columns "b" through "e" into one column; column "a" needs to stay exactly as it is. I've tried using apply to achieve this. Not only does it not work, it is also slow and most likely not the best way to do this.
Polars has the concat_list command, which works beautifully using: df.select('a', value=pl.concat_list(pl.exclude('a'))). What is the Dask alternative for this?
Can anyone help me achieve this result?
The dataframe I have:
┌─────┬─────┬─────┬─────┬─────┐
│ a   ┆ b   ┆ c   ┆ d   ┆ e   │
╞═════╪═════╪═════╪═════╪═════╡
│ 0.1 ┆ 1.1 ┆ 2.1 ┆ 3.1 ┆ 4.1 │
│ 0.2 ┆ 1.2 ┆ 2.2 ┆ 3.2 ┆ 4.2 │
│ 0.3 ┆ 1.3 ┆ 2.3 ┆ 3.3 ┆ 4.3 │
└─────┴─────┴─────┴─────┴─────┘
The result I need:
┌─────┬───────────────────┐
│ a   ┆ value             │
╞═════╪═══════════════════╡
│ 0.1 ┆ [1.1, … 4.1]      │
│ 0.2 ┆ [1.2, … 4.2]      │
│ 0.3 ┆ [1.3, … 4.3]      │
└─────┴───────────────────┘
Example code of the dataframe:
import dask.dataframe as dd

# Build the example dataframe with a single partition
df = dd.DataFrame.from_dict({
    "a": [0.1, 0.2, 0.3],
    "b": [1.1, 1.2, 1.3],
    "c": [2.1, 2.2, 2.3],
    "d": [3.1, 3.2, 3.3],
    "e": [4.1, 4.2, 4.3],
}, npartitions=1)

# My attempt to combine columns b-e into one column (this does not work)
df = df.assign(test=[df["b"], df["c"], df["d"], df["e"]])
print('\nDask\n', df.compute())
source https://stackoverflow.com/questions/76086748/how-to-combine-columns-in-dask-horizontally