How to combine columns in dask horizontally?

I'm trying to combine multiple columns into one column with Dask in Python, but I can't find an (elegant) way to combine the columns into a list.

I only need to combine columns "b" through "e" into one column; column "a" must stay exactly as it is now. I've tried using apply to achieve this, but it doesn't work, and it is also slow and most likely not the best way to do this.

Polars has concat_list, which works beautifully here: df.select('a', value=pl.concat_list(pl.exclude('a'))). What is the Dask equivalent?

Can anyone help me achieve this result?

The dataframe I have:

┌─────┬─────┬─────┬─────┬─────┐
│ a   ┆ b   ┆ c   ┆ d   ┆ e   │
╞═════╪═════╪═════╪═════╪═════╡
│ 0.1 ┆ 1.1 ┆ 2.1 ┆ 3.1 ┆ 4.1 │
│ 0.2 ┆ 1.2 ┆ 2.2 ┆ 3.2 ┆ 4.2 │
│ 0.3 ┆ 1.3 ┆ 2.3 ┆ 3.3 ┆ 4.3 │
└─────┴─────┴─────┴─────┴─────┘

The result I need:

┌─────┬───────────────────┐
│ a   ┆ value             │
╞═════╪═══════════════════╡
│ 0.1 ┆ [1.1, … 4.1]      │
│ 0.2 ┆ [1.2, … 4.2]      │
│ 0.3 ┆ [1.3, … 4.3]      │
└─────┴───────────────────┘

Example code of the dataframe:

import dask.dataframe as dd


df = dd.DataFrame.from_dict({
    "a": [0.1, 0.2, 0.3],
    "b": [1.1, 1.2, 1.3],
    "c": [2.1, 2.2, 2.3],
    "d": [3.1, 3.2, 3.3],
    "e": [4.1, 4.2, 4.3],
}, npartitions=1)

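# My attempt: pass the columns to assign() as a list.
# This does not give the list column shown above.
df = df.assign(test=[df["b"], df["c"], df["d"], df["e"]])

print('\nDask\n', df.compute())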

Source: https://stackoverflow.com/questions/76086748/how-to-combine-columns-in-dask-horizontally
