Skip to main content

Pandas - concatenating selected rows from several groups with additional condition

I have an example

+─────────────+────────+────────+────────+
| main_group  | COL_A  | COL_B  | COL_C  |
+─────────────+────────+────────+────────+
| 0           | TXT1   |        | None   |
| 0           | TXT2   |        | None   |
| 0           | 5      |        | None   |
| 0           | 1.93   | 1.93   | 0      |
| 0           | 7.60   | 7.60   | 1      |
| 0           | 2.46   | 2.46   | 1      |
| 1           | TXT11  |        | None   |
| 1           | TXT12  |        | None   |
| 1           | 0.50   |        | None   |
| 1           | 0.45   | 0.45   | 0      |
| 1           | 0.31   | 0.31   | 1      |
| 1           | 0.35   | 0.35   | 1      |
| 1           | 0.73   | 0.73   | 1      |
| 2           | 0.5    |        | None   |
| 2           | 4.15   | 4.15   | 0      |
| 2           | 2.98   | 2.98   | 0      |
| 2           | 1.53   | 1.53   | 0      |
| 3           | 4.46   |        | None   |
| 3           | 4.00   | 4.00   | 0      |
| 3           | 0.95   | 0.95   | 1      |
| 3           | 1.35   | 1.35   | 1      |
| 3           | 1.79   | 1.79   | 1      |
+─────────────+────────+────────+────────+

I would like to move the value from COL_A for the last occurrence of a row with None in column COL_C per main_group. This value should be moved to the first element in group for COL_B column and then the previous contents should be removed.

Here is how it should look like for main_group == 0:

+─────────────+────────+────────+────────+
| main_group  | COL_A  | COL_B  | COL_C  |
+─────────────+────────+────────+────────+
| 0           | TXT1   | 5      | None   | <--- value "5" from the last row with "None" in `COL_C` in `main_group` == 0 was moved to the first row in the same group
| 0           | TXT2   |        | None   |
| 0           | 5      |        | None   | <--- After that this row should be deleted
| 0           | 1.93   | 1.93   | 0      |
| 0           | 7.60   | 7.60   | 1      |
| 0           | 2.46   | 2.46   | 1      |
+─────────────+────────+────────+────────+
+─────────────+────────+────────+────────+
| main_group  | COL_A  | COL_B  | COL_C  |
+─────────────+────────+────────+────────+
| 2           | 0.5    | 0.5    | None   | <---  value in column `COL_B` should be same as in column `COL_A` because there are no other rows in the same `main_group` with "None" in column `COL_C`
| 2           | 4.15   | 4.15   | 0      |
| 2           | 2.98   | 2.98   | 0      |
| 2           | 1.53   | 1.53   | 0      |
| 3           | 4.46   | 4.46   | None   | <---  value in column `COL_B` should be same as in column `COL_A` because there are no other rows in the same `main_group` with "None" in column `COL_C`
| 3           | 4.00   | 4.00   | 0      |
| 3           | 0.95   | 0.95   | 1      |
| 3           | 1.35   | 1.35   | 1      |
| 3           | 1.79   | 1.79   | 1      |
+─────────────+────────+────────+────────+

After this operation, df should looks like:

+─────────────+────────+────────+────────+
| main_group  | COL_A  | COL_B  | COL_C  |
+─────────────+────────+────────+────────+
| 0           | TXT1   | 5      | None   |
| 0           | TXT2   |        | None   |
| 0           | 1.93   | 1.93   | 0      |
| 0           | 7.60   | 7.60   | 1      |
| 0           | 2.46   | 2.46   | 1      |
| 1           | TXT11  | 0.50   | None   |
| 1           | TXT12  |        | None   |
| 1           | 0.45   | 0.45   | 0      |
| 1           | 0.31   | 0.31   | 1      |
| 1           | 0.35   | 0.35   | 1      |
| 1           | 0.73   | 0.73   | 1      |
| 2           | 0.5    | 0.5    | None   |
| 2           | 4.15   | 4.15   | 0      |
| 2           | 2.98   | 2.98   | 0      |
| 2           | 1.53   | 1.53   | 0      |
| 3           | 4.46   | 4.46   | None   |
| 3           | 4.00   | 4.00   | 0      |
| 3           | 0.95   | 0.95   | 1      |
| 3           | 1.35   | 1.35   | 1      |
| 3           | 1.79   | 1.79   | 1      |
+─────────────+────────+────────+────────+

In the last step, I would like to concatenating selected COL_A column in each main_group where value in COL_C is None.

Example:

+─────────────+────────+────────+────────+
| main_group  | COL_A  | COL_B  | COL_C  |
+─────────────+────────+────────+────────+
| 0           | TXT1   | 5      | None   |
| 0           | TXT2   |        | None   |
| 0           | 1.93   | 1.93   | 0      |
| 0           | 7.60   | 7.60   | 1      |
| 0           | 2.46   | 2.46   | 1      |

↓↓↓↓↓↓↓↓↓↓

+─────────────+────────────+────────+────────+
| main_group  | COL_A      | COL_B  | COL_C  |
+─────────────+────────────+────────+────────+
| 0           | TXT1 TXT2  | 5      | None   | <--- If there are more than 1 row with "None" in column `COL_C` in each group, then values in column `COL_A` should be "merged" into one row, and all others should be deleted
| 0           | 1.93       | 1.93   | 0      |
| 0           | 7.60       | 7.60   | 1      |
| 0           | 2.46       | 2.46   | 1      |

The first solution I have is to .loc those rows in the group that have a value "None" in COL_C column and then assign to the first element (.iloc) value from the last row. However, this solution is not quite correct, and I'm additionally convinced that this can be done using .groupby instead of iterating and searching for elements after each group but I can't do it correctly.

The solution I was able to get this way:

+─────────────+────────────+────────+────────+
| main_group  | COL_A      | COL_B  | COL_C  |
+─────────────+────────────+────────+────────+
| 0           | TXT1 TXT2  | 5      | None   |
| NaN         | NaN        | NaN    | NaN    |
| NaN         | NaN        | NaN    | NaN    |
| 0           | 1.93       | 1.93   | 0      |
| 0           | 7.60       | 7.60   | 1      |
| 0           | 2.46       | 2.46   | 1      |

This is partially transferred correctly, but there are still NaN values in the rows that should not be there anymore. Of course I can delete those rows and re-index df but this solution relies on loops, which will certainly not be efficient for large df.

How can I do it with a workaround of these loops over individual groups and swapping values using .loc and .iloc?



source https://stackoverflow.com/questions/72368280/pandas-concatenating-selected-rows-from-several-groups-with-additional-conditi

Comments

Popular posts from this blog

How to show number of registered users in Laravel based on usertype?

i'm trying to display data from the database in the admin dashboard i used this: <?php use Illuminate\Support\Facades\DB; $users = DB::table('users')->count(); echo $users; ?> and i have successfully get the correct data from the database but what if i want to display a specific data for example in this user table there is "usertype" that specify if the user is normal user or admin i want to user the same code above but to display a specific usertype i tried this: <?php use Illuminate\Support\Facades\DB; $users = DB::table('users')->count()->WHERE usertype =admin; echo $users; ?> but it didn't work, what am i doing wrong? source https://stackoverflow.com/questions/68199726/how-to-show-number-of-registered-users-in-laravel-based-on-usertype

Why is my reports service not connecting?

I am trying to pull some data from a Postgres database using Node.js and node-postures but I can't figure out why my service isn't connecting. my routes/index.js file: const express = require('express'); const router = express.Router(); const ordersCountController = require('../controllers/ordersCountController'); const ordersController = require('../controllers/ordersController'); const weeklyReportsController = require('../controllers/weeklyReportsController'); router.get('/orders_count', ordersCountController); router.get('/orders', ordersController); router.get('/weekly_reports', weeklyReportsController); module.exports = router; My controllers/weeklyReportsController.js file: const weeklyReportsService = require('../services/weeklyReportsService'); const weeklyReportsController = async (req, res) => { try { const data = await weeklyReportsService; res.json({data}) console...

Confusion between commands.Bot and discord.Client | Which one should I use?

Whenever you look at YouTube tutorials or code from this website there is a real variation. Some developers use client = discord.Client(intents=intents) while the others use bot = commands.Bot(command_prefix="something", intents=intents) . Now I know slightly about the difference but I get errors from different places from my code when I use either of them and its confusing. Especially since there has a few changes over the years in discord.py it is hard to find the real difference. I tried sticking to discord.Client then I found that there are more features in commands.Bot . Then I found errors when using commands.Bot . An example of this is: When I try to use commands.Bot client = commands.Bot(command_prefix=">",intents=intents) async def load(): for filename in os.listdir("./Cogs"): if filename.endswith(".py"): client.load_extension(f"Cogs.{filename[:-3]}") The above doesnt giveany response from my Cogs ...