Skip to main content

Pandas - concatenating selected rows from several groups with additional condition

I have an example

+─────────────+────────+────────+────────+
| main_group  | COL_A  | COL_B  | COL_C  |
+─────────────+────────+────────+────────+
| 0           | TXT1   |        | None   |
| 0           | TXT2   |        | None   |
| 0           | 5      |        | None   |
| 0           | 1.93   | 1.93   | 0      |
| 0           | 7.60   | 7.60   | 1      |
| 0           | 2.46   | 2.46   | 1      |
| 1           | TXT11  |        | None   |
| 1           | TXT12  |        | None   |
| 1           | 0.50   |        | None   |
| 1           | 0.45   | 0.45   | 0      |
| 1           | 0.31   | 0.31   | 1      |
| 1           | 0.35   | 0.35   | 1      |
| 1           | 0.73   | 0.73   | 1      |
| 2           | 0.5    |        | None   |
| 2           | 4.15   | 4.15   | 0      |
| 2           | 2.98   | 2.98   | 0      |
| 2           | 1.53   | 1.53   | 0      |
| 3           | 4.46   |        | None   |
| 3           | 4.00   | 4.00   | 0      |
| 3           | 0.95   | 0.95   | 1      |
| 3           | 1.35   | 1.35   | 1      |
| 3           | 1.79   | 1.79   | 1      |
+─────────────+────────+────────+────────+

I would like to move the value from COL_A for the last occurrence of a row with None in column COL_C per main_group. This value should be moved to the first element in group for COL_B column and then the previous contents should be removed.

Here is how it should look like for main_group == 0:

+─────────────+────────+────────+────────+
| main_group  | COL_A  | COL_B  | COL_C  |
+─────────────+────────+────────+────────+
| 0           | TXT1   | 5      | None   | <--- value "5" from the last row with "None" in `COL_C` in `main_group` == 0 was moved to the first row in the same group
| 0           | TXT2   |        | None   |
| 0           | 5      |        | None   | <--- After that this row should be deleted
| 0           | 1.93   | 1.93   | 0      |
| 0           | 7.60   | 7.60   | 1      |
| 0           | 2.46   | 2.46   | 1      |
+─────────────+────────+────────+────────+
+─────────────+────────+────────+────────+
| main_group  | COL_A  | COL_B  | COL_C  |
+─────────────+────────+────────+────────+
| 2           | 0.5    | 0.5    | None   | <---  value in column `COL_B` should be same as in column `COL_A` because there are no other rows in the same `main_group` with "None" in column `COL_C`
| 2           | 4.15   | 4.15   | 0      |
| 2           | 2.98   | 2.98   | 0      |
| 2           | 1.53   | 1.53   | 0      |
| 3           | 4.46   | 4.46   | None   | <---  value in column `COL_B` should be same as in column `COL_A` because there are no other rows in the same `main_group` with "None" in column `COL_C`
| 3           | 4.00   | 4.00   | 0      |
| 3           | 0.95   | 0.95   | 1      |
| 3           | 1.35   | 1.35   | 1      |
| 3           | 1.79   | 1.79   | 1      |
+─────────────+────────+────────+────────+

After this operation, df should looks like:

+─────────────+────────+────────+────────+
| main_group  | COL_A  | COL_B  | COL_C  |
+─────────────+────────+────────+────────+
| 0           | TXT1   | 5      | None   |
| 0           | TXT2   |        | None   |
| 0           | 1.93   | 1.93   | 0      |
| 0           | 7.60   | 7.60   | 1      |
| 0           | 2.46   | 2.46   | 1      |
| 1           | TXT11  | 0.50   | None   |
| 1           | TXT12  |        | None   |
| 1           | 0.45   | 0.45   | 0      |
| 1           | 0.31   | 0.31   | 1      |
| 1           | 0.35   | 0.35   | 1      |
| 1           | 0.73   | 0.73   | 1      |
| 2           | 0.5    | 0.5    | None   |
| 2           | 4.15   | 4.15   | 0      |
| 2           | 2.98   | 2.98   | 0      |
| 2           | 1.53   | 1.53   | 0      |
| 3           | 4.46   | 4.46   | None   |
| 3           | 4.00   | 4.00   | 0      |
| 3           | 0.95   | 0.95   | 1      |
| 3           | 1.35   | 1.35   | 1      |
| 3           | 1.79   | 1.79   | 1      |
+─────────────+────────+────────+────────+

In the last step, I would like to concatenating selected COL_A column in each main_group where value in COL_C is None.

Example:

+─────────────+────────+────────+────────+
| main_group  | COL_A  | COL_B  | COL_C  |
+─────────────+────────+────────+────────+
| 0           | TXT1   | 5      | None   |
| 0           | TXT2   |        | None   |
| 0           | 1.93   | 1.93   | 0      |
| 0           | 7.60   | 7.60   | 1      |
| 0           | 2.46   | 2.46   | 1      |

↓↓↓↓↓↓↓↓↓↓

+─────────────+────────────+────────+────────+
| main_group  | COL_A      | COL_B  | COL_C  |
+─────────────+────────────+────────+────────+
| 0           | TXT1 TXT2  | 5      | None   | <--- If there are more than 1 row with "None" in column `COL_C` in each group, then values in column `COL_A` should be "merged" into one row, and all others should be deleted
| 0           | 1.93       | 1.93   | 0      |
| 0           | 7.60       | 7.60   | 1      |
| 0           | 2.46       | 2.46   | 1      |

The first solution I have is to .loc those rows in the group that have a value "None" in COL_C column and then assign to the first element (.iloc) value from the last row. However, this solution is not quite correct, and I'm additionally convinced that this can be done using .groupby instead of iterating and searching for elements after each group but I can't do it correctly.

The solution I was able to get this way:

+─────────────+────────────+────────+────────+
| main_group  | COL_A      | COL_B  | COL_C  |
+─────────────+────────────+────────+────────+
| 0           | TXT1 TXT2  | 5      | None   |
| NaN         | NaN        | NaN    | NaN    |
| NaN         | NaN        | NaN    | NaN    |
| 0           | 1.93       | 1.93   | 0      |
| 0           | 7.60       | 7.60   | 1      |
| 0           | 2.46       | 2.46   | 1      |

This is partially transferred correctly, but there are still NaN values in the rows that should not be there anymore. Of course I can delete those rows and re-index df but this solution relies on loops, which will certainly not be efficient for large df.

How can I do it with a workaround of these loops over individual groups and swapping values using .loc and .iloc?



source https://stackoverflow.com/questions/72368280/pandas-concatenating-selected-rows-from-several-groups-with-additional-conditi

Comments

Popular posts from this blog

Prop `className` did not match in next js app

I have written a sample code ( Github Link here ). this is a simple next js app, but giving me error when I refresh the page. This seems to be the common problem and I tried the fix provided in the internet but does not seem to fix my issue. The error is Warning: Prop className did not match. Server: "MuiBox-root MuiBox-root-1" Client: "MuiBox-root MuiBox-root-2". Did changes for _document.js, modified _app.js as mentioned in official website and solutions in stackoverflow. but nothing seems to work. Could someone take a look and help me whats wrong with the code? Via Active questions tagged javascript - Stack Overflow https://ift.tt/2FdjaAW

How to show number of registered users in Laravel based on usertype?

i'm trying to display data from the database in the admin dashboard i used this: <?php use Illuminate\Support\Facades\DB; $users = DB::table('users')->count(); echo $users; ?> and i have successfully get the correct data from the database but what if i want to display a specific data for example in this user table there is "usertype" that specify if the user is normal user or admin i want to user the same code above but to display a specific usertype i tried this: <?php use Illuminate\Support\Facades\DB; $users = DB::table('users')->count()->WHERE usertype =admin; echo $users; ?> but it didn't work, what am i doing wrong? source https://stackoverflow.com/questions/68199726/how-to-show-number-of-registered-users-in-laravel-based-on-usertype

Why is my reports service not connecting?

I am trying to pull some data from a Postgres database using Node.js and node-postures but I can't figure out why my service isn't connecting. my routes/index.js file: const express = require('express'); const router = express.Router(); const ordersCountController = require('../controllers/ordersCountController'); const ordersController = require('../controllers/ordersController'); const weeklyReportsController = require('../controllers/weeklyReportsController'); router.get('/orders_count', ordersCountController); router.get('/orders', ordersController); router.get('/weekly_reports', weeklyReportsController); module.exports = router; My controllers/weeklyReportsController.js file: const weeklyReportsService = require('../services/weeklyReportsService'); const weeklyReportsController = async (req, res) => { try { const data = await weeklyReportsService; res.json({data}) console