Skip to main content

Pandas - concatenating selected rows from several groups with additional condition

I have an example

+─────────────+────────+────────+────────+
| main_group  | COL_A  | COL_B  | COL_C  |
+─────────────+────────+────────+────────+
| 0           | TXT1   |        | None   |
| 0           | TXT2   |        | None   |
| 0           | 5      |        | None   |
| 0           | 1.93   | 1.93   | 0      |
| 0           | 7.60   | 7.60   | 1      |
| 0           | 2.46   | 2.46   | 1      |
| 1           | TXT11  |        | None   |
| 1           | TXT12  |        | None   |
| 1           | 0.50   |        | None   |
| 1           | 0.45   | 0.45   | 0      |
| 1           | 0.31   | 0.31   | 1      |
| 1           | 0.35   | 0.35   | 1      |
| 1           | 0.73   | 0.73   | 1      |
| 2           | 0.5    |        | None   |
| 2           | 4.15   | 4.15   | 0      |
| 2           | 2.98   | 2.98   | 0      |
| 2           | 1.53   | 1.53   | 0      |
| 3           | 4.46   |        | None   |
| 3           | 4.00   | 4.00   | 0      |
| 3           | 0.95   | 0.95   | 1      |
| 3           | 1.35   | 1.35   | 1      |
| 3           | 1.79   | 1.79   | 1      |
+─────────────+────────+────────+────────+

I would like to move the value from COL_A for the last occurrence of a row with None in column COL_C per main_group. This value should be moved to the first element in group for COL_B column and then the previous contents should be removed.

Here is how it should look like for main_group == 0:

+─────────────+────────+────────+────────+
| main_group  | COL_A  | COL_B  | COL_C  |
+─────────────+────────+────────+────────+
| 0           | TXT1   | 5      | None   | <--- value "5" from the last row with "None" in `COL_C` in `main_group` == 0 was moved to the first row in the same group
| 0           | TXT2   |        | None   |
| 0           | 5      |        | None   | <--- After that this row should be deleted
| 0           | 1.93   | 1.93   | 0      |
| 0           | 7.60   | 7.60   | 1      |
| 0           | 2.46   | 2.46   | 1      |
+─────────────+────────+────────+────────+
+─────────────+────────+────────+────────+
| main_group  | COL_A  | COL_B  | COL_C  |
+─────────────+────────+────────+────────+
| 2           | 0.5    | 0.5    | None   | <---  value in column `COL_B` should be same as in column `COL_A` because there are no other rows in the same `main_group` with "None" in column `COL_C`
| 2           | 4.15   | 4.15   | 0      |
| 2           | 2.98   | 2.98   | 0      |
| 2           | 1.53   | 1.53   | 0      |
| 3           | 4.46   | 4.46   | None   | <---  value in column `COL_B` should be same as in column `COL_A` because there are no other rows in the same `main_group` with "None" in column `COL_C`
| 3           | 4.00   | 4.00   | 0      |
| 3           | 0.95   | 0.95   | 1      |
| 3           | 1.35   | 1.35   | 1      |
| 3           | 1.79   | 1.79   | 1      |
+─────────────+────────+────────+────────+

After this operation, df should looks like:

+─────────────+────────+────────+────────+
| main_group  | COL_A  | COL_B  | COL_C  |
+─────────────+────────+────────+────────+
| 0           | TXT1   | 5      | None   |
| 0           | TXT2   |        | None   |
| 0           | 1.93   | 1.93   | 0      |
| 0           | 7.60   | 7.60   | 1      |
| 0           | 2.46   | 2.46   | 1      |
| 1           | TXT11  | 0.50   | None   |
| 1           | TXT12  |        | None   |
| 1           | 0.45   | 0.45   | 0      |
| 1           | 0.31   | 0.31   | 1      |
| 1           | 0.35   | 0.35   | 1      |
| 1           | 0.73   | 0.73   | 1      |
| 2           | 0.5    | 0.5    | None   |
| 2           | 4.15   | 4.15   | 0      |
| 2           | 2.98   | 2.98   | 0      |
| 2           | 1.53   | 1.53   | 0      |
| 3           | 4.46   | 4.46   | None   |
| 3           | 4.00   | 4.00   | 0      |
| 3           | 0.95   | 0.95   | 1      |
| 3           | 1.35   | 1.35   | 1      |
| 3           | 1.79   | 1.79   | 1      |
+─────────────+────────+────────+────────+

In the last step, I would like to concatenating selected COL_A column in each main_group where value in COL_C is None.

Example:

+─────────────+────────+────────+────────+
| main_group  | COL_A  | COL_B  | COL_C  |
+─────────────+────────+────────+────────+
| 0           | TXT1   | 5      | None   |
| 0           | TXT2   |        | None   |
| 0           | 1.93   | 1.93   | 0      |
| 0           | 7.60   | 7.60   | 1      |
| 0           | 2.46   | 2.46   | 1      |

↓↓↓↓↓↓↓↓↓↓

+─────────────+────────────+────────+────────+
| main_group  | COL_A      | COL_B  | COL_C  |
+─────────────+────────────+────────+────────+
| 0           | TXT1 TXT2  | 5      | None   | <--- If there are more than 1 row with "None" in column `COL_C` in each group, then values in column `COL_A` should be "merged" into one row, and all others should be deleted
| 0           | 1.93       | 1.93   | 0      |
| 0           | 7.60       | 7.60   | 1      |
| 0           | 2.46       | 2.46   | 1      |

The first solution I have is to .loc those rows in the group that have a value "None" in COL_C column and then assign to the first element (.iloc) value from the last row. However, this solution is not quite correct, and I'm additionally convinced that this can be done using .groupby instead of iterating and searching for elements after each group but I can't do it correctly.

The solution I was able to get this way:

+─────────────+────────────+────────+────────+
| main_group  | COL_A      | COL_B  | COL_C  |
+─────────────+────────────+────────+────────+
| 0           | TXT1 TXT2  | 5      | None   |
| NaN         | NaN        | NaN    | NaN    |
| NaN         | NaN        | NaN    | NaN    |
| 0           | 1.93       | 1.93   | 0      |
| 0           | 7.60       | 7.60   | 1      |
| 0           | 2.46       | 2.46   | 1      |

This is partially transferred correctly, but there are still NaN values in the rows that should not be there anymore. Of course I can delete those rows and re-index df but this solution relies on loops, which will certainly not be efficient for large df.

How can I do it with a workaround of these loops over individual groups and swapping values using .loc and .iloc?



source https://stackoverflow.com/questions/72368280/pandas-concatenating-selected-rows-from-several-groups-with-additional-conditi

Comments

Popular posts from this blog

How to show number of registered users in Laravel based on usertype?

i'm trying to display data from the database in the admin dashboard i used this: <?php use Illuminate\Support\Facades\DB; $users = DB::table('users')->count(); echo $users; ?> and i have successfully get the correct data from the database but what if i want to display a specific data for example in this user table there is "usertype" that specify if the user is normal user or admin i want to user the same code above but to display a specific usertype i tried this: <?php use Illuminate\Support\Facades\DB; $users = DB::table('users')->count()->WHERE usertype =admin; echo $users; ?> but it didn't work, what am i doing wrong? source https://stackoverflow.com/questions/68199726/how-to-show-number-of-registered-users-in-laravel-based-on-usertype

Why is my reports service not connecting?

I am trying to pull some data from a Postgres database using Node.js and node-postures but I can't figure out why my service isn't connecting. my routes/index.js file: const express = require('express'); const router = express.Router(); const ordersCountController = require('../controllers/ordersCountController'); const ordersController = require('../controllers/ordersController'); const weeklyReportsController = require('../controllers/weeklyReportsController'); router.get('/orders_count', ordersCountController); router.get('/orders', ordersController); router.get('/weekly_reports', weeklyReportsController); module.exports = router; My controllers/weeklyReportsController.js file: const weeklyReportsService = require('../services/weeklyReportsService'); const weeklyReportsController = async (req, res) => { try { const data = await weeklyReportsService; res.json({data}) console...

How to split a rinex file if I need 24 hours data

Trying to divide rinex file using the command gfzrnx but getting this error. While doing that getting this error msg 'gfzrnx' is not recognized as an internal or external command Trying to split rinex file using the command gfzrnx. also install'gfzrnx'. my doubt is I need to run this program in 'gfzrnx' or in 'cmdprompt'. I am expecting a rinex file with 24 hrs or 1 day data.I Have 48 hrs data in RINEX format. Please help me to solve this issue. source https://stackoverflow.com/questions/75385367/how-to-split-a-rinex-file-if-i-need-24-hours-data