Match the string data inside a group - Pandas

I have a dataframe like this:

+--------+------------+----------+--------------+--------------+----------------+--------------+--------------+
| company|          id|ann_rtn_dt|share_class_nb|shrhldr_seq_nb|shrhldr_first_nm|shrhldr_mid_nm|shrhldr_sur_nm|
+--------+------------+----------+--------------+--------------+----------------+--------------+--------------+
|SYNTHE01|SYNTHE01_1_1|2022-11-28|             1|             1|            NIEL|        ANDREW|        HOPSON|
|SYNTHE01|SYNTHE01_3_1|2022-11-28|             3|             1|          NICOLE|        CLAIRE|          MORE|
|SYNTHE01|SYNTHE01_1_2|2022-11-28|             1|             2|               N|             C|          MORE|
|SYNTHE01|SYNTHE01_2_1|2022-11-28|             2|             1|            NEIL|        ANDREW|        HOPSON|
|SYNTHE01|SYNTHE01_3_1|2022-11-28|             3|             1|          NICOLE|        CLAIRE|          MORE|
|SYNTHE02|SYNTHE02_1_1|2022-11-28|             1|             1|            MIKE|              |        LOPSON|
|SYNTHE02|SYNTHE02_3_1|2022-11-28|             3|             1|          NIMIKE|              |        LOPSON|
|SYNTHE02|SYNTHE02_1_2|2022-11-28|             1|             2|            MIKE|              |        LOPSON|
|SYNTHE02|SYNTHE02_2_1|2022-11-28|             2|             1|            MIKE|              |        LOPSON|
+--------+------------+----------+--------------+--------------+----------------+--------------+--------------+

The whole dataframe can be grouped by the company column into two distinct groups, SYNTHE01 and SYNTHE02.

My use case is to match shareholder names within each company.

STATUS_1 is set to the minimum id among the rows in the group whose shrhldr_first_nm, shrhldr_mid_nm and shrhldr_sur_nm all match exactly.

STATUS_2 is set to the minimum id among the rows in the group whose shrhldr_first_nm and shrhldr_mid_nm match on the first character and whose shrhldr_sur_nm matches exactly.

For example, in company SYNTHE01, NIEL ANDREW HOPSON in row 1 matches NIEL ANDREW HOPSON in row 4, so STATUS_1 is set to the minimum id for both rows.

For example, in company SYNTHE01, the first characters of NICOLE CLAIRE MORE in row 2 match N C MORE in row 3, so STATUS_2 is set to the minimum id for both rows.
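One possible way to express the two rules in Pandas is with groupby and transform. This is only a sketch under some assumptions that are not stated in the question: row 4 is spelled NIEL as in the example text, the fifth row uses the id SYNTHE01_3_2 as in the expected output, "min of id" means the lexicographic minimum of the id strings, and STATUS_2 is filled only when the rows sharing initials and surname contain more than one distinct spelling (this is inferred from the expected output, where the HOPSON and MIKE rows keep STATUS_2 empty). The helper columns first_initial, mid_initial and spelling are hypothetical names.

```python
import pandas as pd

name_cols = ["shrhldr_first_nm", "shrhldr_mid_nm", "shrhldr_sur_nm"]

# Sample rows taken from the question (only the columns used for matching).
# Assumption: row 4 is NIEL and the fifth id is SYNTHE01_3_2.
rows = [
    ("SYNTHE01", "SYNTHE01_1_1", "NIEL", "ANDREW", "HOPSON"),
    ("SYNTHE01", "SYNTHE01_3_1", "NICOLE", "CLAIRE", "MORE"),
    ("SYNTHE01", "SYNTHE01_1_2", "N", "C", "MORE"),
    ("SYNTHE01", "SYNTHE01_2_1", "NIEL", "ANDREW", "HOPSON"),
    ("SYNTHE01", "SYNTHE01_3_2", "NICOLE", "CLAIRE", "MORE"),
    ("SYNTHE02", "SYNTHE02_1_1", "MIKE", "", "LOPSON"),
    ("SYNTHE02", "SYNTHE02_3_1", "NIMIKE", "", "LOPSON"),
    ("SYNTHE02", "SYNTHE02_1_2", "MIKE", "", "LOPSON"),
    ("SYNTHE02", "SYNTHE02_2_1", "MIKE", "", "LOPSON"),
]
df = pd.DataFrame(rows, columns=["company", "id", *name_cols])

# Treat missing name parts as empty strings so groupby does not drop those rows.
df[name_cols] = df[name_cols].fillna("")

# STATUS_1: minimum id among rows with an identical full name in the same
# company, left empty when the name occurs only once.
full = df.groupby(["company", *name_cols])["id"]
df["STATUS_1"] = full.transform("min").where(full.transform("size") > 1, "")

# STATUS_2: minimum id among rows sharing first initial, middle initial and
# surname. Assumption: filled only when the group mixes different spellings.
tmp = df.assign(
    first_initial=df["shrhldr_first_nm"].str[:1],
    mid_initial=df["shrhldr_mid_nm"].str[:1],
    spelling=df["shrhldr_first_nm"] + "|" + df["shrhldr_mid_nm"],
)
loose = tmp.groupby(["company", "first_initial", "mid_initial", "shrhldr_sur_nm"])
df["STATUS_2"] = loose["id"].transform("min").where(
    loose["spelling"].transform("nunique") > 1, ""
)

print(df[["id", *name_cols, "STATUS_1", "STATUS_2"]])
```

Because transform returns a result aligned to the original rows, no join back to the dataframe is needed. If the ids should be compared by their numeric parts rather than as strings, the id would have to be split into sortable pieces first.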

My expected output dataframe looks like this:

+--------+------------+----------+--------------+--------------+----------------+--------------+--------------+-------------+-------------+
| company|          id|ann_rtn_dt|share_class_nb|shrhldr_seq_nb|shrhldr_first_nm|shrhldr_mid_nm|shrhldr_sur_nm|     STATUS_1|     STATUS_2|
+--------+------------+----------+--------------+--------------+----------------+--------------+--------------+-------------+-------------+
|SYNTHE01|SYNTHE01_1_1|2022-11-28|             1|             1|            NIEL|        ANDREW|        HOPSON| SYNTHE01_1_1|             |
|SYNTHE01|SYNTHE01_3_1|2022-11-28|             3|             1|          NICOLE|        CLAIRE|          MORE| SYNTHE01_3_1| SYNTHE01_1_2|
|SYNTHE01|SYNTHE01_1_2|2022-11-28|             1|             2|               N|             C|          MORE|             | SYNTHE01_1_2|
|SYNTHE01|SYNTHE01_2_1|2022-11-28|             2|             1|            NEIL|        ANDREW|        HOPSON| SYNTHE01_1_1|             |
|SYNTHE01|SYNTHE01_3_2|2022-11-28|             3|             1|          NICOLE|        CLAIRE|          MORE| SYNTHE01_3_1| SYNTHE01_1_2|
|SYNTHE02|SYNTHE02_1_1|2022-11-28|             1|             1|            MIKE|              |        LOPSON| SYNTHE02_1_1|             |
|SYNTHE02|SYNTHE02_3_1|2022-11-28|             3|             1|          NIMIKE|              |        LOPSON|             |             |
|SYNTHE02|SYNTHE02_1_2|2022-11-28|             1|             2|            MIKE|              |        LOPSON| SYNTHE02_1_1|             |
|SYNTHE02|SYNTHE02_2_1|2022-11-28|             2|             1|            MIKE|              |        LOPSON| SYNTHE02_1_1|             |
+--------+------------+----------+--------------+--------------+----------------+--------------+--------------+-------------+-------------+

We tried this in PySpark but could not achieve it. We are now trying to do it in Pandas. Please suggest a possible approach. Thank you.



source https://stackoverflow.com/questions/76256407/match-the-string-data-inside-a-group-pandas
