
Map two dataframes based on a column and create a new column, including partial matches

I have two dataframes.

One holds codes and values that need to be mapped onto the other dataframe:

import numpy as np
import pandas as pd

B = pd.DataFrame({'Code': ['a', 'b', 'c', 'a', 'e','b','b','c'],
                  'Value': ["House with indoor pool", "House with Gray_C_Door", "Big Chandelier",
                            "Window Glass", "Frame Window",'High Column','Wood Raling', 'Window Glass trim']})

The other dataframe contains a lot of data, and I need to create a new column in it based on column "Code" of dataframe "B".

A = pd.DataFrame({'Test': [2,34,12,45,np.nan,34,56,23,56,87,23,67,89,123,np.nan],
                  'Name': [ "House with indoor pool","House with Gray_C_Door",'House with indoor pool and Porch',"Wood Raling",
                           'Window Glass Tinted',"Windows Glass_with",'Big Chandelier', "Frame Window",np.nan,"Window glass","House with indoor pool",'High column with',
                           "Window Glass trim",'Frame Window',"glass Window"],
                 'Value': ["50", "100", "70", "20", "15",'75','50',"10", "10", "34", "5", "56",'12','83',np.nan]})
A.loc[:,'NewName'] = A['Name']

I'm using the code below to replace the values in A['NewName']:

A['NewName']= A['NewName'].replace(B.set_index('Value')['Code'])

    Test    Name                                Value   NewName
0   2.0000  House with indoor pool              50      a
1   34.0000 House with Gray_C_Door              100     b
2   12.0000 House with indoor pool and Porch    70      House with indoor pool and Porch
3   45.0000 Wood Raling                         20      b
4   NaN     Window Glass Tinted                 15      Window Glass Tinted
5   34.0000 Windows Glass_with                  75      Windows Glass_with
6   56.0000 Big Chandelier                      50      c
7   23.0000 Frame Window                        10      e
8   56.0000 NaN                                 10      NaN
9   87.0000 Window glass                        34      Window glass
10  23.0000 House with indoor pool              5       a
11  67.0000 High column with                    56      High column with
12  89.0000 Window Glass trim                   12      c
13  123.000 Frame Window                        83      e
14  NaN     glass Window                        NaN     glass Window

However, some values in A['NewName'] do not exactly match any B['Value'], so they are left unchanged and I don't get the expected outcome.
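For context, here is a minimal sketch of why this happens, using a small subset of the data above (the three-element `names` series is made up for illustration). Both `Series.replace` and `Series.map` with a lookup Series compare whole cell values for exact, case-sensitive equality: `replace` leaves non-matching cells untouched, while `map` turns them into NaN.

```python
import numpy as np
import pandas as pd

# Small subset of the question's lookup table
B = pd.DataFrame({'Code': ['a', 'c'],
                  'Value': ['House with indoor pool', 'Big Chandelier']})
# Illustrative names: one exact match, one longer variant, one case variant
names = pd.Series(['House with indoor pool',
                   'House with indoor pool and Porch',
                   'big chandelier'])

lookup = B.set_index('Value')['Code']

# replace() swaps only cells that equal a lookup key; other cells are kept as-is
replaced = names.replace(lookup)
# map() does the same exact-match lookup but yields NaN where nothing matches
mapped = names.map(lookup)

print(replaced.tolist())
print(mapped.tolist())
```

Only the first name matches, because neither method does substring, prefix, or case-insensitive matching.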

Is there a way to match those values when A['NewName'] only partially matches B['Value'], and still assign the correct code? For instance, when A['NewName'] is "House with indoor pool and Porch", I want to match it with B['Value'] = 'House with indoor pool' and replace it with the corresponding B['Code'] = 'a'. I can't simply add that string to the Value column of dataframe B, because the text after "House with indoor pool" can vary in many ways (for example "House with indoor pool_ with big glass door", "House with indoor pool and High railings", etc.).
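One possible approach, sketched under assumptions not stated in the question: treat each B['Value'] as a *prefix* of A['Name'], compare case-insensitively, and let the longest matching value win, so "Window Glass trim" maps to 'c' rather than to the 'a' of "Window Glass". The helper `longest_prefix_code` is hypothetical, and the `A` below is a trimmed version of the question's data.

```python
import numpy as np
import pandas as pd

B = pd.DataFrame({'Code': ['a', 'b', 'c', 'a', 'e', 'b', 'b', 'c'],
                  'Value': ["House with indoor pool", "House with Gray_C_Door", "Big Chandelier",
                            "Window Glass", "Frame Window", 'High Column', 'Wood Raling',
                            'Window Glass trim']})
A = pd.DataFrame({'Name': ["House with indoor pool", "House with indoor pool and Porch",
                           "Window Glass Tinted", "Window Glass trim", "High column with",
                           "glass Window", np.nan]})

# Lower-cased (value, code) pairs, longest value first, so the most
# specific prefix is tried before shorter ones
patterns = sorted(zip(B['Value'].str.lower(), B['Code']),
                  key=lambda kv: len(kv[0]), reverse=True)

def longest_prefix_code(name):
    """Hypothetical helper: return the Code of the longest B['Value'] that
    `name` starts with (case-insensitive), or `name` itself if none does."""
    if not isinstance(name, str):
        return name  # keep NaN as NaN
    lowered = name.lower()
    for value, code in patterns:
        if lowered.startswith(value):
            return code
    return name  # no prefix matched: keep the original text

A['NewName'] = A['Name'].map(longest_prefix_code)
print(A)
```

Under these assumptions, names such as "Windows Glass_with" and "glass Window" from the question's data still stay unmatched, because neither starts with a value from B. Catching those would need fuzzy matching (for example `difflib.get_close_matches`), which is a different technique.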

Is it possible to do this with a map/replace function, or with some other method?
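As a vectorized sketch, assuming prefix matching is what's wanted (case-insensitive, longest value first), `Series.str.extract` can pull out the leading B['Value'] with a single anchored regex alternation. Python's `re` tries alternatives left to right, so sorting by length makes the longer "Window Glass trim" win over "Window Glass". The trimmed `A` below is illustrative only.

```python
import re

import numpy as np
import pandas as pd

B = pd.DataFrame({'Code': ['a', 'b', 'c', 'a', 'e', 'b', 'b', 'c'],
                  'Value': ["House with indoor pool", "House with Gray_C_Door", "Big Chandelier",
                            "Window Glass", "Frame Window", 'High Column', 'Wood Raling',
                            'Window Glass trim']})
A = pd.DataFrame({'Name': ["House with indoor pool", "House with indoor pool and Porch",
                           "Window Glass Tinted", "Window Glass trim", "High column with",
                           "glass Window", np.nan]})

# Longest values first so the alternation prefers the most specific match;
# re.escape guards against regex metacharacters in the values
values = sorted(B['Value'], key=len, reverse=True)
pattern = '^(' + '|'.join(re.escape(v) for v in values) + ')'

# Case-insensitive lookup: lower-cased Value -> Code
code_by_value = dict(zip(B['Value'].str.lower(), B['Code']))

# Matched prefix text (NaN when there is no match or Name is NaN)
matched = A['Name'].str.extract(pattern, flags=re.IGNORECASE, expand=False)
# Translate the match to its Code, falling back to the original Name
A['NewName'] = matched.str.lower().map(code_by_value).fillna(A['Name'])
print(A)
```

Unmatched names keep their original text, as with `replace` in the question. To get NaN instead, drop the final `fillna`.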

Thanks in advance!



source https://stackoverflow.com/questions/71888573/map-two-dataframe-base-on-a-column-and-create-a-new-column-also-match-partial-m
