I want to extract the exact value+unit from the 8 rows below. But my code is not working properly. For example, my code extracts in row 4 "1000 2 ounce" instead of only "2 ounce". Same issue with row 5 and 6.
- includes chalkboard labels, marker & measuring cup capacity s | 135.2 ounce capacity
- 33% thicker - 12 glass spice - 3 cubic inch capacity
- nolopau 1.53 gallon capacity lon glass jar,
- [1000 2 ounce capacity compostable condiment souffluide bagasse
- bia cordon bleu inc 900712 8.5 ounce capacity porcelain
- bekith 15 9 liter capacity glass jars,
- echtpower bento box, 1.5 milliliter lunch box with handle
- aislor plastic straining 0.50 gallon capacity lon pitcher, dishwasher
Outcome for each row: 153.2 ounce 3 cubic inch 1.53 gallon 2 ounce 8.5 ounce 9 liter 1.5 milliliter 0.50 gallon
I tried:
def extract_c(df):
pattern = (\d+.\d+\s*ounce|\d+.\d+\s*cubic\s*inch|\d+.\d+\s*gallon|\d+.\d+\s*liter|\d+.\d+\s*milliliter)'
return df.str.extract(pattern)
Can anyone support me in extracting these 8 rows in 1 line?
source https://stackoverflow.com/questions/74838517/regex-extract-digit-word
Comments
Post a Comment