I have a REGEX question.
I am trying to isolate a CODE and AMOUNT from an original RAW_TEXT field.
I thought my data conformed to LN#1 where the CODE always came after a "|" and the amount always started with a $, but that turned out not to be the case. LNs 1-5 show the different variants I have encountered. The code that always starts with "0" and 4 alpha-numeric characters is always followed by the amount.
So far I have tried the following REGEX, but it does not account for all scenarios or the pesky "," in that may or may not appear in the amount.
^.*?(\|0.{4}|0.{4})\s(\d+$|\$\d+)
LN | RAW_TEXT | CODE (Desired Output) | AMOUNT (Desired Output) |
1 | Smoked Colored Plastic, Mounted on Hood|09WAD $327 | 09WAD | 327 |
2 | Located in Steering Wheel 08XHW $28 | 08XHW | 28 |
3 | for Marker Lights and Headlights, Located in 08XHX $2,800 | 08XHX | 2800 |
4 | for Sound Abatement|09HAN 167 | 09HAN | 167 |
5 | Stationary, Painted Grille & Headlight Bezels|09HAT 22CR | 09HAT | 22 |
Solved! Go to Solution.
@hellyars
For your exact input. 😁
@Qiu I left out one variation. CODE can also be all numbers, such as 08781. How do I adjust for that?
hey @hellyars @Qiu
Thought i'd drop a hacky solution into the mix - just to cause a bit of a stir 😁
I just use Getword, and count words, to get the last two words isolated. Replacing Pipe with a space prior.
I would just need to tidy up the results with some data cleansing, but the words you needed isolating are isolated.
Hopefully this helps at all,
TheOC
@TheOC Cool, but this also works on the thousands of lines I want to ignore. That said, I wan to keep this in the toolbox. Thanks.
User | Count |
---|---|
52 | |
27 | |
25 | |
24 | |
21 |