I need some help with RegEx:
Let's say there is a string containing text and numbers, for example: 123456 this is text, ABC 56789 this is second text 45, 74930 this is third text
I need to extract numbers from that string which are longer than 3 digits, but also only not those which are preceded by ABC.
So the result for the above example should be 123456 and 74930 and NOT 56789 nor 45
Excluding shorter numbers is easy with the expression: \d{3,} (I used Tokenize option in RegEX tool)
I also managed to extract the number after ABC only: (?:ABC.?)(\d{3,}) (also Tokenize option), but how to define expression that this number will be skipped?
One additional comment: the seqence of ABC [number] can appear several times in the text, so all instances should be skipped
Solved! Go to Solution.
Since your data is not so straightforward, a clear REGEX may not be the best here in my view. It's much easier to split them accordingly, then apply REGEX on top of what you tokenized out to rows. From there, pivot it back as extra values so that you can see them clearly.
Hey @kwieto, we can use negative lookups in RegEx to achieve this if you do want to go down that route - I'd just tokenize these and then you'll be able to Cross-Tab them and rejoin to your original data:
Before:
After:
RegEx:
((?<!ABC\s)\b\d{3,}\b)
Thanks, this is exactly what I looked for!
That solution is also good, but sometimes the input data doesn't have any delimiter (or comma is not delimiter but part of the text), so the one provided by DataNath is better tuned to the need.
Your example data showed had that so I worked with that assumption in mind. But I learned something new from @DataNath !