Alteryx Designer Desktop Discussions

kwieto · ‎12-12-2023

I need some help with RegEx:

Let's say there is a string containing text and numbers, for example: 123456 this is text, ABC 56789 this is second text 45, 74930 this is third text

I need to extract numbers from that string which are longer than 3 digits, but also only not those which are preceded by ABC.
So the result for the above example should be 123456 and 74930 and NOT 56789 nor 45

Excluding shorter numbers is easy with the expression: \d{3,} (I used Tokenize option in RegEX tool)

I also managed to extract the number after ABC only: (?:ABC.?)(\d{3,}) (also Tokenize option), but how to define expression that this number will be skipped?

One additional comment: the seqence of ABC [number] can appear several times in the text, so all instances should be skipped

caltang · ‎12-12-2023

Try this:

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/

caltang · ‎12-12-2023

Since your data is not so straightforward, a clear REGEX may not be the best here in my view. It's much easier to split them accordingly, then apply REGEX on top of what you tokenized out to rows. From there, pivot it back as extra values so that you can see them clearly.

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/

DataNath · ‎12-12-2023

Hey @kwieto, we can use negative lookups in RegEx to achieve this if you do want to go down that route - I'd just tokenize these and then you'll be able to Cross-Tab them and rejoin to your original data:

Before:

After:

RegEx:

((?<!ABC\s)\b\d{3,}\b)

kwieto · ‎12-12-2023

Thanks, this is exactly what I looked for!

kwieto · ‎12-12-2023

That solution is also good, but sometimes the input data doesn't have any delimiter (or comma is not delimiter but part of the text), so the one provided by DataNath is better tuned to the need.

caltang · ‎12-12-2023

Your example data showed had that so I worked with that assumption in mind. But I learned something new from @DataNath !

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/

Alteryx Designer Desktop Discussions

RegEx - extract string if NOT preceded by other string

Re: Unable to get an output

Re: Extracting the list of sheet names across mult...

Example workflow for setting up a custom list to u...

Re: Firm names parse

Re: Help with Multi-Row formula