Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

RegEx - extract string if NOT preceded by other string

kwieto
8 - Asteroid

I need some help with RegEx:

Let's say there is a string containing text and numbers, for example: 123456 this is text, ABC 56789 this is second text 45, 74930 this is third text

 

I need to extract numbers from that string which are longer than 3 digits, but also only not those which are preceded by ABC.
So the result for the above example should be 123456 and 74930 and NOT 56789 nor 45

 

Excluding shorter numbers is easy with the expression: \d{3,} (I used Tokenize option in RegEX tool)

I also managed to extract the number after ABC only: (?:ABC.?)(\d{3,}) (also Tokenize option), but how to define expression that this number will be skipped?

One additional comment: the seqence of ABC [number] can appear several times in the text, so all instances should be skipped

6 REPLIES 6
caltang
17 - Castor
17 - Castor

Try this:

image.png

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
caltang
17 - Castor
17 - Castor

Since your data is not so straightforward, a clear REGEX may not be the best here in my view. It's much easier to split them accordingly, then apply REGEX on top of what you tokenized out to rows. From there, pivot it back as extra values so that you can see them clearly. 

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
DataNath
17 - Castor
17 - Castor

Hey @kwieto, we can use negative lookups in RegEx to achieve this if you do want to go down that route - I'd just tokenize these and then you'll be able to Cross-Tab them and rejoin to your original data:

 

Before:

 

2020.png

 

After:

 

2021.png

 

RegEx:

 

((?<!ABC\s)\b\d{3,}\b)
kwieto
8 - Asteroid

Thanks, this is exactly what I looked for!

kwieto
8 - Asteroid

That solution is also good, but sometimes the input data doesn't have any delimiter (or comma is not delimiter but part of the text), so the one provided by DataNath is better tuned to the need.

caltang
17 - Castor
17 - Castor

Your example data showed had that so I worked with that assumption in mind. But I learned something new from @DataNath !

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
Labels