Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Regex - Deleted all characters after "_" when you have several "_" in a row

Emilie
5 - Atom

Hello there,

 

This is my first time using Regex and I'm a bit lost!

I have a field called [SKU (long format)]. In that field, all rows look similar to this: 110415BS3_________83793006

I only want to keep any digits prior to the first "_", so in the above example, I want to keep 110415BS3

I can't use trim because the number of digits I need to keep can vary.

The closest I've come to what I need is "110415BS3_________" with Regex ^(.+)_.+$ keeping (.+) as an output field.

 

Can you help me with this (somewhat) simple request?

 

Thanks!

 

Emilie

2 REPLIES 2
patrick_digan
17 - Castor
17 - Castor

@Emilie This formula worked in a formula tool for me:

Regex_Replace([Field1],"(.*?)_+.*","$1")

EDIT:

The key is the ? making it non-greedy. In your example, the (.+) is being greedy and grabbing as much it can. So you could make one of 2 changes to your original regex:

^(.+?)_.+$ OR ^([^_]+)_.+$

 You can either add a ? to the .+ to make it non-greedy, or you can change the . to [^_] to have it match anything except the _ symbol. 

Emilie
5 - Atom

Thanks Patrick! Worked like a charm!

Labels