Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Regex - Deleted all characters after "_" when you have several "_" in a row

Emilie
5 - Atom

Hello there,

 

This is my first time using Regex and I'm a bit lost!

I have a field called [SKU (long format)]. In that field, all rows look similar to this: 110415BS3_________83793006

I only want to keep any digits prior to the first "_", so in the above example, I want to keep 110415BS3

I can't use trim because the number of digits I need to keep can vary.

The closest I've come to what I need is "110415BS3_________" with Regex ^(.+)_.+$ keeping (.+) as an output field.

 

Can you help me with this (somewhat) simple request?

 

Thanks!

 

Emilie

2 REPLIES 2
patrick_digan
17 - Castor
17 - Castor

@Emilie This formula worked in a formula tool for me:

Regex_Replace([Field1],"(.*?)_+.*","$1")

EDIT:

The key is the ? making it non-greedy. In your example, the (.+) is being greedy and grabbing as much it can. So you could make one of 2 changes to your original regex:

^(.+?)_.+$ OR ^([^_]+)_.+$

 You can either add a ? to the .+ to make it non-greedy, or you can change the . to [^_] to have it match anything except the _ symbol. 

Emilie
5 - Atom

Thanks Patrick! Worked like a charm!

Labels
Top Solution Authors