Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

String Parsing

aria_rahshid1
6 - Meteoroid

Hello,

 

I need some assistance building out logic that would parse a string with data in the following format:

Project Description Project ID
PS_PSAP002534_SoftPSAP002534
PS_PSEM005740_BlackPSEM00574

 

I want to parse the string to just remove the ID field. The challenge is that the ID field is not always the same character length each time. I think the best approach would be to use regex to find 3 characters and stop at underscore. I need help building out that logic in regex. Please let me know if you an idea.

 

Kind Regards,

 

Rahshid 

8 REPLIES 8
DavidP
17 - Castor
17 - Castor

Sure, you can do it with Regex, but would something like this work?

 

DavidP_0-1583170735145.png

 

Philip
12 - Quasar

Hi Rahshid 

 

Is the ID always between the two underscores and will there always only be the two underscores?

 

If so, you can do this in the Formula tool with this (easier to explain this way than using the REGEX tool):

REGEX_Replace([Field], ".*?_(.+)?_.+", "$1")

The .* finds any character up to the underscore.

The ? character forces minimum match so that search stops on the first underscore.

The (.+)? finds all characters after the first underscore and marks them as the replace string to keep.

The _.+ finds all the characters after the second underscore.

"$1" means to use the first character group within the parentheses as the replacement text.

aria_rahshid1
6 - Meteoroid

@DavidP  Hi David,

 

Thanks for the prompt response. Text to column will not work in this case because the string values are much longer than what I have pasted above. Most of the values are similar to this :

PS_PSAP002589_Name_Name2_Name3_-_TW_Name_Name2_Name3

PS_PSUS005814_Former_ZZZ_Team_Legacy_project_trk

VS_10CFLL_PSEMV0000735_CARGOZZZ_OTY

 

Ideally I would not like to spread it across multiple columns. I would think regex would be the best approach since the ID isn't generally in the same place. Let me know what you think at your earliest convenience.

 

Kind Regards,

 

Rahshid  

PeterP
8 - Asteroid

Hi @aria_rahshid1 ,

 

If you are just trying to parse everything inbetween the Underscores, this should work fine. This expression will remove everything before and after the underscores, and grab everything inbetween the underscores. 

 

 

aria_rahshid1
6 - Meteoroid

Hi @Philip ,

 

This is a good approach, but unfortunately the ID is not necessarily after or before the first underscore . There some values where the ID is embedded after a few underscores and text. An example of this would be :

VS__40FUAM_PSUSV0000849_APPLE_JZ_FZI_PCIDZTS

 

In this case I dont believe your solution will work. Would happen to have an alternative work around? I appreciate your help.

 

Kind Regards,

 

Rahshid 

DavidP
17 - Castor
17 - Castor

So, based on the new data you sent through, it looks like the best pattern we can find to match to is an underscore, followed by some letters, followed by some numbers.

 

So something like this then?

 

DavidP_0-1583173062994.png

 

Philip
12 - Quasar

Regex depends on pattern matching. I'd need to know a dependable pattern. Does the ID always start with PS and no other portion starts with PS?

 

Or try this: if you were looking through a long list of these, how would you know an ID from all the other information?

aria_rahshid1
6 - Meteoroid

Hi @DavidP ,

 

This is amazing. It is exactly what I was looking for. Thanks for your help!

 

Kind Regards,

 

Rahshid Aria

Labels