Alteryx Designer Desktop Discussions

Masond3 · ‎11-25-2022

Hi Team,

Need your help

I have data coming in via different externals system and majority (90%) the format of the "External system ID" is in the following format "ABC_123456_Mason"

Consequently i have used the Text to Column functionality to split the data into 3 columns anytime it find an "_" (See below SC)

External System	External ID	Column A	Column B	Column C
Mason1	ABC_123456_Morgan	ABC	123456	Mason
Mason2	AVI_600291_CV15NR	AVI	600291	CV15NR

However there is one external system "Mason3" which does it a bit differently and provides data into the following format "AAA_123456Morgan". Which means anything after the first "_" is bundled into column B

External System	External ID	Column A	Column B	Column C
Mason1	ABC_123456_Morgan	ABC	123456	Mason
Mason2	AVI_600291_CV15NR	AVI	600291	CV15NR
Mason3	MAS_461598TA44NG	SKA	461598TA44NG

Just for an external system "Mason3" is there a way to extract the first 6chars after the "_" and put into column b, any other digits after the 6 digits go into column c ?

Expected Outcome

Primary System	External ID	Column A	Column B	Column C
Mason1	ABC_123456_Morgan	ABC	123456	Mason
Mason2	AVI_600291_CV15NR	AVI	600291	CV15NR
Mason3	MAS_461598TA44NG	SKA	461598	TA44NG

Looking forward to your help

Many thanks

ShankerV · ‎11-25-2022

Hi @Masond3

From this output extracted, I would recommend you to do the formula tool.

Save the output in Column B

If length is >6 for Column B

then Left(column b,6)

else column B

Save the output in Column C

If length is >6 for Column B

then right(column b,6)

else column C

External System	External ID	Column A	Column B	Column C
Mason1	ABC_123456_Morgan	ABC	123456	Mason
Mason2	AVI_600291_CV15NR	AVI	600291	CV15NR
Mason3	MAS_461598TA44NG	SKA	461598TA44NG

Many thanks

Shanker V

Felipe_Ribeir0 · ‎11-25-2022

Hi @Masond3

One way of doing this

Masond3 · ‎11-25-2022

@Felipe_Ribeir0 So do you think its best to do the text to columns as first step and then apply the formula solution afterwards ?

ShankerV · ‎11-25-2022

@Masond3

I suggest to Split in to columns and then apply the formula.

As for huge dataset, it might be Mason#, mason4 ...5 etc.

Many thanks

Shanker V

ShankerV · ‎11-25-2022

Hi @Masond3

Please find the use case below.

Many thanks

Shanker V

Felipe_Ribeir0 · ‎11-25-2022

Hi @Masond3

Fair question!

Thinking better, it seems like a good idea to add the necessary _ to the column and use the text to columns after it. I personally would go with this idea:

If you need to keep the External ID as it is, you can use the formula to create a new External ID with the _ just to parse it correctly.

OllieClarke · ‎11-25-2022

Hi @Masond3

You can use the RegEx tool here in parse mode:

The following RegEx will parse 3 columns, A with everything before the first underscore, B with all the digits after that, but before the next underscore (if there), and C with everything after the final underscore (if there) or first non-digit to the end if no underscore

(.+?)_(\d+)_?(.+)

Hope that helps,

Ollie

JoshuaB · ‎11-25-2022

Me personally, I would use @OllieClarke design pattern as it 1 requires less work to create but it's also more robust.

Joshua Burkhow | Chief Evangelist @ Alteryx | Follow me on LinkedIn and Twitter

Felipe_Ribeir0 · ‎11-25-2022

Generally the regex path leads to a higher computational cost than solutions using simple components, when the second case is possible. With the Enable Performance Profiling it is possible to see that. Ofc this will not be a problem considering small datasets, but i think that this is something to keep in mind.

Alteryx Designer Desktop Discussions

Inconsistent data format causing Text to Column anomalies