Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Inconsistent data format causing Text to Column anomalies

Masond3
8 - Asteroid

Hi Team, 

Need your help 

 

I have data coming in via different externals system and majority (90%) the format of the "External system ID" is in the following format  "ABC_123456_Mason"

 

Consequently i have used the Text to Column functionality  to split the data  into  3 columns anytime it find an "_"  (See below SC)

 

External SystemExternal IDColumn AColumn BColumn C
Mason1ABC_123456_MorganABC123456Mason
Mason2AVI_600291_CV15NRAVI600291CV15NR

 

However there is one external system "Mason3" which does it a bit differently and provides data into the following format  "AAA_123456Morgan". Which means anything after the first "_" is bundled into column B

 

External SystemExternal IDColumn AColumn BColumn C
Mason1ABC_123456_MorganABC123456Mason
Mason2AVI_600291_CV15NRAVI600291CV15NR
Mason3MAS_461598TA44NGSKA461598TA44NG 

 

Just for an external system "Mason3" is there a way to  extract the first 6chars after the "_" and put into column b, any other digits after the 6 digits  go into column c ? 

 

Expected Outcome 

 

Primary SystemExternal IDColumn AColumn BColumn C
Mason1ABC_123456_MorganABC123456Mason
Mason2AVI_600291_CV15NRAVI600291CV15NR
Mason3MAS_461598TA44NGSKA461598TA44NG

 

 

 

Masond3_1-1669380466189.png

 

 

Looking forward to your help

 

Many thanks 

 

 

13 REPLIES 13
ShankerV
17 - Castor

Hi @Masond3 

 

From this output extracted, I would recommend you to do the formula tool.

 

Save the output in Column B

If length is >6 for Column B

then Left(column b,6)

else column B

 

Save the output in Column C

If length is >6 for Column B

then right(column b,6)

else column C

 

External SystemExternal IDColumn AColumn BColumn C
Mason1ABC_123456_MorganABC123456Mason
Mason2AVI_600291_CV15NRAVI600291CV15NR
Mason3MAS_461598TA44NGSKA461598TA44NG 

 

 

Many thanks

Shanker V

Felipe_Ribeir0
16 - Nebula

Hi @Masond3 

 

One way of doing this

 

Felipe_Ribeir0_0-1669381317978.png

 

 

 

 

Felipe_Ribeir0_0-1669380890282.png

 

Masond3
8 - Asteroid

@Felipe_Ribeir0  So do you think its best to do the text to columns as first step and then  apply the formula solution afterwards ? 

ShankerV
17 - Castor

@Masond3 

 

I suggest to Split in to columns and then apply the formula.

 

As for huge dataset, it might be Mason#, mason4 ...5 etc.

 

Many thanks

Shanker V

ShankerV
17 - Castor

Hi @Masond3 

 

Please find the use case below.

 

ShankerV_0-1669381370858.png

 

ShankerV_1-1669381400455.png

 

ShankerV_2-1669381411195.png

 

 

Many thanks

Shanker V

Felipe_Ribeir0
16 - Nebula

Hi @Masond3 

 

Fair question!

 

Thinking better, it seems like a good idea to add the necessary _ to the column and use the text to columns after it. I personally would go with this idea:

 

Felipe_Ribeir0_0-1669381742440.png

 

If you need to keep the External ID as it is, you can use the formula to create a new External ID with the _ just to parse it correctly.

 

 

OllieClarke
15 - Aurora
15 - Aurora

Hi @Masond3 

You can use the RegEx tool here in parse mode:

OllieClarke_0-1669382051570.png

The following RegEx will parse 3 columns, A with everything before the first underscore, B with all the digits after that, but before the next underscore (if there), and C with everything after the final underscore (if there) or first non-digit to the end if no underscore

(.+?)_(\d+)_?(.+)

Hope that helps,

 

Ollie

 

 

JoshuaB
Alteryx Product Evangelist
Alteryx Product Evangelist

Me personally, I would use @OllieClarke design pattern as it 1 requires less work to create but it's also more robust. 

Joshua Burkhow | Chief Evangelist @ Alteryx | Follow me on LinkedIn and Twitter
Felipe_Ribeir0
16 - Nebula

Generally the regex path leads to a higher computational cost than solutions using simple components, when the second case is possible. With the Enable Performance Profiling it is possible to see that. Ofc this will not be a problem considering small datasets, but i think that this is something to keep in mind.

 

Felipe_Ribeir0_1-1669388301177.png

 

 

Labels