Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Inconsistent data format causing Text to Column anomalies

Masond3
8 - Asteroid

Hi Team, 

Need your help 

 

I have data coming in via different externals system and majority (90%) the format of the "External system ID" is in the following format  "ABC_123456_Mason"

 

Consequently i have used the Text to Column functionality  to split the data  into  3 columns anytime it find an "_"  (See below SC)

 

External SystemExternal IDColumn AColumn BColumn C
Mason1ABC_123456_MorganABC123456Mason
Mason2AVI_600291_CV15NRAVI600291CV15NR

 

However there is one external system "Mason3" which does it a bit differently and provides data into the following format  "AAA_123456Morgan". Which means anything after the first "_" is bundled into column B

 

External SystemExternal IDColumn AColumn BColumn C
Mason1ABC_123456_MorganABC123456Mason
Mason2AVI_600291_CV15NRAVI600291CV15NR
Mason3MAS_461598TA44NGSKA461598TA44NG 

 

Just for an external system "Mason3" is there a way to  extract the first 6chars after the "_" and put into column b, any other digits after the 6 digits  go into column c ? 

 

Expected Outcome 

 

Primary SystemExternal IDColumn AColumn BColumn C
Mason1ABC_123456_MorganABC123456Mason
Mason2AVI_600291_CV15NRAVI600291CV15NR
Mason3MAS_461598TA44NGSKA461598TA44NG

 

 

 

Masond3_1-1669380466189.png

 

 

Looking forward to your help

 

Many thanks 

 

 

13 REPLIES 13
ShankerV
17 - Castor

Hi @Masond3 

 

From this output extracted, I would recommend you to do the formula tool.

 

Save the output in Column B

If length is >6 for Column B

then Left(column b,6)

else column B

 

Save the output in Column C

If length is >6 for Column B

then right(column b,6)

else column C

 

External SystemExternal IDColumn AColumn BColumn C
Mason1ABC_123456_MorganABC123456Mason
Mason2AVI_600291_CV15NRAVI600291CV15NR
Mason3MAS_461598TA44NGSKA461598TA44NG 

 

 

Many thanks

Shanker V

Felipe_Ribeir0
16 - Nebula

Hi @Masond3 

 

One way of doing this

 

Felipe_Ribeir0_0-1669381317978.png

 

 

 

 

Felipe_Ribeir0_0-1669380890282.png

 

Masond3
8 - Asteroid

@Felipe_Ribeir0  So do you think its best to do the text to columns as first step and then  apply the formula solution afterwards ? 

ShankerV
17 - Castor

@Masond3 

 

I suggest to Split in to columns and then apply the formula.

 

As for huge dataset, it might be Mason#, mason4 ...5 etc.

 

Many thanks

Shanker V

ShankerV
17 - Castor

Hi @Masond3 

 

Please find the use case below.

 

ShankerV_0-1669381370858.png

 

ShankerV_1-1669381400455.png

 

ShankerV_2-1669381411195.png

 

 

Many thanks

Shanker V

Felipe_Ribeir0
16 - Nebula

Hi @Masond3 

 

Fair question!

 

Thinking better, it seems like a good idea to add the necessary _ to the column and use the text to columns after it. I personally would go with this idea:

 

Felipe_Ribeir0_0-1669381742440.png

 

If you need to keep the External ID as it is, you can use the formula to create a new External ID with the _ just to parse it correctly.

 

 

OllieClarke
15 - Aurora
15 - Aurora

Hi @Masond3 

You can use the RegEx tool here in parse mode:

OllieClarke_0-1669382051570.png

The following RegEx will parse 3 columns, A with everything before the first underscore, B with all the digits after that, but before the next underscore (if there), and C with everything after the final underscore (if there) or first non-digit to the end if no underscore

(.+?)_(\d+)_?(.+)

Hope that helps,

 

Ollie

 

 

JoshuaB
Alteryx Product Evangelist
Alteryx Product Evangelist

Me personally, I would use @OllieClarke design pattern as it 1 requires less work to create but it's also more robust. 

Joshua Burkhow | Chief Evangelist @ Alteryx | Follow me on LinkedIn and Twitter
Felipe_Ribeir0
16 - Nebula

Generally the regex path leads to a higher computational cost than solutions using simple components, when the second case is possible. With the Enable Performance Profiling it is possible to see that. Ofc this will not be a problem considering small datasets, but i think that this is something to keep in mind.

 

Felipe_Ribeir0_1-1669388301177.png

 

 

Labels