Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Inconsistent data format causing Text to Column anomalies

Masond3
8 - Asteroid

Hi Team, 

Need your help 

 

I have data coming in via different externals system and majority (90%) the format of the "External system ID" is in the following format  "ABC_123456_Mason"

 

Consequently i have used the Text to Column functionality  to split the data  into  3 columns anytime it find an "_"  (See below SC)

 

External SystemExternal IDColumn AColumn BColumn C
Mason1ABC_123456_MorganABC123456Mason
Mason2AVI_600291_CV15NRAVI600291CV15NR

 

However there is one external system "Mason3" which does it a bit differently and provides data into the following format  "AAA_123456Morgan". Which means anything after the first "_" is bundled into column B

 

External SystemExternal IDColumn AColumn BColumn C
Mason1ABC_123456_MorganABC123456Mason
Mason2AVI_600291_CV15NRAVI600291CV15NR
Mason3MAS_461598TA44NGSKA461598TA44NG 

 

Just for an external system "Mason3" is there a way to  extract the first 6chars after the "_" and put into column b, any other digits after the 6 digits  go into column c ? 

 

Expected Outcome 

 

Primary SystemExternal IDColumn AColumn BColumn C
Mason1ABC_123456_MorganABC123456Mason
Mason2AVI_600291_CV15NRAVI600291CV15NR
Mason3MAS_461598TA44NGSKA461598TA44NG

 

 

 

Masond3_1-1669380466189.png

 

 

Looking forward to your help

 

Many thanks 

 

 

13 REPLIES 13
OllieClarke
15 - Aurora
15 - Aurora

@Felipe_Ribeir0 it's a very good point. I do tend to overuse RegEx. I guess it depends on the patterns of the data, looking for digits rather than exactly 6 characters could be beneficial, but it could also be incorrect. As with most things in Alteryx, it's good to review multiple approaches and pick the one which suits your use case :)

Felipe_Ribeir0
16 - Nebula

@OllieClarkei agree with you, it is great to see multiple approachs to solve the same problem! And i also agree that your solution does not rely on the character quantities, so it is much more flexible if there is no certainty about this fixed amount of characters between the _.

 

Just to share something cool that motivated me to compare the regex performance with other methods, i recently discovered that cloudfare went out for 27 minutes in 2019 because of a regex expression (poorly written, but still a regex expression).

https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/

 

Felipe_Ribeir0_0-1669392271919.png

Of course most of us are not dealing with a volume of data that could cause something like this, but even considering that, i found this mindblowing 😅

OllieClarke
15 - Aurora
15 - Aurora

@Felipe_Ribeir0 I've had RegEx tools error before due to excessive backtracking. First time it happened I had no idea what was going on lol

Masond3
8 - Asteroid

@Felipe_Ribeir0  @OllieClarke  These are all both valid scenarios.

Throughout the week i been testing to determine whats the best option. By applying a formula it enbabled me to more control based on a given external id. However @OL Your solution it also very valid. 

 

 

Labels