Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Complex Cleansing and Mapping - anyone brave enough??.....

fiorano
8 - Asteroid

Hi Everyone,

 

I have some pretty messed up data which I need to map to a master look up.  I have attached a file that holds what the raw data would look like, an example of the look up and what the required output would be.

 

A screenshot of the desired output below.  As you can see, we are getting some really crappy Raw Codes and Descriptions!!!  I dont really know where to begin to get this data cleansed.

 

Can anyone offer any starting points or sample workflows? I would be eternally grateful...

 

cleansing_challenge.PNG

 

Many many thanks!!

 

 

 

6 REPLIES 6
MarqueeCrew
20 - Arcturus
20 - Arcturus

@fiorano,

 

Not so hard with pattern matching (and a little practice/experience).  I (personally) didn't start using Regular Expressions until I became active in the community.  Once I did start, they changed my approach to strings.  Watch tutorials on RegEx and soak it in!

 

Capture.png

 

I've attached the workflow and will review the steps with their results:

 

  1. INPUT the data and metadata.
  2. RECORDID the input data (for ordering purposes and potential mis-match identification.
  3. JOIN for exact matches (11/24)
  4. FORMULA for empty codes
    IIF(IsEmpty([RAW CODE]),GetWord([RAW Desc], 0),[RAW CODE])
    Basically, if the code is empty then take the first word out of the description.  Follow this with an update to the description to take the first word out if it is what was found for the code.
    IIF([RAW CODE]=GetWord([RAW Desc], 0),replace([RAW Desc], [RAW CODE]+' ',''),[RAW Desc])
  5. JOIN for exact matches (0/13)
  6. FORMULA for reformatting numeric codes
    REGEX_Replace([RAW CODE], "(\d{3})(\d{3})(\d{7})", '$1-$2-$3')
    Format numeric (13 bytes) with dashes
  7. JOIN for exact matches (10/13)
  8. FORMULA to remove punctuation from codes
    REGEX_Replace([RAW CODE], "[[:punct:]]", '')
  9. JOIN for exact matches (3/3)
  10. UNION your data together
  11. SORT
  12. SELECT to remove fields and rename them
  13. BROWSE

Based upon the sample data this is the workflow necessary.  If you end up with more "unjoined-left" data, you'll need additional rules constructed or you'll have a small set of outliers to manually correct.

 

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
jrgo
14 - Magnetar

Hi @fiorano

 

The attached workflow is, I believe, successfully matching all 24 records in your sample table. However, a lot of what @MarqueeCrew solution includes are some excellent practices to pre-process/cleanse your values in preparation for matching, especially when you run this through a much larger table.

fiorano
8 - Asteroid

Hi,

 

thanks @MarqueeCrew and @jrgo !!!!! Both are amazing solutions which look to cover almost all data quality issues that we are facing at the moment.

 

Huge thanks to both of you!

fiorano
8 - Asteroid

Hi @MarqueeCrew,

 

Just a quick query on one of your expressions if I may.

 

Can you explain what

 

REGEX_Replace([RAW CODE], "(\d{3})(\d{3})(\d{7})", '$1-$2-$3')

is doing with an example?

 

Many thanks!

Fiorano

MarqueeCrew
20 - Arcturus
20 - Arcturus
Group 1 is 3 digits. Followed by
Group 2 is 3 digits. Followed by
Group 3 is 7 digits. With nothing left.

A group is a pattern inside of parentheses.

Replace contents with group1-group2-group3.

When the input pattern doesn’t match this, don’t apply the mask.

Does that make more sense?
Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
fiorano
8 - Asteroid

Hi @MarqueeCrew

 

This does indeed make sense - thanks for your reply.

 

Fiorano

Labels
Top Solution Authors