
Challenge #37: Parsing a Raw XML File

GeneR
Alteryx Alumni (Retired)

The link to the solution for last challenge #36 is HERE

 

Using raw XML files as input can be challenging. The data is nested within each record, so you have to extract it through successive parsing steps, sometimes drilling several levels deep (root and child levels). Alteryx makes this easier with the XML Parse tool. We will explore the process in this exercise.

 

Use Case: A company receives customer purchase and shipping data on a weekly basis from web and catalog purchases. The company would like to analyze its customers and produce a profile by market by SKU. The challenge is that the data feed contains XML that needs to be parsed in order to analyze the data effectively.

 

Objective:  The column called customer_OuterXML contains the data that needs to be parsed into 25 unique fields detailing the customer contact information for both the “Bill To” and “Ship To” attributes.
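For readers who think in code, the drill-down described above can be sketched outside Alteryx. This is a minimal Python analogy, not the challenge solution: the sample XML, its tag names (`BillTo`, `ShipTo`, `Address`, and so on) and the `flatten` helper are all hypothetical, since the real layout of customer_OuterXML is not shown in this thread. It walks from the root into each child level and builds one column per leaf element, naming the column after the path to that leaf.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record; the real customer_OuterXML layout is not shown here.
SAMPLE = """
<customer id="C001">
  <BillTo>
    <Name>Ann Lee</Name>
    <Address><City>Denver</City><State>CO</State></Address>
  </BillTo>
  <ShipTo>
    <Name>Bob Lee</Name>
    <Address><City>Boulder</City><State>CO</State></Address>
  </ShipTo>
</customer>
"""


def flatten(element, prefix=""):
    """Return {column_name: text} for every leaf element under `element`."""
    fields = {}
    for child in element:
        name = f"{prefix}{child.tag}"
        if len(child):
            # The child has children of its own: drill one level deeper.
            fields.update(flatten(child, prefix=name + "_"))
        else:
            fields[name] = (child.text or "").strip()
    return fields


root = ET.fromstring(SAMPLE)
row = {"customer_id": root.get("id"), **flatten(root)}

for column, value in row.items():
    print(column, "=", value)
```

Prefixing each column with its parent tag keeps the "Bill To" and "Ship To" fields apart even when both contain a `Name` or `City`.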

 

Note: As of 9/11/2019, the Start and Solution files were edited. Depending on when you complete this challenge, solutions posted here may reference a dataset that was previously available. Posted solutions (as files) that used the previous dataset have been replaced with the Alteryx Academy logo to acknowledge the user's contribution, which we can no longer share publicly.

Naledi
7 - Meteor
Spoiler
One solution.jpg

New to xml parsing. There's probably a simpler way to do it. Curious to see what everyone else comes up with. 

markp201
8 - Asteroid

This is what I did but would also be interested if it can be done in fewer steps.

 

Spoiler
Capture.JPG
simon
11 - Bolide

I have seen this exercise before!

simon
11 - Bolide

Ok found it.

Nice clean solution Mark. You can do it with 5 xml parse tools in one stream 🙂

markp201
8 - Asteroid

Yes, it looked familiar to me also.  Seems like there was a training video or class.

TaraM
Alteryx Alumni (Retired)

A solution has been posted

Spoiler
2016-08-22 08_47_13-Alteryx Designer x64 BETA - DataPrep_Parsing XML_Intermediate_Solution.yxmd.png

 

Tara McCoy
SeanAdams
17 - Castor

My solution was similar to @Naledi's

Spoiler
- Regarding the posted solution ( @TaraM / @GeneR ): I worry that hard renames like this make the process brittle in the face of change.
- What I did (and it looks like @Naledi did too) is progressively parse down the tree and, at each stage, use a dynamic rename on the dynamically added columns.
- This means that even if columns are added to the XML (which happens quite frequently in prod environments), this solution would gracefully include these fields.

Also - by breaking this into smaller subsets of data - you can join it back very easily using a record ID - so each transformation becomes easier to deal with (smaller data sets) the lower down the tree you go.
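The approach described above (parse one level at a time, name columns from whatever tags appear, then join on a record ID) can be sketched in Python as a rough analogy. This is not an Alteryx workflow, and the sample records, tag names and the `parse_level` helper are hypothetical. The second record carries an extra `Phone` field to show that a newly added element flows through without any hard-coded rename.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed keyed by record ID; record 2 has an extra <Phone> element.
RAW = {
    1: "<customer><BillTo><Name>Ann</Name><Zip>80202</Zip></BillTo>"
       "<ShipTo><Name>Bob</Name></ShipTo></customer>",
    2: "<customer><BillTo><Name>Cy</Name><Zip>10001</Zip><Phone>555-0100</Phone></BillTo>"
       "<ShipTo><Name>Di</Name></ShipTo></customer>",
}


def parse_level(records, child_tag):
    """Parse one child level per record; column names come from the tags found."""
    parsed = {}
    for record_id, xml_text in records.items():
        node = ET.fromstring(xml_text).find(child_tag)
        if node is None:
            parsed[record_id] = {}
            continue
        # Dynamic naming: prefix each leaf tag with its parent, no fixed field list.
        parsed[record_id] = {f"{child_tag}_{leaf.tag}": leaf.text for leaf in node}
    return parsed


# Each subset is small and handled on its own.
bill_to = parse_level(RAW, "BillTo")
ship_to = parse_level(RAW, "ShipTo")

# Join the subsets back together on the record ID.
joined = {
    record_id: {"RecordID": record_id, **bill_to[record_id], **ship_to[record_id]}
    for record_id in RAW
}

for record in joined.values():
    print(record)
```

Because no column list is written down anywhere, the `BillTo_Phone` column simply appears for the record that has it.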

Thank you for the exercise

Sean

 

estherb47
15 - Aurora

Trying to do as little hard renaming as possible makes this a long workflow, and a great exploration of the XML Parse tool.

Spoiler
image.png
NicoleJohnson
ACE Emeritus

My solution!

 

Turns out not all parsing tools are created equal, and I do NOT like XML parse as much as RegEx. But I managed... Very similar to other solutions, although kudos to @SeanAdams on the point about the dynamic rename, that is definitely a better long term/adaptable solution.

 

Spoiler
WeeklyChallenge37.JPG