Challenge #37: Parsing a Raw XML File
The link to the solution for last challenge #36 is HERE.
Using raw XML files as input can present some interesting challenges. The data is nested within each record, so you have to extract it through successive parsing steps, sometimes drilling many levels down (root and child levels). Alteryx makes this easier with the XML Parse tool. We will explore the process in this exercise.
Use Case: A company receives customer purchase and shipping data weekly, based on web and catalog purchases. The company would like to analyze their customers and produce a profile by market by SKU. The challenge is that the data feed contains XML that needs to be parsed in order to effectively analyze the data.
Objective: The column called customer_OuterXML contains the data that needs to be parsed into 25 unique fields detailing the customer contact information for both the “Bill To” and “Ship To” attributes.
Note: As of 9/11/2019, the Start file and Solution files were edited. Depending on when you complete this challenge, some solutions posted here may reference the dataset that was previously available. Solution files that used the previous dataset have been replaced with the Alteryx Academy logo, which acknowledges a user's contribution that we can no longer share publicly.
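For readers who want to see the objective outside of Alteryx, here is a minimal Python sketch of drilling from the root into child levels and flattening the result into one field per leaf. The sample XML, its tag names, and the `flatten` helper are all assumptions made for illustration; the real customer_OuterXML schema is not shown in this thread, and this is not the official solution.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record; the actual customer_OuterXML layout is not
# shown in the thread, so these tag names are assumptions.
SAMPLE = """
<customer id="C001">
  <bill_to>
    <name>Ann Lee</name>
    <city>Denver</city>
  </bill_to>
  <ship_to>
    <name>Ann Lee</name>
    <city>Boulder</city>
  </ship_to>
</customer>
""".strip()


def flatten(elem, prefix=""):
    """Walk the tree depth-first and return {field_name: value}.

    Field names are built from the chain of parent tags, so the same
    child tag under bill_to and ship_to ends up in two distinct fields.
    """
    fields = {}
    # Attributes of the current element become their own fields.
    for name, value in elem.attrib.items():
        fields[f"{prefix}{elem.tag}_{name}"] = value
    children = list(elem)
    if not children:
        # Leaf element: keep its text content.
        fields[f"{prefix}{elem.tag}"] = (elem.text or "").strip()
    for child in children:
        # Drill one level down, carrying the parent tag as a prefix.
        fields.update(flatten(child, prefix=f"{prefix}{elem.tag}_"))
    return fields


row = flatten(ET.fromstring(SAMPLE))
for field, value in row.items():
    print(field, "=", value)
```

Prefixing each field with its parent tag plays the same role as renaming the parsed columns, keeping the "Bill To" and "Ship To" values apart once they sit side by side in one row.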
New to XML parsing. There's probably a simpler way to do it. Curious to see what everyone else comes up with.
I have seen this exercise before!
Ok found it.
Nice, clean solution, Mark. You can do it with 5 XML Parse tools in one stream 🙂
Yes, it looked familiar to me also. Seems like there was a training video or class.
My solution was similar to @Naledi's:
- What I did (and it looks like @Naledi did too) was to progressively parse down the tree and, at each stage, use a Dynamic Rename on the dynamically added columns.
- This means that even if columns are added to the XML (which happens quite frequently in prod environments), this solution would gracefully include those fields.
Also, by breaking this into smaller subsets of data, you can join it back very easily using a record ID, so each transformation becomes easier to deal with (smaller data sets) the lower down the tree you go.
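The same idea can be sketched in Python for anyone who wants to try it outside a workflow: parse one level at a time, name each new column from its parent tag instead of hard-coding it, and join everything back on a record ID. The sample records, the tag names, and the helpers `parse_children` and `parse_leaves` are assumptions for illustration only, not the structure of the challenge data.

```python
import xml.etree.ElementTree as ET

# Hypothetical records; tag names are assumptions for illustration only.
# The first record carries an extra <zip> field that the second lacks.
RECORDS = [
    "<customer><bill_to><city>Denver</city></bill_to>"
    "<ship_to><city>Boulder</city><zip>80301</zip></ship_to></customer>",
    "<customer><bill_to><city>Austin</city></bill_to>"
    "<ship_to><city>Dallas</city></ship_to></customer>",
]


def parse_children(xml_text):
    """Parse one level: return (tag, outer XML) for each direct child."""
    root = ET.fromstring(xml_text)
    return [(child.tag, ET.tostring(child, encoding="unicode")) for child in root]


def parse_leaves(xml_text, prefix):
    """Parse the next level down and name each column after its parent.

    Nothing is hard-coded, so a field that newly appears in the XML
    simply shows up as one more column.
    """
    root = ET.fromstring(xml_text)
    return {f"{prefix}_{child.tag}": (child.text or "").strip() for child in root}


joined = {}
for record_id, outer_xml in enumerate(RECORDS, start=1):
    # The record ID is the join key that ties each parsed subset
    # back to the row it came from.
    row = joined.setdefault(record_id, {"record_id": record_id})
    for tag, child_xml in parse_children(outer_xml):
        row.update(parse_leaves(child_xml, prefix=tag))

for row in joined.values():
    print(row)
```

The extra `zip` element in the first record lands in a `ship_to_zip` column without any change to the code, which is the adaptability the dynamic rename is meant to provide.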
Thank you for the exercise
Sean
Trying to do as little hard renaming as possible makes this a long workflow, and a great exploration of the XML Parse tool.
My solution!
Turns out not all parsing tools are created equal, and I do NOT like XML Parse as much as RegEx. But I managed... Very similar to other solutions, although kudos to @SeanAdams on the point about the dynamic rename; that is definitely a better long-term, adaptable solution.