Parsing XML
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
This is actually related to my previous question about parsing JSON insofar as I'm now looking at the same data in XML format rather than JSON. But I think this may be a data acquisition issue, so new topic! w00t!
I'm looking at the same data as before, now with 100% more XML! XML Source is here:
http://mwo.smurfy-net.de/api/data/mechs
When I point a browser at that URL, it complains that there's no associated style sheet. Chrome specifically says "This XML file does not appear to have any style information associated with it. The document tree is shown below." Fine, but that tells me I have an otherwise well-formed XML document.
When I point Alteryx at this data source, whether live on the web or a downloaded copy, the XML Parser tool complains that there is no valid XML to parse. And, in fact, when I look at the payload, I see a whole bunch of key value pairs in a really long string. So I think I must be doing something wrong in the data acquisition process. I just don't know what. Is there some pre-processing I should do before feeding this to the XML parser? Perhaps I just don't understand XML as well as I think I do.
Workflow attached.
Solved! Go to Solution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
After playing with it a bit more, I've discovered that part of the issue is that there is a newline at the top of the document so that the XML declaration is actually on line 2. Looks like most XML parsing tools, not just Alteryx, are not big fans of having the XML declaration anywhere other than line 1 even if all preceeding lines are empty. Well, I don't know how to fix that using the download tool, but I did fix it in my downloaded copy. This has the added benefit of not needing the XML parsing tool since the Input Data tool seems to know how to parse out what I need form the file directly. YAY!
However, there are a TON of newlines and an ungodly amount of whitespace. I've tried stipping whitespace, newlines and quotes using Multi-Field formulas. That got me fairly far, but there's still a bunch left over. On the theory that the leftovers are in one of the several "unknown" columns (which I don't need anyway), I dropped those columns using a Select tool. Still no good.
Then I realized I'm being silly because several of the fields I DO need have nested XML in them. Now I have a headache.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Ok. New update.
Looks like the issue here arises because the input data tool doesn't like nested XML. The 'Details' object contains several child objects, some of which I want, the rest of which is uneeded. Regardless, the Input Data tool appears to strip out the contents of the 'Details' object so there is just no XML there to be parsed. Time to play with the Input Data configuration, I guess.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
YAY! Sorted it out! Turns out that parsing nested XML is kind of a pain in the **bleep**, especially when the data is also very wide (i.e. many columns). I am now able to get what I need from a local XML file. Now to figure out how to get it via HTTP. Once I'm done, I'll post the final workflow with comments.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @CharleyMcGee,
When I hit that API location using your module, I get JSON output, and am able to parse it by separating into 2 lots.
I firrst parse the entire JSON into 2 fields, then split T2C before separating the details from the rest to parse them individually. Take a look and it may give you an example of how to go when you decide on how you would like it structured. See attached.
Kane
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
