Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Parsing XML

CharleyMcGee
6 - Meteoroid

This is actually related to my previous question about parsing JSON insofar as I'm now looking at the same data in XML format rather than JSON.  But I think this may be a data acquisition issue, so new topic!  w00t!

 

I'm looking at the same data as before, now with 100% more XML!  XML Source is here:

http://mwo.smurfy-net.de/api/data/mechs

 

When I point a browser at that URL, it complains that there's no associated style sheet.  Chrome specifically says "This XML file does not appear to have any style information associated with it. The document tree is shown below."  Fine, but that tells me I have an otherwise well-formed XML document.

 

When I point Alteryx at this data source, whether live on the web or a downloaded copy, the XML Parser tool complains that there is no valid XML to parse.  And, in fact, when I look at the payload, I see a whole bunch of key value pairs in a really long string.  So I think I must be doing something wrong in the data acquisition process.  I just don't know what.  Is there some pre-processing I should do before feeding this to the XML parser?  Perhaps I just don't understand XML as well as I think I do.

 

Workflow attached.

5 REPLIES 5
CharleyMcGee
6 - Meteoroid

After playing with it a bit more, I've discovered that part of the issue is that there is a newline at the top of the document so that the XML declaration is actually on line 2.  Looks like most XML parsing tools, not just Alteryx, are not big fans of having the XML declaration anywhere other than line 1 even if all preceeding lines are empty.  Well, I don't know how to fix that using the download tool, but I did fix it in my downloaded copy.  This has the added benefit of not needing the XML parsing tool since the Input Data tool seems to know how to parse out what I need form the file directly.  YAY!

 

However, there are a TON of newlines and an ungodly amount of whitespace.  I've tried stipping whitespace, newlines and quotes using Multi-Field formulas.  That got me fairly far, but there's still a bunch left over.  On the theory that the leftovers are in one of the several "unknown" columns (which I don't need anyway), I dropped those columns using a Select tool.  Still no good.  

 

Then I realized I'm being silly because several of the fields I DO need have nested XML in them.  Now I have a headache.

CharleyMcGee
6 - Meteoroid

Ok.  New update.

 

Looks like the issue here arises because the input data tool doesn't like nested XML.  The 'Details' object contains several child objects, some of which I want, the rest of which is uneeded.  Regardless, the Input Data tool appears to strip out the contents of the 'Details' object so there is just no XML there to be parsed.  Time to play with the Input Data configuration, I guess.

CharleyMcGee
6 - Meteoroid

YAY!  Sorted it out!  Turns out that parsing nested XML is kind of a pain in the **bleep**, especially when the data is also very wide (i.e. many columns).  I am now able to get what I need from a local XML file.  Now to figure out how to get it via HTTP.  Once I'm done, I'll post the final workflow with comments.

KaneG
Alteryx Alumni (Retired)

Hi @CharleyMcGee,

 

When I hit that API location using your module, I get JSON output, and am able to parse it by separating into 2 lots.

 

I firrst parse the entire JSON into 2 fields, then split T2C before separating the details from the rest to parse them individually. Take a look and it may give you an example of how to go when you decide on how you would like it structured. See attached.

 

Kane

CharleyMcGee
6 - Meteoroid

As promised, here is the workflow that gets me what I need.  @KaneG's workflow really does all the heavy lifting.  I had to add one more pivot and then drop it out to Postgres.  That works WAAAYYY better than the swill I was coming up with.

 

Many thanks to @KaneG!

Labels