I have some XML, pasted below and also attached.
<suite> <testcase id="001" kind="bvt"> <inputs> <arg1>4</arg1> <arg2>7</arg2> </inputs> <expected>11.00</expected> </testcase> <testcase id="002" kind="drt"> <inputs> <arg1>9</arg1> <arg2>6</arg2> </inputs> <expected>15.00</expected> </testcase> </suite>
I'd like Alteryx to ingest this as:
id kind arg1 arg2 expected
1 bvt 4 7 11
2 drt 9 6 15
I have a solution attached and pictured below:
That subsequent step would need to be repeated, I think, if the XML nesting was deeper.
Is there a solution that will work no matter how deep the nesting?
Solved! Go to Solution.
Thank you @MikeN. @DavidP, here's my workflow (with your macro and my original example.xml file renamed to example.csv):
And here's the result:
Hi @MikeN,
The Parse XML macro only flattens the file and extracts the data, which means that regardless of how many layers of xml, you only need 1 xml parsing tool.
You then have to tidy up the data.
@JasonFriedman, glad that you got the macro working, now try the attached workflow.
Ok @DavidP, looks like I'm getting closer. The second tool in the picture below is your macro. I notice that your image has a mark on the macro, my image does not.
Don't worry about the + in the Parse XML macro - it's just a setting in my user config that tells me which tools are macros.
Can you share the output from the CrossTab tool - I'd like to understand the error in the select tool.
No output in the CrossTab tool, the only tool with output is the macro:
In your workflow the "example.xml" file is the original file I attached? These are my settings for that file:
ok, I see the problem.
If you're loading the file as an xml file in the input data tool, it gets parsed as an xml file. We don't want this.
As I said in my earlier posts, if you're using the parse xml macro, you should load the file as a csv with delimeter set as \0 and the box unchecked that says First Row Contains Field Names (as per the example workflow that I posted).
This step is really important. Without it, nothing else works.
The parse XML macro expects a field [Field_1] that contains all the XML. If you do this bit right, the rest should fall into place.
Thank you DavidP, works great now.
Follow-up ... I added the five tools in your final workflow to the macro you provided. I now have:
What threw me was, in your image, your input file was named example.xml. I see now that I can select from the configuration window the type of input file ... I'm not constrained by the file extension. My comment to the good people at Alteryx corporation is the File Format option value should not have a grey background, as grey implies it cannot be edited.
Thank you @JoeL , that solutions errors out with my actual data file:
Select (6) The field "Field_1" is not contained in the record.
Summarize (5) The field "Field_1" is not contained in the record {Action=Concat).
Summarize (5) RecordInfo::CreateRecord: A record was created with no fields.
Input Data (1) 617460 records were read from "C:\Users\jason\Downloads\myfile.xml"
Designer x64 Finished running XML Flat Parse.yxmd in 1.0 seconds with 2 errors and 1 warning
Actually, @DavidP 's solution also does not work with my actual data file ... it does not error out, but returns only one record, with each field containing thousands of values separated by commas.
Designer x64 Started running C:\Users\jason\Downloads\parse xml file 2.yxmd at 03/03/2020 10:22:00
Input Data (12) 617460 records were read from "C:\Users\jason\Downloads\myfile.xml"
Parse XML (11) Tool #328: accountid: "IFT000347_24,IFT000349_24,IFT000350_24,IFT000351_24,IFT000352_24,IFT000358_24,IFT000372_24,IFT000..." was truncated
...
Parse XML (11) Tool #328: type: "D,D,P,D,D,D,D,D,D,D,D,P,D,D,P,P,D,D,P,P,D,D,D,D,D,D,D,D,P,D,D,D,P,D,D,P,D,P,P,D,D,P,P,D,D,P,D,D,D..." was truncated
Browse (10) 1 records
Output Data (14) 1 records were written to "C:\Users\jason\Downloads\output.csv"
Designer x64 Finished running parse xml file 2.yxmd in 31.5 seconds with 46 field conversion errors and 1 warning
@JasonFriedman, the "Field_1" error can be fixed in the input tool. You need to deselect "First row contains field names" to get the XML to come out in Field_1. you can also rename your XML column to Field_1 using a select tool.