Hi Community,
I am trying to parse information on a few hundred thousand clinical trials that are stored in individual XML files. I've loaded these into Alteryx (one row per XML file) and am mostly getting on well using the XML parse tool with specific child names to grab the data I need into new columns.
I am however struggling where there are multiple fields on the same level with the same name. For example, a snip of the xml for one trial:
<condition>Retinitis Pigmentosa</condition>
<intervention>
<intervention_type>Drug</intervention_type>
<intervention_name>Vitamin A</intervention_name>
</intervention>
<intervention>
<intervention_type>Drug</intervention_type>
<intervention_name>Nutritional Supplement</intervention_name>
</intervention>
<eligibility>
Contains two different interventions that I would like to parse into multiple columns (or rows). There can be multiple interventions (or 0, highest i've seen is 5).
Does anyone have an elegant solution to parse out the intervention type & name?
I'm faffing around with the xml parse & regex but my knowledge of either is too poor to make it work!
Sample (shorter) dataset attached.
Many thanks for your help!
James
Solved! Go to Solution.
Hi @JamesFo ,
I attached a workflow doing what I think you want.
Tell me if it's not exactly what you want or if you have questions.
Cheers,
Jean-Baptiste
Hi @Jean-Balteryx,
Amazing - I hadn't realised that the XML parse would just create multiple rows where it found more than one. For some reason I thought it was only returning the first values found! Maybe I should have stuck a record ID on at the start...
Thank you!
James
@JamesFo ,
Yes, the XML tool will return everything that matches the value you specify, it won't stop to the first one.
You're welcome !
Cheers,
Jean-Baptiste