Hi Community,
I am trying to parse information on a few hundred thousand clinical trials that are stored in individual XML files. I've loaded these into Alteryx (one row per XML file) and am mostly getting on well using the XML parse tool with specific child names to grab the data I need into new columns.
I am however struggling where there are multiple fields on the same level with the same name. For example, a snip of the xml for one trial:
<condition>Retinitis Pigmentosa</condition>
<intervention>
<intervention_type>Drug</intervention_type>
<intervention_name>Vitamin A</intervention_name>
</intervention>
<intervention>
<intervention_type>Drug</intervention_type>
<intervention_name>Nutritional Supplement</intervention_name>
</intervention>
<eligibility>Contains two different interventions that I would like to parse into multiple columns (or rows). There can be multiple interventions (or 0, highest i've seen is 5).
Does anyone have an elegant solution to parse out the intervention type & name?
I'm faffing around with the xml parse & regex but my knowledge of either is too poor to make it work!
Sample (shorter) dataset attached.
Many thanks for your help!
James