Hi Community, I am trying to parse information on a few hundred thousand clinical trials that are stored in individual XML files. I've loaded these into Alteryx (one row per XML file) and am mostly getting on well using the XML parse tool with specific child names to grab the data I need into new columns. I am however struggling where there are multiple fields on the same level with the same name. For example, a snip of the xml for one trial: Retinitis Pigmentosa Drug Vitamin A Drug Nutritional Supplement Contains two different interventions that I would like to parse into multiple columns (or rows). There can be multiple interventions (or 0, highest i've seen is 5). Does anyone have an elegant solution to parse out the intervention type & name? I'm faffing around with the xml parse & regex but my knowledge of either is too poor to make it work! Sample (shorter) dataset attached. Many thanks for your help! James Clinical trials xml.yxmd

Parsing multiple similarly named children in XML

Hi Community,

I am trying to parse information on a few hundred thousand clinical trials that are stored in individual XML files. I've loaded these into Alteryx (one row per XML file) and am mostly getting on well using the XML parse tool with specific child names to grab the data I need into new columns.

I am however struggling where there are multiple fields on the same level with the same name. For example, a snip of the xml for one trial:

 <condition>Retinitis Pigmentosa</condition>
  <intervention>
    <intervention_type>Drug</intervention_type>
    <intervention_name>Vitamin A</intervention_name>
  </intervention>
  <intervention>
    <intervention_type>Drug</intervention_type>
    <intervention_name>Nutritional Supplement</intervention_name>
  </intervention>
  <eligibility>

Contains two different interventions that I would like to parse into multiple columns (or rows). There can be multiple interventions (or 0, highest i've seen is 5).

Does anyone have an elegant solution to parse out the intervention type & name?

I'm faffing around with the xml parse & regex but my knowledge of either is too poor to make it work!

Sample (shorter) dataset attached.

Many thanks for your help!

James

Clinical trials xml.yxmd

Parse

Accepted answers

Jean-Balteryx

Hi @JamesFo ,

I attached a workflow doing what I think you want.

Tell me if it's not exactly what you want or if you have questions.

Cheers,

Jean-Baptiste

Clinical trials xml.yxmd

All comments

Jean-Balteryx

Hi @JamesFo ,

I attached a workflow doing what I think you want.

Tell me if it's not exactly what you want or if you have questions.

Cheers,

Jean-Baptiste

Clinical trials xml.yxmd

JamesFo

Hi @Jean-Balteryx,

Amazing - I hadn't realised that the XML parse would just create multiple rows where it found more than one. For some reason I thought it was only returning the first values found! Maybe I should have stuck a record ID on at the start...

Thank you!

James

Quick Links

This months top contributors

atcodedog05 19598

Qiu 15881

binu_acs 15708

MarqueeCrew 13708

apathetichell 13703