This is an extract from a publicly available government XML budget document that I am trying to parse.
I have run into this problem before, but I am now deeper into the data and the prior solution is not available.
I have attached both a snippet from the XML and a simplified workflow that depicts my challenge.
The master element is ModificationItems. I need to extract data located on 3 lines:
Title on line 4,
Manufacturer's Name on line 139 (which I dropped from the attached workflow, since it is not a problem to pull in the real workflow), and
Total Cost and its children on line 169.
But there is a problem. Total Cost is used multiple times throughout the document. I want only the Total Cost that appears directly under another Total Cost, and I want to ignore every other Total Cost item, even though they look the same.
I tried to walk down from Procurement to Total Cost to Total Cost, so as to skip the Total Cost items under the other children of Procurement. It is not working. As you can see, I am getting multiple rows of output when there should be only one.
To get only the TotalCost that occurs under TotalCost, at line 169, I modified the configuration of your last XML Parse Tool, and added a Filter Tool.
For your final XML Parse Tool, I set the Field with XML Data to TotalCost_OuterXML2, which is the output of your previous XML Parse Tool, where you extract the outer TotalCost layer. I set the tool to Parse a Specific Child Name (TotalCost) and to Return Child Values.
With this configuration, the XML Parse Tool looks for TotalCost only inside the already parsed outer TotalCost. As a result, only one row in the data stream gets data values from this step. The rest contain nulls because their XML has no inner TotalCost.
I then added a Filter Tool, configured to keep only the rows where the children of the inner TotalCost are not null. The result is a single row, with the parsed data that starts at line 169 of the XML.
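If it helps to sanity-check the result outside the workflow, the same idea (keep only a TotalCost whose parent is also a TotalCost) can be sketched in Python with the standard library. This is only an illustration under assumptions: the sample XML is a made-up, heavily simplified stand-in for the budget file, and every element name other than ModificationItems, Title, Procurement and TotalCost (for example Installation, Quantity and Cost) is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the budget XML. Installation,
# Quantity and Cost are invented names used only for illustration.
SAMPLE = """
<ModificationItems>
  <Title>Example Modification</Title>
  <Procurement>
    <Installation>
      <TotalCost><Quantity>2</Quantity><Cost>1.5</Cost></TotalCost>
    </Installation>
    <TotalCost>
      <TotalCost><Quantity>10</Quantity><Cost>7.25</Cost></TotalCost>
    </TotalCost>
  </Procurement>
</ModificationItems>
""".strip()


def inner_total_cost(xml_text):
    """Return the child values of each TotalCost nested directly in a TotalCost."""
    root = ET.fromstring(xml_text)
    # ".//TotalCost" finds every TotalCost at any depth; the trailing
    # "/TotalCost" keeps only those that have a TotalCost child and
    # returns that child, so stand-alone TotalCost elements are skipped.
    matches = root.findall(".//TotalCost/TotalCost")
    return [{child.tag: child.text for child in match} for match in matches]


rows = inner_total_cost(SAMPLE)
print(rows)
```

On the sample above this yields a single row, matching what the Filter Tool leaves behind in the workflow. The path expression plays the role of the second XML Parse Tool plus the null filter in one step.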
I've attached your workflow with the modifications I made to get the inner Total Cost Values. Please let me know if this solution does not work for you.