I have a series of hierarchical XML files in a directory. When I try to process these files using the Directory and Dynamic Input tools, I run out of memory. I have an open issue with Alteryx about this, so that method is off the table for now.
As a workaround, I would like to iterate through each file in the directory, processing them one at a time.
I am an experienced developer, but new to Alteryx.
I would like a source workflow that creates a list of files, reads each file, and passes a data stream to a macro for processing. The macro will then pass back a result stream, and the parent workflow can do final processing on that.
Can anyone offer some high-level strategies for doing this? My first question is how to iterate through each file.
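As a rough sketch of the pattern being asked about (not Alteryx itself, just an analogy in Python; the directory path and the per-file work are placeholders), the idea is to build the file list first and then process one file at a time, so only one file's tree is ever in memory:

```python
from pathlib import Path
import xml.etree.ElementTree as ET

def process_file(path):
    # Stand-in for the macro's per-file work: here we just parse
    # the file and return its root tag.
    return ET.parse(path).getroot().tag

def process_directory(directory):
    # The "directory tool" step: list the XML files up front.
    # Then process them one at a time (the "batch macro" step).
    results = []
    for path in sorted(Path(directory).glob("*.xml")):
        results.append((path.name, process_file(path)))
    return results
```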
I have attached a module (built in Alteryx 10.6) that iterates through 13 XML files. You can open the macro and insert your own workflow.
Please note the Input Data tool within the macro reads the file as a non-delimited CSV, then rebuilds the XML and parses it out using our Summarize and XML Parse tools. This may help with your memory issue.
Correct: each file is processed individually in a batch macro. Once it finishes one file, it then processes the next.
All the parameters come from outside the batch macro (the Directory tool). In this case we are updating the Input Data tool's file path with the paths coming from the Directory tool.
In the Action tool we have chosen "Update Value", which lets us update part of the tool it is attached to. You can add more Interface tools (a tool category) and parameterize other parts of the workflow if you want to as well.
You do not have to have a macro output.
I would recommend filtering that file out when you read the list in from the Directory tool. That way you can deal with it separately.
Jordan, if the batch macro is being run once for each file in the directory, why do you need to concatenate based on filename inside the batch macro? Shouldn't it be safe to assume there is only one file being processed, and therefore no grouping is needed?
When the XML file gets read, each line in the file becomes a row of data. The Summarize tool is used to concatenate the rows back together into one string. The Group By on the file name keeps the XML strings identifiable in the output of the macro.
If one needs to parse XML files that do not share a similar structure (unlike your weather example), the removal of unwanted fields will need to move outside the macro, with the Formula tool receiving the output from the macro.
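As a rough analogy in Python (the names here are illustrative, not taken from the workflow), the Group By on file name amounts to grouping rows by their source file before concatenating, so each file's lines become one identifiable XML string:

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

def concat_by_file(rows):
    # rows: (filename, line_of_text) pairs, as produced when each
    # line of each XML file becomes one row of data.
    grouped = defaultdict(list)
    for filename, line in rows:
        grouped[filename].append(line)
    # Concatenate each group back into one XML string per file,
    # keeping the strings identifiable by filename (the Group By).
    return {name: "".join(lines) for name, lines in grouped.items()}
```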
Hi. I'm new to Alteryx, so sorry if this seems a silly question. Can you tell me how you added the call from the "XML Sample Workflow" to the "XML Parse Macro"? From my googling, people seem to use a runner macro, but when I tried to recreate your "XML Sample Workflow" using one, the XML data didn't come out in the final results.
@nbt1032 To further explain what @David-Carnes mentioned: the Input Data tool in the batch macro reads the XML file as a "CSV" file with no delimiter, so it does not preserve the XML structure. This results in each line of text in the XML file being treated as a separate row. The concatenation brings these lines back together into one long XML string, which can then be parsed by the XML Parse tool.
@crookie74 "XML Sample Workflow" is the outer/parent workflow, which has the "XML Parse Macro" batch macro embedded in it. If you make a new workflow and add a Control Parameter interface tool, it will be saved as a batch macro. After you save this batch macro, you can open a new workflow ("XML Sample Workflow") and import the batch macro using right-click > Insert.