This workflow is designed to parse any XML file into a tabular structure, extracting (almost) every piece of information found in the file without asking the user which elements to parse, albeit with some caveats.
Things to look out for:
- Change the input file using the XML View, as not doing so may result in the configuration being lost.
- After changing the input file definition through the XML View, make sure to correctly configure the encoding of the XML file. The default setting in the workflow is UTF-8. While it is possible to build an additional process that automatically detects the code page from the declaration, it is not implemented because:
1. The workflow would have to be converted into a macro, preventing me from sharing this project as a regular workflow, as a PoC of sorts (at least for the first version).
2. There might be UTF-16 XML files, which might break the process if they do not contain a BOM.
3. Not all XML files may have an encoding declaration.
- If your XML file consists of a single line containing the entire data and the file is bigger than 16 MB, please consider splitting it into multiple parts using a text editor that can handle large files. Otherwise the data will be truncated, since the AMP Engine only supports single rows of up to 16 MB.
- This workflow cannot parse notations (tags starting with, e.g., <!ENTITY), although it does handle comments (<!--...-->) and CDATA. Notations can include nested tags and, despite my best efforts, I was not able to isolate and remove them correctly, so expect errors when dealing with XML files containing notations.
- The following intermediary characters are used to split the tags and make additional necessary adjustments: "‽", "⸘". Although these are characters that are very rarely used, please ensure they do not exist in your XML data before processing the data.
- Tab characters between tags and before & after strings are automatically removed.
- Multi-line values between tags are preserved.
As an example, if you have an indented JSON string inside an XML element, the multi-line structure will be preserved but you will lose the indentation.
- Any declarations (<?...?>) at the beginning of the data are automatically omitted.
- This workflow may not be able to handle tags that span multiple lines.
- Attributes are also included in the output if they are present in the XML file; expect the data not to follow a line-by-line structure.
- The resulting table may need further processing to obtain the desired output.
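If you want to check a file against the caveats above before running the workflow, a small script can help. The following is a minimal Python sketch and not part of the workflow itself. The names `preflight`, `RESERVED_CHARS` and `MAX_ROW_BYTES` are hypothetical, the 16 MB limit is assumed to mean 16 * 1024 * 1024 bytes, and the regular expressions are rough heuristics rather than a real XML parser.

```python
import codecs
import re

# Intermediary characters the workflow uses internally (taken from the list above).
RESERVED_CHARS = ("‽", "⸘")
# Assumption: "16 MB" is interpreted as 16 * 1024 * 1024 bytes.
MAX_ROW_BYTES = 16 * 1024 * 1024


def preflight(raw: bytes, configured_encoding: str = "utf-8") -> list[str]:
    """Return human-readable warnings for the caveats listed above (heuristic)."""
    warnings = []
    text = raw.decode(configured_encoding, errors="replace").lstrip("\ufeff \t\r\n")

    # Compare the declared encoding (if any) with the one configured in the workflow.
    m = re.match(r'<\?xml\b[^>]*?\bencoding\s*=\s*["\']([^"\']+)["\']', text)
    if m is None:
        warnings.append("No encoding declaration found; confirm the encoding manually.")
    else:
        try:
            declared = codecs.lookup(m.group(1)).name
        except LookupError:
            declared = m.group(1).lower()
        if declared != codecs.lookup(configured_encoding).name:
            warnings.append(f"Declared encoding {m.group(1)!r} differs from {configured_encoding!r}.")

    # The reserved intermediary characters must not appear in the data.
    found = [c for c in RESERVED_CHARS if c in text]
    if found:
        warnings.append(f"Reserved characters present: {' '.join(found)}")

    # A single line above the row limit would be truncated.
    longest = max((len(line) for line in raw.splitlines()), default=0)
    if longest > MAX_ROW_BYTES:
        warnings.append(f"Longest line is {longest} bytes; consider splitting the file.")

    # Any '<!' construct other than a comment or CDATA section is treated as unsupported.
    if re.search(r"<!(?!--|\[CDATA\[)", text):
        warnings.append("Found '<!...' declarations (e.g. <!ENTITY); expect errors.")

    # Tags that span multiple lines, ignoring comments and CDATA sections.
    stripped = re.sub(r"<!--.*?-->|<!\[CDATA\[.*?\]\]>", "", text, flags=re.S)
    if re.search(r"<[^<>]*\n[^<>]*>", stripped):
        warnings.append("Found a tag spanning multiple lines.")

    return warnings
```

A typical use would be to read the file in binary mode, for example `preflight(open(path, "rb").read())`, and review the returned warnings before pointing the workflow at the file. An empty list only means none of these heuristics fired, not that the file is guaranteed to parse cleanly.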
I plan to develop this workflow further and eventually integrate it into a macro I previously published, CSV Friendly Multi-Input.
Please feel free to share your feedback and develop your own solutions using this tool.
Update (2025-10-15): The workflow has been updated to handle multi-line content properly.