Community Gallery


This macro reads XML files and generates a flattened (or, if preferred, unflattened) tabular output with a hierarchical record ID.
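To make the idea of a flattened output with a hierarchical record ID concrete, here is a minimal Python sketch. It is not the macro's implementation: the dotted ID format, the column names, and the `flatten` helper are all assumptions made for illustration, and the macro's actual output layout may differ.

```python
import xml.etree.ElementTree as ET


def flatten(xml_text):
    """Return one row per element and per attribute, each with a dotted record ID.

    Hypothetical helper: the ID scheme (1, 1.1, 1.2, ...) is an assumption.
    """
    rows = []

    def walk(elem, record_id, path):
        # One row for the element's own text (empty string when it has none).
        text = (elem.text or "").strip()
        rows.append({"record_id": record_id, "path": path, "attribute": None, "value": text})
        # One extra row per attribute, sharing the element's record ID.
        for name, value in elem.attrib.items():
            rows.append({"record_id": record_id, "path": path, "attribute": name, "value": value})
        # Children are numbered from 1 under their parent's ID.
        for index, child in enumerate(elem, start=1):
            walk(child, f"{record_id}.{index}", f"{path}/{child.tag}")

    root = ET.fromstring(xml_text)
    walk(root, "1", root.tag)
    return rows


sample = "<invoice id='A1'><line><qty>2</qty></line><line><qty>5</qty></line></invoice>"
for row in flatten(sample):
    print(row)
```

Because every row carries the ID of the element it came from, child rows can be tied back to their parents by comparing ID prefixes.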

This macro is an improved version of the Ultimate XML Parser previously shared in Alteryx Community Gallery.

The "CReW_" prefix was added only for file sorting convenience (when installed into the CReW macro pack directory). This macro is in no way related to, or quality checked by, the CReW team.

This macro appears under the Parse category.

Version History:

v1.0 (2025-11-17): Initial release.

v2.0 (2025-11-18): Added row by row dynamic rename support.

v2.1 (2025-11-23): Prevent a "><" character sequence from being truncated in a CDATA block.
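This page does not describe how the macro tokenizes XML, so the following Python sketch only illustrates the general class of problem: splitting on "><" cuts through a CDATA block that happens to contain that sequence, while matching the CDATA section as a single unit keeps it intact. The `TOKEN` pattern and `tokenize` helper are hypothetical.

```python
import re

# Try a whole CDATA section first, then any other tag, then text between tags.
TOKEN = re.compile(r"<!\[CDATA\[.*?\]\]>|<[^>]+>|[^<]+", re.DOTALL)


def tokenize(xml_text):
    """Split XML text into tags, CDATA sections, and text runs (illustrative only)."""
    return TOKEN.findall(xml_text)


sample = "<note><![CDATA[a ><b]]></note>"

# Naive approach: the CDATA content is cut in two at the embedded "><".
print(sample.split("><"))

# CDATA-aware approach: the section survives as one token.
print(tokenize(sample))
```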

IMPORTANT UPDATE: As of November 17th, 2025, this macro is now properly packaged into an Alteryx Package (YXZP) file, which can be conveniently extracted into an existing CReW Macros installation directory, similar to the previously published CSV Friendly Multi-Input macro.

The most up-to-date version was uploaded to Community Gallery on November 18th, 2025 and was tested to work without errors.

The macro has been optimized to replace the self-join with a union, as a self-join could take up all available resources when reading a large XML file.

Benchmark 1: A 14 MB Alteryx workflow XML file is processed in 50 seconds.

Benchmark 2: 70 e-invoice XML files of 300 KB each are processed in 32 seconds.

All relevant caveats and information are now present in the macro interface, eliminating the need to refer to the Community Gallery page for additional information.

As with the CSV Friendly Multi-Input release, a copy of CReW Macros will be downloaded with the name CReW_Macros.zip when you click the Download button, for the sake of convenience and preservation.

Old versions of this macro will also be downloaded with the name Old Versions of NeoInfiniTech XML Parse.zip when you click the Download button.

Information regarding the old releases can be found below the line.

----------------------------------------------------------------------------------------------------

IMPORTANT WARNING: For the time being, if your XML file is larger than 1 MB, please refrain from using any version of this workflow other than XML Parser and Flattener (2025-11-08), as the Python code may consume all available RAM and cause a crash. If you use XML Parser and Flattener (2025-11-08), also refrain from working with big XML files: it might not crash, but it will take up a large portion of temporary disk space and RAM.

Important Note (2025-11-09): If using multiple instances of this macro, please add them into separate control containers and connect these containers in the sequence you would like your workflow to run, as this macro generates intermediary outputs in its working directory due to the use of a Python script.

In addition to all the caveats that apply to the Ultimate XML Parser (linked above), the initial version of this macro published on 2025-11-08 is unfortunately very slow on XML files with many elements (even ones as small as 500 KB) due to a self-join operation that is yet to be optimized. Any feedback on a better approach is welcome.

I might consider implementing XPath filtering syntax (on a relatively basic level compared to the syntax found here, for example) if I eventually integrate this macro into CSV Friendly Multi-Input. At that point, a new version of that macro will be republished under a new name (such as CSV-XLSX-XML Multi-Input).

Update (2025-11-09): The new update removes the self-join operation and replaces it with a Python process run externally, which requires a Python installation to be found in PATH. It also automatically attempts to install the pyarrow and pandas packages directly, without creating a virtual environment.
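The two prerequisites described above (a Python executable in PATH and the pandas and pyarrow packages) can be checked with a short Python sketch. This is an illustration under assumptions, not the macro's own script: `missing_packages` and `pip_install_command` are hypothetical helpers, and the sketch only prints the install command rather than running it.

```python
import importlib.util
import shutil
import sys


def missing_packages(names):
    """Return the packages from `names` that cannot be imported.

    Note: this inspects the interpreter running this script, which is
    assumed here to be the same one that is registered in PATH.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


def pip_install_command(names, python=sys.executable):
    """Build the pip command that would install `names` into `python`."""
    return [python, "-m", "pip", "install", *names]


python_on_path = shutil.which("python")  # None when no Python is found in PATH
needed = missing_packages(["pandas", "pyarrow"])

if python_on_path is None:
    print("No Python executable found in PATH")
elif needed:
    print("Would run:", " ".join(pip_install_command(needed, python_on_path)))
else:
    print("pandas and pyarrow are already available")
```

Installing without a virtual environment means the packages land in the interpreter's own site-packages, which is the trade-off the update accepts for speed.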

During the test phase, I noticed that creating a virtual environment to process a single file is unnecessary and slows the process down. I also noticed that starting an integrated Python tool is much slower than writing the intermediary output to a Parquet file, running the generated script at runtime through an external Python executable, and reading the processed Parquet file back (inspired by Avoiding Alteryx-Python Dependency and Version Chaos with Isolated Environments by @mbarone). Both the script and the intermediary outputs are removed from the folder afterwards.
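The round trip described above (stage the data in a file, run a generated script with an external Python executable, read the result back, then clean up) can be sketched as follows. To stay runnable with the standard library alone, this sketch uses SQLite as the intermediary instead of Parquet; a Parquet variant would use `DataFrame.to_parquet` and `pandas.read_parquet`, which need pandas and pyarrow. The worker script, file names, and `round_trip` helper are hypothetical and are not taken from the macro.

```python
import shutil
import sqlite3
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical worker script, written to disk at runtime: it upper-cases every value.
WORKER = """
import sqlite3, sys
con = sqlite3.connect(sys.argv[1])
con.execute("CREATE TABLE result AS SELECT id, upper(value) AS value FROM staging")
con.commit()
con.close()
"""


def round_trip(rows):
    """Stage rows in a file, process them in a separate Python process, read them back."""
    workdir = Path(tempfile.mkdtemp())
    db_path = workdir / "intermediate.db"
    script_path = workdir / "worker.py"
    try:
        # 1. Write the intermediary output.
        con = sqlite3.connect(db_path)
        con.execute("CREATE TABLE staging (id INTEGER, value TEXT)")
        con.executemany("INSERT INTO staging VALUES (?, ?)", rows)
        con.commit()
        con.close()

        # 2. Generate the script and run it with an external interpreter.
        script_path.write_text(WORKER)
        subprocess.run([sys.executable, str(script_path), str(db_path)], check=True)

        # 3. Read the processed data back.
        con = sqlite3.connect(db_path)
        result = con.execute("SELECT id, value FROM result ORDER BY id").fetchall()
        con.close()
        return result
    finally:
        # 4. Remove the script and the intermediary output.
        shutil.rmtree(workdir, ignore_errors=True)


print(round_trip([(1, "a"), (2, "b")]))
```

Because the intermediary files live in a working directory, two instances running at the same time in the same folder could collide, which is consistent with the advice to sequence multiple instances with control containers.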

It is important to note that this macro is not built with the aim of working fast. The macro is built with the aim of parsing every element of an XML file (with the only exception being self-terminating tags without any attributes) and flattening the output.

Differences between Parquet and SQLite versions:

- The Parquet version handles smaller XML files (~100 KB) faster, while the SQLite version may read bigger XML files (~500 KB) slightly faster than the Parquet version.

- Both will probably be unable to read much bigger XML files, and the workflow will throw an error. For example, a 40 MB XML file pushed my resources to the limit, using 99% of the RAM most of the time. I tested this on a computer with 32 GB RAM and an i7 12700K processor. If you happen to test this workflow on a computer with more resources, or have an idea to make the workflow run more efficiently, please share your feedback.

As this macro cannot read big XML files efficiently, a better alternative for such files would be to use the native XML parsing feature of Alteryx to extract only the necessary elements.

The old versions XML Parser and Flattener (2025-11-08).yxmc and XML Parser and Flattener (2025-11-09).yxmc are still available, while the newer versions are published as XML Parser and Flattener_Parquet.yxmc and XML Parser and Flattener_SQLite.yxmc.

The latest versions of this macro (XML Parser and Flattener_Parquet.yxmc and XML Parser and Flattener_SQLite.yxmc) have been tested to run successfully in Alteryx Designer 2025.1.2.142 with Python 3.13 (installed directly from the Microsoft Store and registered to PATH as Python.exe). The minimum supported version is 2023.1 due to the use of Control Containers, although the macro was not tested in that version or in later 23.x-24.x releases.
