Recently we started looking into converting XML/XAF files in Alteryx. We had some success, but then ran into severe performance issues due to field bloating.
To explain the issue: when choosing journal as the XML Child Element Name, the XML looks as follows.
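A minimal hand-written sketch of such a journal element, with values taken from the table below (assumed structure; the real file carries more fields and full transaction contents):

```xml
<!-- illustrative sketch only, not a complete XAF journal -->
<journal>
    <jrnID>010</jrnID>
    <desc>Sales Ledger</desc>
    <jrnTp>A</jrnTp>
    <transaction>...</transaction>
    <transaction>...</transaction>
</journal>
```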
The above XML string would then be converted into the following Alteryx data:
| journal | jrnID | desc         | jrnTp | transaction | transaction_2 |
|---------|-------|--------------|-------|-------------|---------------|
|         | 010   | Sales Ledger | A     |             |               |
|         | 010   | Sales Ledger | A     |             |               |
By itself this would not immediately be an issue. The problem occurs when working with a large XML file: in a recent example, a specific journal produced fields up to [transaction_29882]. The Input tool output then contains that many fields for every imported row, even the rows that do not actually have that many transactions.
| journal | jrnID | desc            | jrnTp | transaction | transaction_2 | ... | transaction_29882 |
|---------|-------|-----------------|-------|-------------|---------------|-----|-------------------|
|         | 010   | Sales Ledger    | A     | NULL        | NULL          | ... | NULL              |
|         | 010   | Sales Ledger    | A     | NULL        | NULL          | ... | NULL              |
|         | 020   | Purchase Ledger | B     | NULL        | NULL          | ... | NULL              |
If you have a total of 8 million entries for the journal XML Child Element Name, and one of those journals has 29,882 transactions, the result is 8,000,000 × 29,882 ≈ 239 billion cells of data, the vast majority of them NULL. This of course takes a very long time to import.
Question:
Is there a way to import such XML files more effectively?
It would be helpful if you provided an example of what you have tried, as well as an example that could be copied and pasted into a workflow instead of an image of the structure.
This should be possible and may require some iterative/batch macros, but you'll need to be a bit more intentional about the way you interact with the structure.
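One alternative to parsing the wide layout inside Alteryx at all is to reshape the file into a long format (one row per transaction) before import, so the schema stays at a handful of columns no matter how many transactions a journal has. Below is a minimal Python sketch of such a pre-processing step using xml.etree.ElementTree.iterparse; the element names are taken from the tables above, and the file paths are placeholders.

```python
import csv
import xml.etree.ElementTree as ET

def local(tag):
    """Strip any XML namespace, e.g. '{http://...}journal' -> 'journal'."""
    return tag.rsplit("}", 1)[-1]

def flatten_journals(xaf_path, csv_path):
    """Stream the XAF file and write one CSV row per transaction."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["jrnID", "desc", "jrnTp", "transaction"])
        # iterparse fires an 'end' event as each element closes, so only
        # one journal sub-tree is held in memory at a time.
        for _, elem in ET.iterparse(xaf_path, events=("end",)):
            if local(elem.tag) != "journal":
                continue
            fields, transactions = {}, []
            for child in elem:
                name = local(child.tag)
                if name == "transaction":
                    transactions.append(child)
                else:
                    fields[name] = (child.text or "").strip()
            for tx in transactions:
                # Keep the raw transaction XML; parse it further
                # downstream if individual transaction fields are needed.
                writer.writerow([
                    fields.get("jrnID", ""),
                    fields.get("desc", ""),
                    fields.get("jrnTp", ""),
                    ET.tostring(tx, encoding="unicode"),
                ])
            elem.clear()  # release the processed journal sub-tree

if __name__ == "__main__":
    # Hypothetical paths -- replace with the real file locations.
    flatten_journals("input.xaf", "transactions_long.csv")
```

The resulting CSV imports with four fixed fields and no NULL padding; if a wide layout is still needed for specific journals, a Cross Tab downstream can pivot just those rows.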