Loading multiple files is an Alteryx core capability. But there can be complications...
Don't worry, you will have learned all the tricks by the end of this article.
The standard tools are can handle most cases but certain projects might have more demanding requirements.
It is then necessary to switch to other more powerful techniques. It is therefore important to understand these methods and which ones to use depending on what you are looking for:
This article will also cover the different types or errors that may arise. They may be good indicators to switch to the more advanced ways of loading. Often you think your files are all the same, but they are not even if visually they look identical, and Alteryx raises an error to highlight it.
The four options in order of increasing complexity:
Summary of their strengths/weaknesses :
|
Dynamism |
Simplicity |
Robustness to format changes |
Customization to manage special cases |
1. Manual consolidation of files via the UNION |
X |
XXX |
XXX |
X |
2. Automatic consolidation via the INPUT tool and the "wildcards" |
XXX |
XXX |
X |
X |
3. Automatic consolidation via the DYNAMIC INPUT tool |
XXX |
XX |
X |
XX |
4. Automatic consolidation via a macro package using the INPUT tool |
XXX |
X |
XXX |
XXX |
Description of the case study
To demonstrate the nuances, we will try to consolidate four files that are supposed to be identical. As is often the case in real life, these files are not really identical and in fact have three distinct formats that can cause problems for standard tools.
FinanceData_APAC.xlx and FinanceData_Europe.xlxs are identical according to this format:
FinanceData_USA.xlx has 2 differences. First the column "Production Price" has been renamed to "Manufacturing Price" but has the same numeric format. Second, an additional column "Comment" has been added at the end.
Finance_USA2.xlsx has the same columns but the format of the "Production Price" column is V_String (text) instead of Double (numeric).
We will see how the different methods react to these "small" differences which are the ones that consume our time every day. It is important to master them to know which technique to use.
1 - Manual consolidation of files via the UNION tool
The UNION tool automatically proposes the output format that will consolidate all the files in terms of format and also the number of columns. The name of the output columns is imposed by the first file read.
There are 3 consolidation options:
Here it is the "Auto config by position" option that has been chosen. It allows for the fact that the same column has two different names from one file to another (Production Price / Manufacturing price). It is the name of the column in the first file that will be used. The option “Autoconfig by name” is also very practical when the columns are not in the same order.
Pros:
Cons:
2 - Automatic consolidation via the INPUT DATA tool and the "wildcards"
The wildcards can be used in the file name:
FinanceData_*.xlsx represents all the files that start with "Finance_Data_" and any number of characters and end with ".xlsx". So this scans the directory and brings back our four files. The INPUT DATA will therefore try to load all the files listed, assuming that they all have exactly the same format. Any discrepancy (column or format) leads to non-loading.
Warning, this does not result in errors but in warnings. Here are the typical messages:
Pros:
Cons:
3 - Automatic consolidation via the DYNAMIC INPUT tool
We specify the directory where the files are located.
And on the same principle as the previous case, we specify the characteristics of the files to load.
The wildcards can be used in the file name:
FinanceData_*.xlsx represents all the files that start with "Finance_Data_" and any number of characters and end with ".xlsx". So this scans the directory and brings back our four files. In output, the object brings back all the characteristics of the files including the complete path that we use in the next step.
This tool allows, on the basis of a list of files, to dynamically load all the files via a single object. It is necessary to specify an example file format and a list of paths to these files.
This example file imposes the expected format. Any file that does not respect this first format will be rejected. The DYNAMIC INPUT will therefore try to load all the files in the list. Its format reference is that of the template (here Europe). The ones that are different are discarded.
The OK lines of the other files are still processed and continue in the flow.
Warning, the non-loading for an additional column reason does not result in an error but in a warning. On the other hand, a format change triggers the error. Here are the typical error messages:
Pros:
Cons:
4 - Automatic consolidation via a macro package using the INPUT tool
A batch macro is a combination of tools that are packaged and provided with a list of parameters. It will perform N times the packaged treatment for all the entries of the list.
In our case, we package an INPUT DATA object to which we will provide the list of files to be loaded. The strength of this method is that at each iteration, the macro considers that it is processing one and only one entry without taking into account the history. It is as if we had N times an input with one file and not an input with N files. Thus, we no longer have the problem of format divergence between the 1st and the Nth. Each file is being considered as unique so is necessarily coherent with itself.
And this is where the bonus effect appears. The macro output consolidates the results as the UNION object would and therefore handles all the problems of format or number of distinct columns. This way there is no rejection and the formats are standardized and optimized.
Two things about this macro:
Warning, the non-loading for an additional column reason does not result in an error but in a warning. On the other hand, a format change triggers an error.
Note: The macro properties can be found by navigating to View > Interface Designer, then clicking on the gear icon at the bottom left.
This is where you determine whether the macro will consolidate the data on the basis of the column position (the 1st with the 1st, the 2nd with the 2nd...) or on the basis of column name (whatever the order, it searches for the names and aligns them).
Pros:
Cons:
Tips :
To see if an object is a macro, activate the option "Options" / "User settings" / "Canvas" / "Display macro indicators on tools".
This will display a (+) on the tools that are macros.
Then just right click on it and launch the "Open Macro" option to open a dedicated tab for this macro.
For more information on macros, see the Macros interactive lessons in the Community.
There you go, no more excuses to not dynamically load everything that moves and automatically manage all these cases without having to touch anything from one launch to the next while keeping it simple when you don't need complexity.
Isn't life great? 😊
Enjoy!
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.