Niche request here: does anyone have experience with the ETL of very wide SPSS-derived Excel data sets in Alteryx please?
We have questionnaire results in Excel files (derived from SPSS) that are over 16k columns wide, so in Alteryx just opening the configuration of a Union tool, for example, can take minutes. On top of that there's the added complication of mapping the column header codes to their correct aliases...
We have a very simple working solution that just Unions the Excel files and then deselects all the unused columns via a Select tool, but I feel we could do better, maybe by transposing? We have left the mapping to Tableau, where we've just assigned aliases.
Any advice would be much appreciated. Thanks so much in advance.
Transpose, as early as possible. Probably Input > RecordID > Transpose. To build that part of the workflow without long pauses, go to Options > User Settings > Edit User Settings > Advanced (tab) > Disable Auto Configure. This stops Alteryx from checking, preparing, and possibly displaying the metadata every time you click, and 16k columns is a lot of metadata. If you have many files, a batch macro here will speed up input a lot.
https://knowledge.alteryx.com/index/s/article/The-Ultimate-Input-Data-Flowchart-1583459854309 for reference.
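If it helps to see the reshaping idea outside the designer, here's a minimal pandas sketch of what RecordID > Transpose does conceptually. The file name "responses.xlsx" and the column names are hypothetical, just for illustration:

```python
import pandas as pd

# Hypothetical wide survey extract: one row per respondent, 16k+ question columns.
wide = pd.read_excel("responses.xlsx")

# RecordID equivalent: a stable key for each respondent/row.
wide.insert(0, "RecordID", range(1, len(wide) + 1))

# Transpose equivalent: melt the question columns into Name/Value pairs,
# so every downstream step works on one narrow table instead of 16k columns
# of metadata.
long = wide.melt(id_vars="RecordID", var_name="Name", value_name="Value")
```

The payoff is the same in either tool: once the data is long, adding or dropping a question is a row filter, not a schema change.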
You can then use a Summarize tool to get a list of the fields (the Name column) and build a dynamic procedure to replace (effectively rename) the field names you want and filter out the ones you don't. Once you have the required data trimmed, you can Cross Tab back to a table format, but work out what the optimal output shape is first.
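Continuing the hypothetical pandas sketch above, the rename-and-filter step can be a single join against a lookup of codes to aliases (here a made-up "alias_map.csv" with Code and Alias columns), followed by a pivot as the Cross Tab equivalent:

```python
import pandas as pd

# Rebuild the long table from the previous sketch (all names hypothetical).
wide = pd.read_excel("responses.xlsx")
wide.insert(0, "RecordID", range(1, len(wide) + 1))
long = wide.melt(id_vars="RecordID", var_name="Name", value_name="Value")

# Hypothetical lookup of header codes to aliases; in Alteryx this is the
# field list you'd derive from the Summarize tool's Name column.
alias_map = pd.read_csv("alias_map.csv")  # columns: Code, Alias

# Inner join keeps only fields that have an alias (the filter) and
# attaches the alias (the rename) in one step.
trimmed = long.merge(alias_map, left_on="Name", right_on="Code", how="inner")

# Cross Tab equivalent: back to one row per respondent, aliased columns only.
result = trimmed.pivot(index="RecordID", columns="Alias", values="Value")
```

The design point is the same as in Alteryx: drive the renaming from a lookup table rather than hand-configuring 16k columns in a Select tool.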