Hi,
I have a number of a dataset, which are similar in schema but have different number of columns. The number of datasets number >100 and I wish to pull up a quick summary of them together, so I can see which columns are missing from their corresponding dataset.
I wanted to do something similar to https://community.alteryx.com/t5/Weekly-Challenge/Challenge-72-Compare-Data-Sets-Fields-Columns/td-p..., except I'm not sure if it's still possible? Currently I have a Batch Input Macro pulling and union'ing all the datasets together, but separately, I'm wondering how to adapt the Batch Input Macro (taken from here https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/CS-Macro-Dev-Reading-in-Multiple-Fi...) to generate the Field Info for >100 datasets?
Is it possible?
Solved! Go to Solution.
Hi @R_L,
I think you should combine the Batch Macro with a Directory Tool.
Also make sure to add the "Output File Name as Field" option.
I'll attach the workflow/macro. Let me know what you think.
Best
Alex
Hi, I tried that previously, except I see the following?
Preferably I want to create something to this:
Dataset A | Dataset B | Dataset C | Dataset D |
Field 1 | Field 1 | Field 1 | |
Field 2 | Field 2 | Field 2 | Field 2 |
Field 3 | Field 3 | Field 3 | |
Count of Non-Null Records in Dataset A | Count of Non-Null Records in Dataset B | Count of Non-Null Records in Dataset C | Count of Non-Null Records in Dataset D |
Sorry for misunderstanding it in the first place @R_L.
I made some changes to the workflow/macro.
The output has the following format now:
Dataset 1 | Dataset 2 | Dataset 3 | Dataset 4 | |
Column A | Non null values in Column A in Dataset 1 | Non null values in Column A in Dataset 2 | Non null values in Column A in Dataset 3 | Non null values in Column A in Dataset 4 |
Column B | Non null values in Column B in Dataset 1 | Non null values in Column B in Dataset 2 | Non null values in Column B in Dataset 3 | Non null values in Column B in Dataset 4 |
Column C | Non null values in Column C in Dataset 1 | Non null values in Column C in Dataset 2 | Non null values in Column C in Dataset 3 | Non null values in Column C in Dataset 4 |
It's not exactly your required output, but it contains all required information. It looks like this in the real run:
The Nulls indicate that this column doesn't exist and numbers tell the "non null values".
Workflow / macro looks like this:
Updated workflow/macro attached. What do you think?
Best
Alex
Thanks, that works as well!