Alteryx Designer Desktop Discussions

SharkeyNZ · ‎10-09-2023

I regularly get data sets that I need to join (on 1 or more fields) and the first thing I want to know is how much of the data is common and will join, and how much will fall out of the process.
The fall out may be bad data that needs cleaning or is legitimately dissimilar. For Example, take two data sets to compare:

Data Set One				Data Set 2
ROLE	LEVEL	TEAM	Classification	Type	Category	Experience	Scrum
Analyst	Senior	Full Stack	Big Data	Java SE	Cloud Native	Junior	Digital
Analyst	Junior	Waterfall	Blockchain	Java SE	Java, BA	Junior	Niche
DEV	Mid	Standard	Design Systems Designer	PM	Big Data	Intermediate	Standard
DEV	Senior	Full Stack	Design Systems Engineer	Tester	Cloud	Senior	Standard
Java	Junior	Waterfall	Cloud Native	Data Scientist	Azure	+5 Years	Digital
Java	Mid	Standard	Cloud Native	Data Modeller	Big Data	Senior	Waterfall
Project Manager	Senior	Standard	.Net	Developer	Design Systems Designer	+5 Years	Niche

I want to make a macro where I can take the two data sets, select the Fields that I want to compare, so for the data above, I want a selection similar to a Join tool, where I select fields:

Data Set One - ROLE <-> Data Set 2 - Type
Date Set One - LEVEL <-> Date Set 2 - Experience
Date Set One - TEAM <-> Date Set 2 - Scrum
Date Set One - Classification <-> Date Set 2 - Category

Each field value comparison will be independent. I've seen examples of how to create a single list of the field names, but not one that allows me to select and match up both sets of field names. Has any one got a solution here?

caltang · ‎10-09-2023

Not very sure I follow you, but correct me if I am wrong:

Basically you will ingress two files into Alteryx, and you want to see what data can join and what can't join based on selected fields?

In this case, I have a few points to make:

If you have a many-to-many join, it will give you many more records than needs be. You'll need to use a Tile Tool to get them a specific / unique number per field name. It's easy if it's just 1 field, but having many fields will be a bit tricky.
Why a Macro specifically? It can be a workflow in itself, and you can package it as a Macro later no?
Can you provide an expected output or even a mock-up design that you ideally want?

The above will help.

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/

OllieClarke · ‎10-10-2023

Hey @SharkeyNZ I can think of a way to build this macro, but it would involve typing in fields in the order of joining into 2 text boxes, and then updating the join tool in a nested batch macro. You could also maybe have as many drop downs as there could be join clauses (with some left blank) to set join clause 1, join clause 2 etc.
Neither one of these 'solutions' would be user friendly, both would involve nested batch macros.

Another 'solution' (attached) is to transpose all the data and append it. You can then see where there are matching values. This approach will be slow with large datasets though.

SharkeyNZ · ‎10-16-2023

Thanks @caltang.
In terms of output, I'd like for example:

Evaluating Fields names (Selection 1 - Selection 2)	Matching values in both sets	Value from selection 1	Value from selection 2
Level - Experience	Junior	Junior	Junior
Level - Experience	Senior	Senior	Senior
Level - Experience	Not Matched	Mid
Level - Experience	Not Matched		Intermediate
Level - Experience	Not Matched		+5 Years
Team - Scrum	Waterfall	Waterfall	Waterfall
Team - Scrum	Standard	Standard	Standard
Team - Scrum	Not Matched	Full Stack
Team - Scrum	Not Matched		Digital
Team - Scrum	Not Matched		Niche

Part of the goal here is I want to see what's different between multiple groups of two fields, that are "supposed" to have the same data set of unique values, and if possible to create a mapping file so that I can join other elements of data sets.

I want a macro, as this problem comes up often with the data I get, and the field names change constantly. So while I can do this in a follow, I'd need to update it each time.

SharkeyNZ · ‎10-16-2023

Thanks @OllieClarke. As you stated, the approach is slow with large datasets if not cut down to unique values. I will try out your other suggestion.

Alteryx Designer Desktop Discussions

Multi select and match List in Macro (Like Join tool)

No data in configuration pane of tool immediately ...

Re: How to separate IDs from the text

Re: How to dynamically name files without creating...

Re: Iterative canvas execution with different file...

Re: Need Help in Data manipulation