Alteryx Designer Desktop Discussions

Mj9715 · ‎10-04-2021

Hi All,

I have multiple data sets like below which I have created using workflows in Alteryx. All datasets are firm and corresponding values.

I have a final output sheet with a master database of unique firm names , against which I need to populate the value corresponding to it.

For eg.,

Scenario 1 : the firm name Adv is only found once in all input datasets so the value 300,000 is the value pasted in the master output file.

Scenario 2: a firm is not found in any dataset, it needs to be left blank

Scenario 3 : firm is found in multiple datasets, the final value needs to be a sum of all values in the multiple dataset corresponding to its name. Eg., App final value needs t0 be (2500000 + 1600000)

Please note that the naming is a bit messy, i.e., one input may have "App limited" and another "App Ltd". This poses a problem when using "group by" in summarize tool. Attached file with input and desired output. Please can someone guide?

Input Dataset 1
Firm	Value
Adv Limited	300,000
App Ltd	2500000
F2F	700000
FD	3500000
GFPL	200000
Smith	1100000
Input Dataset 2
Firm	Value
App Limited	1600000
Craf Ltd	500000
Smith	10,100

patrick_mcauliffe · ‎10-04-2021

@Mj9715 Can you show us what you've tried so far; so we aren't duplicating any work? Maybe something you've tried is just one step away.

Christina_H · ‎10-04-2021

If you have many datasets you might want to use a dynamic input, but basically the process would be to input all your datasets, union them together, group and sum by firm name, then join to the master list. Union any unmatched firms from the master list and you should have what you need.

(Data cleansing tools are included to deal with leading/trailing whitespace, you might not need them for the final data)

Mj9715 · ‎10-04-2021

Hi Christina, @Christina_H I had already built till the summarize tool, however there was one issue I forgot to mentioned. For summarize tool to work, the group by field needs to be exactly the same. For eg it could be App Limited or App Ltd. Any way for summarize tool to do a partial match with just the first word for the group by?

Mj9715 · ‎10-04-2021

@patrick_mcauliffe Patrick, I have union the dataset to input all fields and tried to summarize using group by firm name. The issue I am having as explained below is varying name formats. For eg., one dataset calls it AP Ltd while another one says AP Limited. The master dataset may match one of the two.. however summarize isnt considering them as the same to group by and sum it up.

Christina_H · ‎10-04-2021

@Mj9715 This is really a data cleaning problem then. If you're confident that you will get correct matches by cutting down to the first word only you could do that in a formula tool before matching. Or you could create a list of common variations and either remove them all before matching or replace with a standardized form (e.g. replace Limited / LTD / LTD. with LTD)

It all comes down to what works best with your data.

patrick_mcauliffe · ‎10-06-2021

@Mj9715 Check out the Fuzzy Match tool. This does require some tuning on your part though.

You'll want to add a field identifying each of the sources so when you union them you can still identify where they came from:

Generally, after the Fuzzy Match tool, a Make Groups tool is warranted. This consolidates A:B, B:A matches so there's only one group identifier for many keys.

Also, a good idea to change the case of all text to be the same before running.

Alteryx Designer Desktop Discussions

Designer| Dynamic summarize and v-lookup