Alteryx Designer Desktop Discussions

DonMcCarty · ‎06-02-2020

This post got me most of the way there.

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Use-Fuzzy-Match-to-search-possible-pai...

Also thanks to jdunkerly79, I took his example and just added two small changes to illustrate my need.

My desired output is below. Note that the "blah blah more more - mnq abc company" row has two possible matches in Field1:

Test	Field1
AU - 1100010(150M) - PTH(PAHDDH) -EPL - ABC Company LL	ABC
AU - 1100010(150M) - PTH(PAHDDH) -EPL - MNQ Company LL	MNQ
AU - 1100010(150M) - PTH(PAHDDH) -EPL - TZK Company LL	TZK
us -blah blah - mnq company	MNQ
us - blah blah more more - mnq abc company	MNQ
us - blah blah more more - mnq abc company	ABC

The actual output does not include the last row, the one that pairs "mnq abc company" with ABC in Field1.

I'm looking for this type of scenario because I'm trying to identify possible matches between a specific system's user IDs and users in the Outlook address database.

The problem is that users change names over the years, and it is an entrenched system. But I have all the SMTP addresses and have used other tools to create a list of the real names and every email ID they have had over the years.

Now I want to search the text within those emails over the years to see possible matches. The final decision will come from

a) refining my process to get closer matches and

b) eyeballing it and reaching out to users.

Attached is the revised workflow as an example including an isolated Input Data tool that shows my desired output. I just don't know how to get there.

fmvizcaino · ‎06-02-2020

Hi @DonMcCarty ,

You can use a combination of the Append Fields and Filter tools to get the desired output. Append Fields first creates a Cartesian join, giving you every possible combination of the two inputs; the Filter tool then uses the Contains function to keep only the matching cases.
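For readers who want to see the logic outside of Alteryx, here is a minimal Python sketch of the same idea, using the sample rows from the first post. It is an illustration, not the attached workflow: the function name `all_matches` is made up for this example, and the case-insensitive comparison is an assumption chosen so that "mnq" matches "MNQ" as in the desired output.

```python
from itertools import product

# Sample rows taken from the example in this thread.
tests = [
    "AU - 1100010(150M) - PTH(PAHDDH) -EPL - ABC Company LL",
    "AU - 1100010(150M) - PTH(PAHDDH) -EPL - MNQ Company LL",
    "AU - 1100010(150M) - PTH(PAHDDH) -EPL - TZK Company LL",
    "us -blah blah - mnq company",
    "us - blah blah more more - mnq abc company",
]
keys = ["ABC", "MNQ", "TZK"]


def all_matches(texts, lookups):
    """Pair every text with every lookup value, then keep the pairs
    where the text contains the lookup value (case-insensitive)."""
    return [
        (text, key)
        for text, key in product(texts, lookups)  # Cartesian join
        if key.lower() in text.lower()            # "contains" filter
    ]


matches = all_matches(tests, keys)
for text, key in matches:
    print(f"{text}\t{key}")
```

Because every text is compared with every lookup value, the "mnq abc company" row comes out twice, once per match. The cost is that the intermediate result has rows(texts) x rows(lookups) pairs before filtering.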

Best,

Fernando Vizcaino

DonMcCarty · ‎06-03-2020

Thank you very much @fmvizcaino (sorry, tagging is not working).

Your suggestion got me most of the way there.

The solution was spot on for my requirements. When I implemented it I got 350 million records.

So I got to thinking about how to pare it down using some other known facts about my data, and this is what I came up with.

The first letter of every system ID is the person's first initial
- this rarely changes, so I created an integer index from it
Created the same integer index from the first character of the first name in the global list of people
Joined the two lists on this index
- 24.5 million records, down significantly from 350 million
Then, as you suggested, filtered with a Contains to slim it down
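The steps above amount to joining on a cheap key first and only then running the Contains check. A rough Python sketch of that pattern follows. All of the data here is invented for illustration, since the real lists are not shown in the thread, and the matching rule (the email ID contains the system ID) is an assumption standing in for whatever Contains expression the actual workflow uses.

```python
from collections import defaultdict

# Hypothetical sample data; the real lists are not shown in the thread.
system_ids = ["jsmith", "mjones", "jdoe"]
people = [
    ("John Smith", "jsmith2@example.com"),
    ("Jane Doe", "jdoe@example.com"),
    ("Mary Jones", "mary.jones@example.com"),
    ("Mark Jones", "mjones@example.com"),
]


def initial_index(text):
    """Integer index for the first character, e.g. 'j' -> 106."""
    return ord(text[0].lower())


def blocked_matches(ids, persons):
    # Group people by the index of their first-name initial.
    by_initial = defaultdict(list)
    for name, email in persons:
        by_initial[initial_index(name)].append((name, email))

    matches = []
    compared = 0
    for sys_id in ids:
        # Join: only people who share the first initial are compared.
        for name, email in by_initial[initial_index(sys_id)]:
            compared += 1
            # Contains filter (assumed rule: the email holds the system ID).
            if sys_id.lower() in email.lower():
                matches.append((sys_id, name, email))
    return matches, compared


matches, compared = blocked_matches(system_ids, people)
print(f"{compared} pairs compared instead of {len(system_ids) * len(people)}")
for row in matches:
    print(row)
```

The trade-off is that any pair whose first initials differ is never compared, so a user whose first initial did change would be missed by this step.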

The join and filter are 73% of my processing time so I'll be looking to get that down.

However, this is performing a valuable service in helping us track down users for a system given the disparate data we have.

The revised example is attached.

Thank you again, fmvizcaino. You solved the problem I presented and then got me on the right track for the rest of my situation.

Find Replace Tool all possible matches