Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Find Replace Tool all possible matches

DonMcCarty
5 - Atom

 

This post got me most of the way there.

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Use-Fuzzy-Match-to-search-possible-pai...

 

Also thanks to jdunkerly79, I took his example and just added two small changes to illustrate my need.

 

My desired output, not the blah blah MNQ ABC company has two possible matches in Field1

TestField1
AU - 1100010(150M) - PTH(PAHDDH) -EPL - ABC Company LLABC
AU - 1100010(150M) - PTH(PAHDDH) -EPL - MNQ Company LLMNQ
AU - 1100010(150M) - PTH(PAHDDH) -EPL - TZK Company LLTZK
us -blah blah - mnq companyMNQ
us - blah blah more more - mnq abc companyMNQ
us - blah blah more more - mnq abc companyABC

 

Actual output does not include the last row with the mnq abc company and the abc field1

 

I'm looking for this type of scenario as I'm trying to identify possible matches between a specific system's user ids and users in outlook address database.

 

Problem is users change names over the years and it is an entrenched system.  But I have all the SMTP addresses and have used other tools to create a list of the real names and every email id they have had over the years.

 

Now I want to search text within the emails over the years to see possible matches.  The final decision is

a) refining my process to get closer matches and

b) eyeballing it and reaching out to users.

 

Attached is the revised workflow as an example including an isolated Input Data tool that shows my desired output.  I just don't know how to get there.

 

 

2 REPLIES 2
fmvizcaino
17 - Castor
17 - Castor

Hi @DonMcCarty ,

 

You can use a combination of the append + filter tool to get the desired output. It first creates a cartesian join where you will have all the possible combinations and then using the contains functions, it selects all matched cases.

fmvizcaino_0-1591154335672.png

 

 

Best,

Fernando Vizcaino

DonMcCarty
5 - Atom

Thank you very much @fmvicaino (sorry tagging not working).

Your suggestion got me most of the way there.

The solution was spot on for my requirements.  When I implemented it I got 350 million records.

So I got to thinking about how to pair it down and consider some other known facts about my data and this is what I came up with.

 

  1. First letter of every system id is the person's first initial
    • this rarely changes so I created an integer index from here
  2. Created same integer first name character index in the global list of people
  3. Joined the two lists on this index
    • 24.5 million records, down significantly from 350 million
  4. Then, as you suggested, filtered with a contains to slim it down

The join and filter are 73% of my processing time so I'll be looking to get that down.

However, this is performing a valuable service in helping us track down users for a system given the disparate data we have.

 

The revised example is attached.

 

Thank you again fmvciano.  You solved the problem I presented and then got me on the right track for the rest of my situation.

 

Labels
Top Solution Authors