This post got me most of the way there.
Also thanks to jdunkerly79, I took his example and just added two small changes to illustrate my need.
My desired output is below. Note that the "blah blah mnq abc company" row has two possible matches in Field1.
Test | Field1 |
---|---|
AU - 1100010(150M) - PTH(PAHDDH) -EPL - ABC Company LL | ABC |
AU - 1100010(150M) - PTH(PAHDDH) -EPL - MNQ Company LL | MNQ |
AU - 1100010(150M) - PTH(PAHDDH) -EPL - TZK Company LL | TZK |
us -blah blah - mnq company | MNQ |
us - blah blah more more - mnq abc company | MNQ |
us - blah blah more more - mnq abc company | ABC |
The actual output does not include the last row, the one that pairs the "mnq abc company" text with ABC in Field1.
I'm looking for this type of scenario because I'm trying to identify possible matches between a specific system's user IDs and users in the Outlook address database.
Problem is users change names over the years and it is an entrenched system. But I have all the SMTP addresses and have used other tools to create a list of the real names and every email id they have had over the years.
Now I want to search the text of those emails over the years to find possible matches. The final steps are
a) refining my process to get closer matches and
b) eyeballing it and reaching out to users.
Attached is the revised workflow as an example including an isolated Input Data tool that shows my desired output. I just don't know how to get there.
Hi @DonMcCarty ,
You can use a combination of the Append Fields and Filter tools to get the desired output. Append Fields first creates a cartesian join, which gives you every possible combination of text and search value. The Filter tool then uses the Contains function to keep only the combinations that match.
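For readers who want to see the logic outside of a workflow, here is a minimal Python sketch of the same idea: build every (text, key) combination, then keep only the pairs where the text contains the key. The function name `match_all` and the sample lists are hypothetical, and case-insensitive matching is an assumption based on the desired output pairing lowercase "mnq" with MNQ.

```python
from itertools import product


def match_all(texts, keys):
    """Return every (text, key) pair where key occurs in text, ignoring case."""
    matches = []
    # product() plays the role of the cartesian join: every text with every key.
    for text, key in product(texts, keys):
        # Case-insensitive substring test, standing in for a Contains-style filter.
        if key.lower() in text.lower():
            matches.append((text, key))
    return matches


# Sample rows taken from the question; the key list is assumed from Field1.
texts = [
    "AU - 1100010(150M) - PTH(PAHDDH) -EPL - ABC Company LL",
    "AU - 1100010(150M) - PTH(PAHDDH) -EPL - MNQ Company LL",
    "AU - 1100010(150M) - PTH(PAHDDH) -EPL - TZK Company LL",
    "us -blah blah - mnq company",
    "us - blah blah more more - mnq abc company",
]
keys = ["ABC", "MNQ", "TZK"]

for text, key in match_all(texts, keys):
    print(f"{text} | {key}")
```

Because every combination is tested, a text such as "mnq abc company" is kept once per matching key, which is what produces the extra row the question asks for.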
Best,
Fernando Vizcaino
Thank you very much @fmvicaino (sorry tagging not working).
Your suggestion got me most of the way there.
The solution was spot on for my requirements. When I implemented it I got 350 million records.
So I got to thinking about how to pare it down by taking into account some other known facts about my data, and this is what I came up with.
The join and filter are 73% of my processing time so I'll be looking to get that down.
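One direction that might reduce that cost, sketched here in Python rather than as a workflow, is to index the text by word and look each key up directly instead of testing every combination. This is only an illustration and not what the attached workflow does. The names `build_token_index` and `match_by_token` are hypothetical, and it assumes keys appear as whole words, so it will miss a key embedded inside a longer word, which a Contains-style filter would catch.

```python
import re
from collections import defaultdict


def build_token_index(texts):
    """Map each lowercase alphanumeric token to the set of row numbers containing it."""
    index = defaultdict(set)
    for row, text in enumerate(texts):
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(row)
    return index


def match_by_token(texts, keys):
    """Return (text, key) pairs where key appears as a whole token in text."""
    index = build_token_index(texts)
    pairs = []
    for key in keys:
        # One dictionary lookup per key replaces a scan over every text.
        for row in sorted(index.get(key.lower(), ())):
            pairs.append((texts[row], key))
    return pairs


# Sample rows from the question; the key list is assumed from Field1.
sample_texts = [
    "AU - 1100010(150M) - PTH(PAHDDH) -EPL - ABC Company LL",
    "us -blah blah - mnq company",
    "us - blah blah more more - mnq abc company",
]
sample_keys = ["ABC", "MNQ", "TZK"]

for text, key in match_by_token(sample_texts, sample_keys):
    print(f"{text} | {key}")
```

The trade-off is recall for speed: the work grows with the number of tokens plus the number of keys rather than with their product, but only whole-word matches are found.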
However, this is performing a valuable service in helping us track down users for a system given the disparate data we have.
The revised example is attached.
Thank you again fmvciano. You solved the problem I presented and then got me on the right track for the rest of my situation.