Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Efficient Way to Find Multi-Column Matches Between Two Recordsets

ShaneGraham
7 - Meteor

I submitted this once, but I don't see it listed in the discussions, so forgive me if this ends up a duplicate.


The attached workflow, pictured below in Figure 1, takes in two inputs (using purely fictitious data) , the first is a shorter list of applicants (hundreds/thousands of rows), the second is much longer list of existing accounts (millions of rows).  The workflow needs to search through the list of existing accounts checking to see if any of the applicants has an existing account based on multiple criteria, including first initial, last name, email address, phone number, address, etc.  The output is the list of matched records and the boolean flags indicating which columns matched.

 

Figure 1Figure 1

The comparison between each set of records is accomplished using an Append Fields to match every row in the first recordset with every row of the second recordset, allowing a Formula Tool (Figure 2) with a number of boolean comparisons to flag matching rows.  This is an inefficient, brute force method, and I know there must be a more efficient approach, perhaps using a batch or iterative macro.

 

Figure 2Figure 2

I would very much appreciate anyone's thoughts (and examples) of how to do this better.  Thanks.

 

2 REPLIES 2
MarqueeCrew
20 - Arcturus
20 - Arcturus

@ShaneGraham,

 

 

LOTS of ways to skin this cat.  Depending on the use case, I would consider either use of Calgary tools or creation of search keys using md5_hash.  I see no need for iterative or batch macros and see this taking more time to test, than to construct.  FIND REPLACE should only be used with 32,000 (ish) records or less in the 'S' anchor.  You will need to use a JOIN.  In fact, you'll need multiple join tools.  That's why I'm thinking Calgary.

 

If you transpose the data and create 1 row of data per lookup key (Key + LookupName + LookupValue), you can get all matches in a single run.  Then evaluate the results.

 

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
Latez
6 - Meteoroid

Any example workflow of this?

Labels