Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Finding common rows of data between two separate Excel files (ignoring position)

angeleneveloce
5 - Atom

Hello,

 

I am brand new to Alteryx. I need to construct a workflow that is able to scan through an Excel worksheet and find the same exact rows of data as the rows provided in a separate Excel file's worksheet. The Excel files I am working with contain information on monthly transactions. Excel file #1 is the master file/ general ledger that contains all the transactions that have occurred, whether they have been cleared or not. It is just a historical record of transactions. Excel file #2 contains information from the past month of transactions whose due date for review is approaching or has already approached. Transactions are identified by columns containing account #, customer ID, and transaction amount. If a transaction listed in file #1 is also listed in file #2, then that means the transaction was not cleared yet in that month. If the transaction is NOT listed in file #2, then it means the transaction has been cleared. My task is to identify which transactions listed in master file #1 are also in file #2, and return an output column in file #1 stating "cleared" if the row of data is not present/matching and "not cleared" if the row of data is present/matching. The row of data does not have to be in the same position, as long as it is somewhere in the second file and exactly matching. Below is an example.

 

Before Workflow:

 

Master File #1

Account #Customer IDAmount
10001ab12$100
20001cd34$200
30001ef56$300
40001gh78$400

 

File #2 Monthly Transactions

Account #Customer IDAmount
30001ac93$101
40001gh78$400
20001jn83$823
10013cd34$200

 

After Workflow:

 

Master File #1

Account #Customer IDAmountCleared? (NOT in File #2?)
10001ab12$100Cleared
20001cd34$200Cleared
30001ef56$300Cleared
40001gh78$400Not Cleared

 

The row of data highlighted in red originally listed in file #1 is ALSO LISTED in file #2, even though the row of data is not in the same position. Therefore, the new result/output column in file #1 returns "Not Cleared" in the original row to indicate that the row of data was also found somewhere in file #2, and matching exactly.

I am not sure if I articulated my problem correctly, but if anyone could offer advice or help suggest a workflow that would be awesome. Thank you!

2 REPLIES 2
DavidSkaife
13 - Pulsar

Hi @angeleneveloce 

 

This will work for you: Join the two datasets together on the three fields, and on the join using a formula tool to create the 'Not Cleared' text, then on the left side output (so the data left over from Master File 1) you do the same with another formula tool but use 'Cleared' instead. Finally union the two streams back together!

 

DavidSkaife_0-1656447478011.png

 

I've also attached a workflow for you to see how it works :)

angeleneveloce
5 - Atom

Thank you for such a quick response! Looks good, I will try and implement it with my files tonight!

Labels