Reconcile multiple datasets

Question

Hello,

I have multiple datasets coming from different sources, and I need to check that all 39 fields match for each ID across all the data sources; any field with a different value should be flagged. Initially, I thought I would have to join on the unique ID and the 'amount' column. However, my real datasets have 39 fields, so do I need to do this 39 times?

If anyone could help me with a more efficient way to do this, that would be amazing. I have attached two datasets that are similar to my real datasets.

Thank you.

DATA SET 1 :

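| ID | AMOUNT | Rating | Type | flag |
|----|--------|--------|--------|------|
| 1 | 1000 | A+ | Coupon | No |
| 2 | 1000 | AA++ | Coupon | Yes |
| 3 | 3000 | BB | Coupon | yes |
| 4 | 6000 | CC | Coupon | yes |
| 5 | 6000 | BB | Coupon | yes |
| 6 | 7000 | A- | Coupon | No |
| 7 | 9000 | n/A | Coupon | No |
| 8 | 9000 | AAA | Coupon | No |
| 9 | 9000 | Bu | Coupon | No |
| 10 | 9000 | Bu | Credit | No |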

DATA SET 2:

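| ID | AMOUNT | Rating | Type | flag |
|----|--------|--------|--------|------|
| 1 | 1000 | A+ | Coupon | No |
| 2 | 1000 | B | Coupon | Yes |
| 3 | 3000 | BB | Credit | yes |
| 4 | 6000 | CC | Coupon | yes |
| 5 | 6000 | BB | Coupon | yes |
| 6 | 7000 | A- | Credit | No |
| 7 | 9000 | n/A | Coupon | No |
| 8 | 9000 | AAA | Coupon | No |
| 9 | 9000 | Bu | Coupon | No |
| 10 | 9000 | Bu | Credit | No |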

Book2.xlsx

Book1.xlsx

DawnDuong · Answer

Hi @barkat,

Assuming that the only required output is to detect exceptions (i.e. where there are mismatches), I think the most efficient way is to use a single Join Tool.

If you can make sure that the columns in both files are in the same sequence, you can also use the "Join by Record Position" option. In this case, however, I have simply selected all the available fields as join keys.

The L and R outputs will point you to the exceptions, i.e. the records that do not match identically between the two books.
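For readers who want to try the same idea outside Alteryx, here is a minimal pandas sketch of "join on every field and look at what is left unmatched". It is not the attached workflow; the variable names (`ds1`, `ds2`, `exception_ids`) are made up for illustration, and only the first three rows of the sample data are used.

```python
import pandas as pd

# First three rows of the two sample datasets from the question.
cols = ["ID", "AMOUNT", "Rating", "Type", "flag"]
ds1 = pd.DataFrame([[1, 1000, "A+", "Coupon", "No"],
                    [2, 1000, "AA++", "Coupon", "Yes"],
                    [3, 3000, "BB", "Coupon", "yes"]], columns=cols)
ds2 = pd.DataFrame([[1, 1000, "A+", "Coupon", "No"],
                    [2, 1000, "B", "Coupon", "Yes"],
                    [3, 3000, "BB", "Credit", "yes"]], columns=cols)

# Outer-join on every column; `_merge` records which side each row came from.
merged = ds1.merge(ds2, on=cols, how="outer", indicator=True)

# Rows present on one side only are the exceptions
# (roughly the L and R outputs of the Join tool).
left_only = merged[merged["_merge"] == "left_only"]
right_only = merged[merged["_merge"] == "right_only"]

# IDs with at least one mismatching field.
exception_ids = sorted(set(left_only["ID"].tolist()) | set(right_only["ID"].tolist()))
print(exception_ids)
```

This tells you which records differ, but not which field caused the mismatch.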

ARComm_for_Barkat.yxmd

CarliE · Answer

@barkat ,

Would this work? It produces a table that tells you which ID and which field are different.

Attached is the workflow.
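As a rough illustration of that kind of output (not necessarily how the attached workflow is built), one way to get an "ID / field / value 1 / value 2" table is to unpivot both datasets and join on ID plus field name, so the 39 fields are handled in a single pass. The helper `to_long` and the column names `value_1` and `value_2` are hypothetical, and only three sample rows are used.

```python
import pandas as pd

# First three rows of the two sample datasets from the question.
cols = ["ID", "AMOUNT", "Rating", "Type", "flag"]
ds1 = pd.DataFrame([[1, 1000, "A+", "Coupon", "No"],
                    [2, 1000, "AA++", "Coupon", "Yes"],
                    [3, 3000, "BB", "Coupon", "yes"]], columns=cols)
ds2 = pd.DataFrame([[1, 1000, "A+", "Coupon", "No"],
                    [2, 1000, "B", "Coupon", "Yes"],
                    [3, 3000, "BB", "Credit", "yes"]], columns=cols)

def to_long(df, value_name):
    """Unpivot to one row per (ID, field), with values as strings."""
    long_df = df.melt(id_vars="ID", var_name="field", value_name=value_name)
    long_df[value_name] = long_df[value_name].astype(str)
    return long_df

# Join the two long tables on ID and field name, whatever the number of fields.
both = to_long(ds1, "value_1").merge(to_long(ds2, "value_2"),
                                     on=["ID", "field"], how="outer")

# Keep only the (ID, field) pairs whose values differ; an ID missing from
# one side shows up here too, because NaN never equals a string.
diffs = (both[both["value_1"] != both["value_2"]]
         .sort_values(["ID", "field"])
         .reset_index(drop=True))
print(diffs)
```

Casting everything to strings is a simplification; it means 1000 and 1000.0 would be reported as different, so numeric fields may need normalising first.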

If this solution helped, please mark it as a solution for other users' benefit.

Thanks,

Reconcile multiple datasets.yxmd

ChrisTX · Answer

Do your 20 data sets have any type of ordering? How would you perform the task manually? I'm guessing you would start with dataset #1 and compare it to dataset #2; what would your next manual step be after that?

For example, suppose you were comparing row #5, column C across 20 datasets, and the value in C5 was:

dataset 1: C5 = 123

dataset 2: C5 = 444

dataset 3: C5 = 123

dataset 4: C5 = 444

Manually, would these be denoted as 4 separate values, or only 2 separate values?  How would you order the datasets before you started your comparison?
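One possible reading of the C5 example, sketched in pandas: stack all sources into a long table and count distinct values per ID and field, which needs no ordering of the datasets. This is only one interpretation of the question above; treating row #5 as an ID of 5, and the names `sources`, `distinct_values` and `sources_seen`, are assumptions made for illustration.

```python
import pandas as pd

# The C5 example: four sources, one record (hypothetical ID 5), one field "C".
sources = {
    "dataset 1": pd.DataFrame({"ID": [5], "C": [123]}),
    "dataset 2": pd.DataFrame({"ID": [5], "C": [444]}),
    "dataset 3": pd.DataFrame({"ID": [5], "C": [123]}),
    "dataset 4": pd.DataFrame({"ID": [5], "C": [444]}),
}

# Stack every source in long form: one row per (source, ID, field).
long_df = pd.concat(
    [df.melt(id_vars="ID", var_name="field").assign(source=name)
     for name, df in sources.items()],
    ignore_index=True,
)

# Count distinct values per (ID, field); more than one means the sources disagree.
summary = (long_df.groupby(["ID", "field"])["value"]
                  .agg(distinct_values="nunique", sources_seen="count")
                  .reset_index())
flagged = summary[summary["distinct_values"] > 1]
print(flagged)
```

Under this reading, the four datasets yield 2 separate values for C5, and the record is flagged because that count is greater than 1.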

After you identify the steps for a manual task, you should have the logic for your workflow.

Chris