Reconcile multiple datasets
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello,
I have multiple datasets coming from different sources, and I need to check that all the 39 fields match for each ID in all the data sources, and fields that have a different value should be flagged. Initially, ii thought I would have to join on the unique ID and 'amount' column. however in my real data sets, there are 39 fields, do I need to do this 39 times?
If anyone could help me with a more efficient way to do this. That would be amazing. I have attached two datasets that are similar to my real datasets.
Thank you.
DATA SET 1 :
ID | AMOUNT | Rating | Type | flag |
1 | 1000 | A+ | Coupon | No |
2 | 1000 | AA++ | Coupon | Yes |
3 | 3000 | BB | Coupon | yes |
4 | 6000 | CC | Coupon | yes |
5 | 6000 | BB | Coupon | yes |
6 | 7000 | A- | Coupon | No |
7 | 9000 | n/A | Coupon | No |
8 | 9000 | AAA | Coupon | No |
9 | 9000 | Bu | Coupon | No |
10 | 9000 | Bu | Credit | No |
DATA SET 2:
ID | AMOUNT | Rating | Type | flag |
1 | 1000 | A+ | Coupon | No |
2 | 1000 | B | Coupon | Yes |
3 | 3000 | BB | Credit | yes |
4 | 6000 | CC | Coupon | yes |
5 | 6000 | BB | Coupon | yes |
6 | 7000 | A- | Credit | No |
7 | 9000 | n/A | Coupon | No |
8 | 9000 | AAA | Coupon | No |
9 | 9000 | Bu | Coupon | No |
10 | 9000 | Bu | Credit | No |
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
There are many posts on this subject. Here are a few:
How To: Compare Data from Two Data sets
Community > Designer > Browse Knowledge > How To: Compare Data from Two Data sets
CReW Delta macro: This macro allows a user to input two files that are expected to be equal. The user can select which fields are to be compared
https://gallery.alteryx.com/#!app/CReW-Delta--Alpha-/5b41141b826fd3116445fee8
Comparing Data from Two Sets of Data, Calling out Differences
Compare 2 Data Sets
https://community.alteryx.com/t5/Engine-Works-Blog/Compare-2-Data-Sets/ba-p/88853
Data Check Between Two Sources
Community > Getting Started > Alteryx Use Cases > Data Check Between Two Sources
https://community.alteryx.com/t5/Alteryx-Use-Cases/Data-Check-Between-Two-Sources/ta-p/270082
Best way to identify unique rows of data
Comparing data between two different servers
Chris
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thank you. Yes I did see those, but these are for two data sets only. I have multiple data sets over 20 that need reconciling. Which is where i am stuck.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Do your 20 data sets have any type of ordering? How would you perform the task manually? I'm guessing you would start with dataset #1 then compare it to dataset #2, then what would be your next step, manually?
If, for example, you were comparing row #5, column C across 20 datasets. If the value in C5 was:
dataset 1: C5 = 123
dataset 2: C5 = 444
dataset 3: C5 = 123
dataset 4: C5 = 444
Manually, would these be denoted as 4 separate values, or only 2 separate values? How would you order the datasets before you started your comparison?
After you identify the steps for a manual task, you should have the logic for your workflow.
Chris
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@barkat ,
Would this work? Getting a table that tells you what ID and field is different?
Attached is the workflow.
If this solution helped, please mark as a solution for other users benefit.
Thanks,
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
hi @barkat
Assuming that the only required output is to detect exceptions (i.e. where there are mismatches), I think the most efficient way is to use a single Join Tool.
if you can make sure that the columns in both files are identical in sequence then you can also use the option "Join by Record Position". However in this case, I have just selected all the available fields.
The L and R outputs will point you to the exceptions where there are not identical matches between 2 books.
