Combining 2 Data Sets and Highlighting Differences in Output
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello –
I am trying to compare an old set of data with a new set of data and want the output to call out any additions or removals to the data, and have the impacted cells highlighted.
The Column headers will always be in the same order.
There is no primary key that can be used, but if you combine the ID A, ID B, and ID C columns then you can make that combination a primary key. (I think this is done with the Join tool).
ID A and ID A Name are tied to each other, ID B and ID B Name are tied to each other, and ID C and ID C Name are tied to each other.
Below is an example I created of the two files along with what I would like the output to look like. Most importantly I would like the impacted data to be highlighted and have a Status Column be added to call out the differences, whether the information has been deleted or added. *I was not sure how to highlight a text box on here, so instead I changed the font to Red*
Old Data:
Overall ID | ID A | ID A Name | ID B | ID B Name | ID C | ID C Name |
123 | 343 | Dog | 9988 | Cat | 777 | Cow |
1234 | 343 | Dog | 777 | Cow | 565 | Frog |
12345 | 9988 | Cat | 565 | Frog | 333 | Toad |
123456 | 333 | Toad | 565 | Frog | 999 | Pig |
New Data:
Overall ID | ID A | ID A Name | ID B | ID B Name | ID C | ID C Name |
123 | 343 | Dog | 999 | Pig | 777 | Cow |
1234 | 343 | Dog | 777 | Cow | 565 | Frog |
12345 | 343 | Dog | 565 | Frog | 333 | Toad |
1 | 343 | Dog | 999 | Pig | 565 | Frog |
Output:
Overall ID | ID A | ID A Name | ID B | ID B Name | ID C | ID C Name | Status |
123 | 343 | Dog | 999 | Pig | 777 | Cow | New |
12345 | 343 | Dog | 565 | Frog | 333 | Toad | New |
123456 | 333 | Toad | 565 | Frog | 999 | Pig | Removed |
1 | 343 | Dog | 999 | Pig | 565 | Frog | New |
Thank you very much, I appreciate all of your help in advance.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
These related posts should help:
How To: Compare Data from Two Data sets
Compare every value in every field from 2 different inputs
Community > Designer > Browse Knowledge > How To: Compare Data from Two Data sets
Data Check Between Two Sources
2018 Excellence Awards Entry
Community > Getting Started > Alteryx Use Cases > Data Check Between Two Sources
https://community.alteryx.com/t5/Alteryx-Use-Cases/Data-Check-Between-Two-Sources/ta-p/270082
Compare two files without a unique key - not able to Join
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/String-Comparison/m-p/388279#M71158
https://gallery.alteryx.com/#!app/CReW-Delta--Alpha-/5b41141b826fd3116445fee8
Comparing Data from Two Sets of Data, Calling out Differences
Compare 2 Data Sets
https://community.alteryx.com/t5/Engine-Works-Blog/Compare-2-Data-Sets/ba-p/88853
How To Update Fields Selected With Control Parameter
see below: The purpose of the macro is to pull in two different sheets and help identify the differences between the two
Best way to identify unique rows of data
Comparing data between two different servers
Compare field values Left vs Right
Iterative Data Comparison
Chris
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @adrian1713!
@ChrisTX's article suggestions are excellent. I wanted to chime in to address your question regarding how you can create a table with highlighted cells.
The trick here is to build your logic so that each row has a highlight flag for each ID. You can do this with the formula tool based on whatever logic applies to your use case. In this example, I used "if [ID A] != [Right_ID A] then "Y" else "N" endif" for the updated records and "Y" for any new or removed records. You may need to adjust the logic based on your use case.
Within the Table tool, you can use the "Column Rules" to apply the highlighting logic to each column. The highlight fields can then be unselected as they are not needed in the final table. You can also apply any other formatting to make the table look as desired.
Here's an example configuration:
Note that the columns names in bold have specific rules applied. These rules must be added one-by-one for each column.
Output:
Let us know if this helps.
Thanks,
Deb
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thank you so much to the both of you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Is there a way for the table to not show the rows that do not have any highlighted cells in them?
The real data contains hundreds of rows, and I want to only focus on rows that contain changes to the data.
Thank you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Using the data from the second screenshot provided by @ddiesel :
Use Transpose tool, key field Overall ID
Then a Filter tool: Contains([Name],"Highlight") and [Value] = "Y"
Then a Unique tool: unique values on field Overall Id
Then join the output above back to your original data stream on Overall ID
Chris
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
There are so many ways to do things in Alteryx! @ChrisTX's solution will work. An alternate approach would be to add a filter tool after your join. Here's how that would be configured:
The formula in the custom filter would look something like this:
[ID A Highlight]="Y" or [ID B Highlight]="Y" or [ID C Highlight]="Y"
Hope this helps,
Deb
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Helpful for my Solution building too!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Done building my solution using your reference workflow, Thanks a lot :) Will reach out if further help needed...
