Hi all,
I am brand new, taking the courses, and learning fast. Great tool!
I have two data sets with millions of rows that should contain the same set of data. I am trying to find the differences.
For both sets I concatenated three columns to create an unique ID column.
I want to compare the two unique ID columns to see the % match / mismatch.
How do I do that? Is there a way to see the % match between the two? The below data table is an example of what I am talking about.
Customer Name | Customer ID | Number of purchases | Concat 1 |
John | 1 | 10 | John110 |
Bob | 2 | 11 | Bob211 |
Sally | 3 | 12 | Sally312 |
Customer Name | Customer ID | Number of purchases | Concat 2 |
John | 1 | 10 | John110 |
Bob | 2 | 11 | Bob211 |
Jenny | 4 | 13 | Jenny413 |
You want to know how many columns are match for each row? For example, 2 out of 3 columns match=> 66.7%, 1 out of 3 columns match=>33.3%, correct?
If so, you can transpose table and compare them by JOIN and identify difference by formula tool...so on. Please see attached workflow.
By the way, I suppose you create unique column by concat first three column, however, I recommend to put a separator like semi-colon ; vertical bar| etc..
For example, if there is another record like:
Customer Name: John, Customer ID: 11, Number of purchases: 0
Then, concat value would be John110, which is identical to first record, and not a unique one! By using separator, they look like;
John;1;10
John;11;0
These values become unique.
Thank you for the response. I truly appreciate it. I will look it over.
@danniboy3000 I've seen many options to compare data sets. If you only need to compare 2 columns, a Join or Transpose may be easiest. If you ever need to compare many columns, see links below.
Community > Support > Knowledge > Designer > How To: Compare Data from Two Data sets
https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/How-To-Compare-Data-from-Two-Data-s...
I want to develop something that would highlight the differences in the names.
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/String-Comparison/m-p/388279#M71158
CReW Delta macro is in our laboratory and looking for feedback. This macro allows a user to input two files that are expected to be equal. The "Delta" output will include any differences.
https://gallery.alteryx.com/#!app/CReW-Delta--Alpha-/5b41141b826fd3116445fee8
Comparing Data from Two Sets of Data, Calling out Differences
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Comparing-Data-from-Two-Sets-of-Data-C...
use join tool to find newly inserted or deleted rows. See blog...
https://community.alteryx.com/t5/Engine-Works-Blog/Compare-2-Data-Sets/ba-p/88853
Compare 2 Data Sets
https://community.alteryx.com/t5/Engine-Works-Blog/Compare-2-Data-Sets/ba-p/88853
The purpose of the macro is to pull in two different sheets and help identify the differences between the two
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/How-To-Update-Fields-Selected-With-Con...
Community > Getting Started > Alteryx Use Cases > Data Check Between Two Sources
2018 Excellence Awards Entry
https://community.alteryx.com/t5/Alteryx-Use-Cases/Data-Check-Between-Two-Sources/ta-p/270082
I'd like to create a workflow that compares all data for a given row in spreadsheet 1 with spreadsheet 2
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Best-way-to-identify-unique-rows-of-da...
Comparing data between two different servers
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Comparing-data-between-two-different-s...
Compare two files without a unique key - not able to Join
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Comparing-two-files-that-do-not-have-a...
Compare field values Left vs Right
https://community.alteryx.com/t5/Inspire-2019-Buzz/Nested-Macros-Session-Field-Comparison-macro/m-p/...
Iterative Data Comparison
I am looking to compare data between 2 files - Pre and Post.
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Iterative-Data-Comparison/td-p/841398/...
Combining 2 Data Sets and Highlighting Differences in Output
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Combining-2-Data-Sets-and-Highlighting...
Chris