Find duplicates in different rows and different columns
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I have 2 tables as below and below is my requirement:
1. Find if there are duplicates for any column in any row. For example, row1 in table1 and row1 in Table2 have ‘2’ in common. Pull such records and generate a record combining both and removing duplicate like row1 in Table3(Final output)
2. One condition is when combining such records if total values exceed 20, split into 2 records. Like Row4 in Table1 and Row4 in Table2 together after removing duplicates have 22 values, from Table3 you can see that it is split into 2 rows.
Table1:
A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T |
2 | 3 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 | 11 | 56 | 48 | 22 | 38 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
81 | 54 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
10 | 9 | 8 | 7 | 6 | 5 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 23 | 25 |
|
|
Table2:
A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T |
77 | 2 | 78 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
60 | 61 | 62 | 63 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
88 | 89 | 54 | 86 | 90 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
29 | 30 | 31 | 10 | 9 | 8 | 45 |
|
|
|
|
|
|
|
|
|
|
|
|
|
Final Output(Table3):
A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T |
2 | 3 | 77 | 78 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 | 11 | 56 | 48 | 22 | 38 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
81 | 54 | 88 | 89 | 86 | 90 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
60 | 61 | 62 | 63 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
10 | 9 | 8 | 7 | 6 | 5 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 23 | 25 | 29 | 30 |
31 | 45 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
How can I achieve this. Thanks in advance!
Solved! Go to Solution.
- Labels:
- Parse
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Try this approach:
Use a TRANSPOSE tool to group by ROW (RECORDID will help). This will give you RecordID plus name value pairs. Filter out all blank values. Repeat for table 2. Union the data together and then CROSSTAB the data back to a single row by GROUPING on Record ID, using Name as Header and Value as Value. In the crosstab, you can select FIRST as the method. This will remove duplicates. If it doesn't, then use a unique tool (all 3 fields checked) and try it using concatenate.
This should get you past your basic problem. Next you'll have to deal with limits of 20 fields.
Cheers,
Mark
Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @ssphv
Not the easiest solution here, but it might work.
Check the workflow details and let me know if it suits you. Hard to explain, but it was quite a challenge.
Cheers,
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @ssphv
Wow!! What a challenge this one was. I wanted to build something that would match any value in any row or column, as per your request, which took a few transpose, joins, cross tabs, and unions. May be more convoluted than it needs to be. But since Row 5 could match to row 1, transposing and cross-tabbing back together wouldn't work.
Figuring out the "over 20 columns" was fun too. Decided to reassign numbers to the headers, and anything over 20 had the numbers restart at 1. Those records also got a letter appended to their record ID so that the data would build back in the proper order.
Please let me know if any of this doesn't make sense, and if it works for you.
cheers!
Esther
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Just tried it with different sets of numbers, and it still works :)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
This works perfectly and yes I forgot to mention that it should match any value in any row and any column. I appreciate your help!! Thank you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@estherb47 Thanks for providing the solution. Today I noticed a new case which is not being handled by this workflow. I updated my input data.
Table 1 : 5th row
Table 2: 5th and 6th rows.
5th row in table1 matches with both 5 and 6 in table2. In this case the result is not expected and I couldn't figure out why it is aggregating values. Any help will be appreciated.
Thank you in advance!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @ssphv ,
I'll take a look at this today. Apologies for the lengthy delay - was travelling last week.
Cheers!
Esther
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
No problem. Thank you in advance!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thought about this some more. I think to get the results of matching to multiple rows, not just one row, you'd need a batch macro that compares one row of the bottom input against the whole top data set, then moves down to the next row and repeats the process. After the macro runs, you could remove the duplicate rows.
I think you can just turn my original flow into this batch macro, and have the second input feed the Control parameter. Let me know if you need more help with implementing (and then please let me know which version of Alteryx you're running so I can be sure the macro will work for you)
Cheers!
Esther
