Reducing duplicated values among multiple columns
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi Alteryx Community!
I am looking at a big data set that has a structure like this:
Row_Person | Person | Row_Company | Company | Row_School | School |
1 | AAA | 1 | A | 1 | X |
1 | AAA | 2 | B | 1 | X |
1 | AAA | 3 | C | 1 | X |
1 | AAA | 4 | D | 1 | X |
1 | AAA | 5 | A | 2 | Z |
1 | AAA | 6 | B | 2 | Z |
1 | AAA | 7 | C | 2 | Z |
1 | AAA | 8 | D | 2 | Z |
2 | BBB | 1 | A | 1 | Y |
2 | BBB | 2 | B | 1 | Y |
2 | BBB | 3 | C | 1 | Y |
This is how I am receiving the data, so I can't change the way it is being structured.
It appears that, for each school record of a person, the company records are being presented multiple times.
For example Person AAA went to two different schools and worked at four different companies. So for each school record, the company records are shown.
This is how I need this data:
Row_Person | Person | Row_Company | Company | Row_School | School |
1 | AAA | 1 | A | 1 | X |
1 | 2 | B | 2 | Z | |
1 | 3 | C | |||
1 | 4 | D | |||
2 | BBB | 1 | A | 1 | Y |
2 | 2 | B | |||
2 | 3 | C |
I was thinking about finding a method to apply a multi-row formula to the company column, so that I can take away the "duplicates", but since every person might have different amounts of companies, it is not possible to find a common logic that would apply.
Any thoughts and suggestions are very welcome!
Thank you!
Solved! Go to Solution.
- Labels:
- Datasets
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I've attached an example -
I transposed the data around the person ID, then summarised it to remove duplicated values.
I then sorted the data and assigned an ascending rank to each column, that way when you cross tab it back you can create the output you desire. I then just did a little bit of cleanup on the data.
I'm not sure it's the most efficient way to do it, but it looks to work.
Regards,
Ben
EDIT* I've just noticed I changed the output order of the columns slightly! But it still does the job.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
The attached should work for you. Using the transpose tool and multi-row formula tool will get most of the way there. I then split the data out to fix the duplication on the schools and combined it all back at the end.
If this solves your issue please mark the answer as correct, if not let me know!
Thanks!
Phil
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @scollier1993 ,
Here's my approach:
This should work for any set of dynamic values in the Rows_School and School columns for Rows_Person.
Hope it help! 🙂
Thanks,
S.
