Hi Alteryx Community!
I am looking at a big data set that has a structure like this:
Row_Person | Person | Row_Company | Company | Row_School | School |
1 | AAA | 1 | A | 1 | X |
1 | AAA | 2 | B | 1 | X |
1 | AAA | 3 | C | 1 | X |
1 | AAA | 4 | D | 1 | X |
1 | AAA | 5 | A | 2 | Z |
1 | AAA | 6 | B | 2 | Z |
1 | AAA | 7 | C | 2 | Z |
1 | AAA | 8 | D | 2 | Z |
2 | BBB | 1 | A | 1 | Y |
2 | BBB | 2 | B | 1 | Y |
2 | BBB | 3 | C | 1 | Y |
This is how I am receiving the data, so I can't change the way it is being structured.
It appears that, for each school record of a person, the company records are being presented multiple times.
For example Person AAA went to two different schools and worked at four different companies. So for each school record, the company records are shown.
This is how I need this data:
Row_Person | Person | Row_Company | Company | Row_School | School |
1 | AAA | 1 | A | 1 | X |
1 | 2 | B | 2 | Z | |
1 | 3 | C | |||
1 | 4 | D | |||
2 | BBB | 1 | A | 1 | Y |
2 | 2 | B | |||
2 | 3 | C |
I was thinking about finding a method to apply a multi-row formula to the company column, so that I can take away the "duplicates", but since every person might have different amounts of companies, it is not possible to find a common logic that would apply.
Any thoughts and suggestions are very welcome!
Thank you!
Solved! Go to Solution.
I've attached an example -
I transposed the data around the person ID, then summarised it to remove duplicated values.
I then sorted the data and assigned an ascending rank to each column, that way when you cross tab it back you can create the output you desire. I then just did a little bit of cleanup on the data.
I'm not sure it's the most efficient way to do it, but it looks to work.
Regards,
Ben
EDIT* I've just noticed I changed the output order of the columns slightly! But it still does the job.
The attached should work for you. Using the transpose tool and multi-row formula tool will get most of the way there. I then split the data out to fix the duplication on the schools and combined it all back at the end.
If this solves your issue please mark the answer as correct, if not let me know!
Thanks!
Phil
Hi @scollier1993 ,
Here's my approach:
This should work for any set of dynamic values in the Rows_School and School columns for Rows_Person.
Hope it help! 🙂
Thanks,
S.