Reducing duplicated values among multiple columns

Question

Hi Alteryx Community!

I am looking at a big data set that has a structure like this:

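| Row_Person | Person | Row_Company | Company | Row_School | School |
|---|---|---|---|---|---|
| 1 | AAA | 1 | A | 1 | X |
| 1 | AAA | 2 | B | 1 | X |
| 1 | AAA | 3 | C | 1 | X |
| 1 | AAA | 4 | D | 1 | X |
| 1 | AAA | 5 | A | 2 | Z |
| 1 | AAA | 6 | B | 2 | Z |
| 1 | AAA | 7 | C | 2 | Z |
| 1 | AAA | 8 | D | 2 | Z |
| 2 | BBB | 1 | A | 1 | Y |
| 2 | BBB | 2 | B | 1 | Y |
| 2 | BBB | 3 | C | 1 | Y |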

This is how I am receiving the data, so I can't change the way it is being structured.

It appears that the company records of a person are repeated once for each of that person's school records.

For example, Person AAA went to two different schools and worked at four different companies, so all four company records are shown once for each school record.

This is how I need this data:

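| Row_Person | Person | Row_Company | Company | Row_School | School |
|---|---|---|---|---|---|
| 1 | AAA | 1 | A | 1 | X |
| 1 |  | 2 | B | 2 | Z |
| 1 |  | 3 | C |  |  |
| 1 |  | 4 | D |  |  |
| 2 | BBB | 1 | A | 1 | Y |
| 2 |  | 2 | B |  |  |
| 2 |  | 3 | C |  |  |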

I was thinking about applying a multi-row formula to the company column to take away the "duplicates", but since every person might have a different number of companies, I can't find a common logic that would apply to all of them.

Any thoughts and suggestions are very welcome!

Thank you!

Ben_H · Accepted Answer

Hi @scollier1993

I've attached an example.

I transposed the data around the person ID, then summarised it to remove duplicated values.

I then sorted the data and assigned an ascending rank within each column. That way, when you cross tab it back, you get the output you want. After that I just did a little bit of cleanup on the data.
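
For readers without the attachment, the steps above can be sketched in plain Python. This is only an illustration under assumptions, not the attached workflow: the function name `reshape` and the tuple layout of `SAMPLE` are made up here, values are ranked in order of first appearance rather than sorted, and the rank is written out as the new Row_Company / Row_School number.

```python
from collections import defaultdict

# Sample rows from the question:
# (Row_Person, Person, Row_Company, Company, Row_School, School)
SAMPLE = [
    (1, "AAA", 1, "A", 1, "X"), (1, "AAA", 2, "B", 1, "X"),
    (1, "AAA", 3, "C", 1, "X"), (1, "AAA", 4, "D", 1, "X"),
    (1, "AAA", 5, "A", 2, "Z"), (1, "AAA", 6, "B", 2, "Z"),
    (1, "AAA", 7, "C", 2, "Z"), (1, "AAA", 8, "D", 2, "Z"),
    (2, "BBB", 1, "A", 1, "Y"), (2, "BBB", 2, "B", 1, "Y"),
    (2, "BBB", 3, "C", 1, "Y"),
]

def reshape(rows):
    # "Transpose": one long record per (person key, field name, value).
    long_form = []
    for row_person, person, _rc, company, _rs, school in rows:
        key = (row_person, person)
        long_form.append((key, "Company", company))
        long_form.append((key, "School", school))

    # "Summarise": keep only the first occurrence of each record.
    unique = list(dict.fromkeys(long_form))

    # Rank: number the values 1, 2, 3... per person and per field.
    counters = defaultdict(int)
    cells = defaultdict(dict)  # (person key, rank) -> {field: value}
    for key, field, value in unique:
        counters[(key, field)] += 1
        cells[(key, counters[(key, field)])][field] = value

    # "Cross tab": one output row per (person key, rank).
    out = []
    for key, rank in sorted(cells):
        c = cells[(key, rank)]
        out.append((
            key[0],
            key[1] if rank == 1 else "",       # show Person once per group
            rank if "Company" in c else "",
            c.get("Company", ""),
            rank if "School" in c else "",
            c.get("School", ""),
        ))
    return out

for line in reshape(SAMPLE):
    print(line)
```

Because the companies and schools are ranked independently, a person with four companies and two schools ends up with four rows, with the school cells left empty on the last two.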

I'm not sure it's the most efficient way to do it, but it looks to work.

Regards,

Ben

EDIT: I've just noticed I changed the output order of the columns slightly, but it still does the job.

Maskell_Rascal · Accepted Answer

Hi @scollier1993

The attached should work for you. Using the Transpose tool and the Multi-Row Formula tool will get you most of the way there. I then split the data out to fix the duplication on the schools and combined it all back at the end.

If this solves your issue, please mark the answer as correct. If not, let me know!
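
As a rough illustration of the split-and-recombine idea (not a reproduction of the attached workflow), here is a plain Python sketch. Everything named here is an assumption for the example: `split_and_rejoin`, the tuple layout of `ROWS`, deduplicating companies by name while keeping the first Row_Company seen, and blanking a repeated Person by comparing each row with the previous one, much as a multi-row formula would.

```python
from itertools import zip_longest

# Sample rows from the question:
# (Row_Person, Person, Row_Company, Company, Row_School, School)
ROWS = [
    (1, "AAA", 1, "A", 1, "X"), (1, "AAA", 2, "B", 1, "X"),
    (1, "AAA", 3, "C", 1, "X"), (1, "AAA", 4, "D", 1, "X"),
    (1, "AAA", 5, "A", 2, "Z"), (1, "AAA", 6, "B", 2, "Z"),
    (1, "AAA", 7, "C", 2, "Z"), (1, "AAA", 8, "D", 2, "Z"),
    (2, "BBB", 1, "A", 1, "Y"), (2, "BBB", 2, "B", 1, "Y"),
    (2, "BBB", 3, "C", 1, "Y"),
]

def split_and_rejoin(rows):
    # Split: collect companies and schools separately per person.
    companies, schools = {}, {}
    for row_person, person, rc, company, rs, school in rows:
        key = (row_person, person)
        # Repeated companies arrive with new row numbers, so dedupe by
        # name and keep the first Row_Company seen for each one.
        companies.setdefault(key, {}).setdefault(company, rc)
        # Schools repeat with the same row number, so dedupe the pair.
        schools.setdefault(key, {}).setdefault((rs, school), None)

    # Recombine: pair the n-th company with the n-th school, padding
    # the shorter list with empty cells.
    combined = []
    for key, by_name in companies.items():
        company_pairs = [(rc, name) for name, rc in by_name.items()]
        school_pairs = list(schools[key])
        for comp, sch in zip_longest(company_pairs, school_pairs,
                                     fillvalue=("", "")):
            combined.append((key[0], key[1], *comp, *sch))

    # Multi-row style cleanup: blank Person when the previous row
    # belongs to the same Row_Person.
    result, previous = [], None
    for row in combined:
        result.append(row if row[0] != previous else (row[0], "", *row[2:]))
        previous = row[0]
    return result

for line in split_and_rejoin(ROWS):
    print(line)
```

Unlike a rank-based approach, this keeps the original Row_Company and Row_School numbers from the first occurrence of each value.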

Thanks!

Phil

szade1 · Accepted Answer

Hi @scollier1993 ,

Here's my approach:

This should work for any set of dynamic values in the Row_School and School columns for each Row_Person.

Hope it helps! 🙂

Thanks,

S.