I have a data set with multiple rows and multiple fields. Not every field has a value, and some fields have the same value across multiple rows. I am trying to "roll up" related rows into a single row, concatenating within each field where needed and then deduplicating each concatenated list of values.
What I have been doing so far is to create a unique key based on a combination of fields. I then test whether those fields actually have values, and if not, I send those rows around the process. Say, for example, the unique key is First Name + Last Name + DOB: if a row has no DOB, that row bypasses the process. I don't want a unique key of just something like "johnsmith", as I might have lots of John Smiths who are different people. For the rows where I can build a unique key, I group by the key and concatenate the fields with a Summarize tool, then add a Record ID. To deduplicate each concatenated list, I use a Select tool to send just the Record ID and the field, Text To Columns (split to rows) on my delimiter, a Unique tool to deduplicate, and then re-summarize. In some cases I also use a Filter tool to remove null values. I then multi-join all the fields back together on the Record ID and rejoin the bypass rows. I might then create a different unique key and repeat this process until I've "vertically" consolidated as much as possible.
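Outside Alteryx, purely to illustrate the key-building and bypass logic described above, here is a minimal pandas sketch. The column names FirstName/LastName/DOB are hypothetical stand-ins for whatever the key fields are in a given project:

```python
import pandas as pd

# Hypothetical key columns; substitute your own field names.
df = pd.DataFrame({
    "FirstName": ["John", "John", "Jane"],
    "LastName":  ["Smith", "Smith", "Doe"],
    "DOB":       ["1980-01-01", "1980-01-01", None],
})

# Rows missing any key field bypass the rollup, like the filter in the workflow.
has_key = df[["FirstName", "LastName", "DOB"]].notna().all(axis=1)
bypass = df[~has_key]

# Build the unique key only for rows where every key field is populated.
keyed = df[has_key].copy()
keyed["UniqueKey"] = (
    keyed["FirstName"].str.lower()
    + keyed["LastName"].str.lower()
    + keyed["DOB"].astype(str)
)
```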
The process looks a bit like this: each row of tools in the container handles one field. So if I have 50 different fields, I have 50 rows of tools, each doing the same thing with only the field name changed.
If I have a lot of fields, that's a lot of manual work. I figure there is a macro way to do this, but I'm drawing a blank on how to start. I figure the macro would consist of the Select, Text To Columns (split to rows), Unique, Filter for not-null, and then a Summarize that groups by Record ID and concatenates the values.
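As a rough sketch of what that macro body would do to one concatenated value (split on the delimiter, drop blanks/nulls, unique, re-join), here is a hypothetical Python helper; `dedupe_concat` is not an Alteryx tool, just an illustration of the step sequence:

```python
def dedupe_concat(value, delim=";"):
    """Split a concatenated string, drop blanks, deduplicate, and rejoin.
    Mirrors the Text To Columns (rows) -> Unique -> Filter -> Summarize chain."""
    if value is None:
        return value
    parts = [p for p in str(value).split(delim) if p != ""]
    # dict.fromkeys removes duplicates while preserving first-seen order
    return delim.join(dict.fromkeys(parts))

print(dedupe_concat("5;6;5"))  # "5;6"
print(dedupe_concat("a;a"))    # "a"
```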
It should be simple... but I'm struggling with how I would send each different field into the macro and then have it all join back up again.
For each project the fields might be called something different as well, and the number of fields may vary.
There might also be a better way to solve this that I'm not thinking of. I'm dealing with sensitive data, so I can't share a table, but hopefully you get what I'm trying to do.
@dkazup Please provide some sample input data and the expected output for a better understanding of the requirement.
Sure, let's say you have this input; in reality it can be any number of columns. The unique key is created from Field 1 + Field 2 + Field 3.
UniqueKey | Field 1 | Field 2 | Field 3 | Field 4 |
az1 | a | z | 1 | 5 |
az1 | a | z | 1 | 6 |
ay1 | a | y | 1 | |
bz2 | b | z | 2 | 5 |
bz2 | b | z | 2 | 5 |
bz3 | b | z | 3 | 6 |
cy4 | c | y | 4 | |
The records are then grouped by the unique key and the fields are concatenated with a ";" delimiter (a rough script equivalent of this step is sketched after the table):
Record ID | UniqueKey | Field 1 | Field 2 | Field 3 | Field 4 |
1 | az1 | a;a | z;z | 1;1 | 5;6 |
2 | ay1 | a | y | 1 | |
3 | bz2 | b;b | z;z | 2;2 | 5;5 |
4 | bz3 | b | z | 3 | 6 |
5 | cy4 | c | y | 4 | |
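For reference, a minimal pandas equivalent of this grouping/concatenation step, built on the sample data above and assuming nulls are simply skipped rather than concatenated:

```python
import pandas as pd

df = pd.DataFrame({
    "UniqueKey": ["az1", "az1", "ay1", "bz2", "bz2", "bz3", "cy4"],
    "Field 1":   ["a", "a", "a", "b", "b", "b", "c"],
    "Field 2":   ["z", "z", "y", "z", "z", "z", "y"],
    "Field 3":   ["1", "1", "1", "2", "2", "3", "4"],
    "Field 4":   ["5", "6", None, "5", "5", "6", None],
})

# Concatenate every non-key field with ";" within each UniqueKey group,
# dropping nulls so they don't end up in the concatenated lists.
rolled = df.groupby("UniqueKey", sort=False).agg(
    lambda s: ";".join(s.dropna().astype(str))
).reset_index()

# Add a Record ID, like the Record ID tool after the Summarize.
rolled.insert(0, "Record ID", range(1, len(rolled) + 1))
print(rolled)
```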
Now I've grouped and collapsed the data. I then take each Record ID and field and transpose, to get something like:
Record ID | Field 1 |
1 | a |
1 | a |
Then I unique and concatenate those back into a single row:
Record ID | Field 1 |
1 | a |
Then I join all the fields back based on Record ID to get the following output:
UniqueKey | Field 1 | Field 2 | Field 3 | Field 4 |
az1 | a | z | 1 | 5;6 |
ay1 | a | y | 1 | |
bz2 | b | z | 2 | 5 |
bz3 | b | z | 3 | 6 |
cy4 | c | y | 4 | |
So for every field I'm doing the same operation, and in my screenshot you see each of those tool chains as a row. What I'd like to do is figure out a way so I don't have to manually duplicate each tool row. I figure there has to be a macro way, where I send the macro the Record ID plus the field and its value.
I'm just struggling with how to start the macro. The macro itself would be the Text To Columns (split to rows), the Unique, and the Summarize tool. The output would go to the multi-join, because if I have 25 fields I still want 25 fields out; I just want to roll up and deduplicate all the values based on the unique key groupings.
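Outside Alteryx, the same "one macro for all fields" idea can be sketched in pandas by transposing every field at once, deduplicating, and pivoting back so the same number of fields comes out. This is only an illustration of the logic under those assumptions, not the accepted workflow:

```python
import pandas as pd

# Assumes "rolled" is the concatenated intermediate from the previous step,
# with a Record ID plus any number of data fields (two shown here).
rolled = pd.DataFrame({
    "Record ID": [1, 3],
    "UniqueKey": ["az1", "bz2"],
    "Field 1":   ["a;a", "b;b"],
    "Field 4":   ["5;6", "5;5"],
})

# Transpose all fields at once: one row per Record ID / field / value.
long = rolled.melt(id_vars=["Record ID", "UniqueKey"],
                   var_name="Field", value_name="Value")
long = long.assign(Value=long["Value"].str.split(";")).explode("Value")

# Unique + re-summarize, then pivot back so 25 fields in means 25 fields out.
long = long.dropna(subset=["Value"]).drop_duplicates()
wide = (
    long.groupby(["Record ID", "UniqueKey", "Field"], sort=False)["Value"]
        .agg(";".join)
        .unstack("Field")
        .reset_index()
)
print(wide)
```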
@dkazup I created a workflow based on my understanding.
I'm going to accept this, as I think it gets me started where I was spinning my wheels. The output isn't exactly what I was after, but I really appreciate the assistance.