Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Merge multiple rows based on unique key and deduplicate field values

dkazup
5 - Atom

I have a data set that has multiple rows of data and multiple fields.  Not all fields may have a value and some fields may have the same value in multiple rows.   I am trying to create a single row of related data "rollup" and then in each field concatenate if needed and deduplicate the concatenated list of value.

 

What I have been doing so far is to create a unique key based on a combination of fields. Then I test if there is actually values in those fields and if not -- I send those rows around the process... Say for example the unique key is First Name Last Name DOB.  If there is not DOB  in a row then that row goes around the process.... I dont want to have a unique key of something like johnsmith as I might have lots of Johnsmith but different people.   So for the rows that I can make a unique key, I then group by the key and concatenate the fields with a summary tool. I then add a record ID and then to deduplicate each concated list I use a select tool send the record ID and field , text to columns (rows) based on my delimiter, use the unique tool to deduplicate and then resummarize.   In some cases I may also use a filter tool to remove null values.  I then multi join back all the fields based on the record ID and rejoin to the bypass rows.... I might then create a different unique key and repeat this process until I've "vertically" consolidated as much as possible.

 

 
 

the process looks a bit like this... each row in the container represents a field.  So if I have 50 different fields, I have 50 different rows, each one doing the same thing but the field name changes.

vertConsolidate.jpg

vertConsolidate2.jpg

If I have a lot of fields its a lot of manual.... I figure there is a macro way to do this....but I'm drawing a blank on how to start.  I figure the marco would consist of the SELECT, TEXT TO COLUMN (ROWS), UNIQUE, FILTER for NOT NULL and then SUMMARIZE group by record and concate value.

 

Should be simple...but struggling with how I would send each different field into the macro and then have it all join back up again. 

For each project the fields might called something different as well and the number may vary...

 

There might also be a better way to solve this that I'm also not thinking about. I'm dealing with sensitive data so I cant share a table...but hopefully you get what I'm trying to do.

 

4 REPLIES 4
binuacs
21 - Polaris

@dkazup provide some sample input data and expected output for a better understanding of the requirement

dkazup
5 - Atom

Sure, lest say you have this input, it can be in reality any number of columns.  The unique key is created based on Field1 + Field  2 + Field 3

 

UniqueKeyField 1Field 2Field 3Field 4
az1az15
az1az16
ay1ay1 
bz2bz25
bz2bz25
bz3bz36
cy4cy4 

 

 

The records are then grouped by the unique key and the fields are concatenated with ; delimiter

 

Record IDUniqueKeyField 1Field 2Field 3Field 4
1az1a;az;z1;15;6
2ay1ay1 
3bz2b;bz;z2;25;5
4bz3bz36
5cy4cy4 

 

Now I've grouped and collapsed the data. I then take each record ID and field transpose to get something like

 

Record IDField 1
1a
1a

 

The unique  and concate those back into a single row

 

Record IDField 1
1a

 

Then I join all the fields back based on record id to get the following output

 

UniqueKeyField 1Field 2Field 3Field 4
az1az15;6
ay1ay1 
bz2bz25
bz3bz36
cy4cy4 

 

So for every field I'm doing the same operation and in my screen shot you see those tools as a row.  What I'd like to do is figure out a way I dont have to manually duplicate each tool row.... I figure there have to be a macro way so that I'm sending to the macro the record id and the field + value  .

 

Just struggling to start the macro....the macro itself would be the TEXT TO COLUMN (ROWS) the UNIQUE and the Summarization tool.    the output would go to the multi join because if I have 25 fields I want to still have 25 fields.... I just want to roll up and deduplicate all the values based on unique key groupings.

 

binuacs
21 - Polaris

@dkazup Created a workflow based on my understanding 

image.png

dkazup
5 - Atom

I'm going to accept this as I think it gets me started where I was spinning my wheels.  The output isnt exactly...but I really appreciate the assistance.

 

Labels