Hi
I have two datasets:
RecordID | Date |
1 | 01/08/2019 |
2 | 02/08/2019 |
3 | 03/08/2019 |
4 | 04/08/2019 |
5 | 05/08/2019 |
and
Date | Column1 | Column2 | Column3 |
01/08/2019 | A | E | 123 |
01/08/2019 | B | F | 456 |
01/08/2019 | C | G | 785 |
01/08/2019 | D | H | 7546 |
02/08/2019 | A | X | 1321 |
02/08/2019 | B | Y | 12302 |
02/08/2019 | C | Z | 1322 |
02/08/2019 | D | AA | 15152 |
05/08/2019 | A | FF | 1213 |
05/08/2019 | B | SS | 154 |
05/08/2019 | C | GG | 2189 |
05/08/2019 | D | RW | 5968 |
When I try to join them I get the following:
RecordID | Date | Column1 | Column2 | Column3 |
1 | 01/08/2019 | A | E | 123 |
1 | 01/08/2019 | B | F | 456 |
1 | 01/08/2019 | C | G | 785 |
1 | 01/08/2019 | D | H | 7546 |
2 | 02/08/2019 | A | X | 1321 |
2 | 02/08/2019 | B | Y | 12302 |
2 | 02/08/2019 | C | Z | 1322 |
2 | 02/08/2019 | D | AA | 15152 |
3 | 03/08/2019 | [Null] | [Null] | [Null] |
4 | 04/08/2019 | [Null] | [Null] | [Null] |
5 | 05/08/2019 | A | FF | 1213 |
5 | 05/08/2019 | B | SS | 154 |
5 | 05/08/2019 | C | GG | 2189 |
5 | 05/08/2019 | D | RW | 5968 |
What I'm trying to achieve is to carry the values from the last date on which data exist (02/08/2019) forward into the dates on which data do not exist (03/08/2019 and 04/08/2019).
My output should look like:
RecordID | Date | Column1 | Column2 | Column3 |
1 | 01/08/2019 | A | E | 123 |
1 | 01/08/2019 | B | F | 456 |
1 | 01/08/2019 | C | G | 785 |
1 | 01/08/2019 | D | H | 7546 |
2 | 02/08/2019 | A | X | 1321 |
2 | 02/08/2019 | B | Y | 12302 |
2 | 02/08/2019 | C | Z | 1322 |
2 | 02/08/2019 | D | AA | 15152 |
3 | 03/08/2019 | A | X | 1321 |
3 | 03/08/2019 | B | Y | 12302 |
3 | 03/08/2019 | C | Z | 1322 |
3 | 03/08/2019 | D | AA | 15152 |
4 | 04/08/2019 | A | X | 1321 |
4 | 04/08/2019 | B | Y | 12302 |
4 | 04/08/2019 | C | Z | 1322 |
4 | 04/08/2019 | D | AA | 15152 |
5 | 05/08/2019 | A | FF | 1213 |
5 | 05/08/2019 | B | SS | 154 |
5 | 05/08/2019 | C | GG | 2189 |
5 | 05/08/2019 | D | RW | 5968 |
Please note that it is not just a single row that I want to carry forward, but a whole set of rows. Also, even if I have 4 consecutive days without data, the workflow should carry the data of the last available date (5 days earlier) into all 4 of those days.
How could someone do that? It looks like some kind of dynamic join is needed, based on a condition like Column1 = [Null].
Many thanks in advance
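For readers tackling the same problem in code rather than Alteryx, the whole-block carry-forward can be sketched in pandas. This is a sketch only; all variable names are illustrative, and `merge_asof` stands in for the "last date with data" lookup:

```python
import pandas as pd

# The two datasets from the question (dd/mm/yyyy dates written as ISO here).
df_records = pd.DataFrame({
    "RecordID": [1, 2, 3, 4, 5],
    "Date": pd.to_datetime(["2019-08-01", "2019-08-02", "2019-08-03",
                            "2019-08-04", "2019-08-05"]),
})
df_data = pd.DataFrame({
    "Date": pd.to_datetime(["2019-08-01"] * 4 + ["2019-08-02"] * 4
                           + ["2019-08-05"] * 4),
    "Column1": ["A", "B", "C", "D"] * 3,
    "Column2": ["E", "F", "G", "H", "X", "Y", "Z", "AA",
                "FF", "SS", "GG", "RW"],
    "Column3": [123, 456, 785, 7546, 1321, 12302, 1322, 15152,
                1213, 154, 2189, 5968],
})

# For every calendar date, find the latest date <= it that actually has data.
avail = (df_data[["Date"]].drop_duplicates()
         .rename(columns={"Date": "SourceDate"}).sort_values("SourceDate"))
lookup = pd.merge_asof(df_records.sort_values("Date"), avail,
                       left_on="Date", right_on="SourceDate",
                       direction="backward")

# Pull the whole block of rows for that source date, keeping the target date.
src_rows = df_data.rename(columns={"Date": "SourceDate"})
result = (lookup.merge(src_rows, on="SourceDate")
                .drop(columns="SourceDate")
                .sort_values(["RecordID", "Column1"])
                .reset_index(drop=True))
```

Because `merge_asof` with `direction="backward"` maps each date to the most recent date that has rows, the fill works no matter how many consecutive dates are missing.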
Hi @alexankyr
How many missing dates do you typically have?
In the attached workflow I have used a Formula tool to create a new date field from the Left output anchor of the Join (the dates which don't have corresponding data).
This new field holds the previous day's date, which you can then use to join back to the original dataset.
Once this process has been completed, you can union the two streams back together.
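Under the stated assumptions, the same previous-day logic could be sketched in pandas (names are illustrative; note this only covers a one-day gap):

```python
import pandas as pd

# Toy data with a single-day gap (2019-08-03 has no rows).
data = pd.DataFrame({
    "Date": pd.to_datetime(["2019-08-01", "2019-08-01",
                            "2019-08-02", "2019-08-02"]),
    "Column1": ["A", "B", "A", "B"],
})
all_dates = pd.DataFrame({
    "Date": pd.to_datetime(["2019-08-01", "2019-08-02", "2019-08-03"]),
})

# The dates with no matching rows (Alteryx: Left output anchor of the Join).
missing = all_dates[~all_dates["Date"].isin(data["Date"])].copy()

# Formula-tool step: shift each unmatched date back one day...
missing["PrevDate"] = missing["Date"] - pd.Timedelta(days=1)

# ...and join that previous day back to the original data.
prev = data.rename(columns={"Date": "PrevDate"})
filled = missing.merge(prev, on="PrevDate").drop(columns="PrevDate")

# Union the filled rows with the original data.
result = (pd.concat([data, filled], ignore_index=True)
            .sort_values(["Date", "Column1"]).reset_index(drop=True))
```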
Would this work for you?
Regards
Will
Hi
Thanks for the tip. I was thinking about a structure like this, but I quickly abandoned it because it is not dynamic: it only works for a fixed number of missing dates (2 in your example).
For example, in August 2019 I had 10 missing dates, so if I employed this structure I would need 10 sequential joins.
In theory I can have even more than 10 missing dates if we fall into a bank holiday period (Easter or Christmas) or if some external statements (datasets) are not available.
Ideally, the number of sequential joins in the workflow should adapt to the number of missing dates, repeating until the dataset is complete. That's why I referred to a dynamic join earlier.
thanks
Hi @alexankyr
For a dynamic solution that will handle any number of missing dates, try this:
1. Start with a Unique tool to pull out the list of possible Column1 values.
2. Append these to the date table to get one row for each date/Column1 combination.
3. Join this to the original transactions and add a Union tool to include the non-matched rows.
4. Sort this list by Column1 and then Date.
5. Use a couple of Multi-Row Formula tools to fill in the missing values for Column2 and Column3.
6. A final sort brings back the original transaction order, resulting in the complete dataset.
This solution uses 2 Multi-Row Formula tools since you only have 2 columns to fill. If your actual dataset has more columns, or a varying number of columns, transpose them so that you only need one Multi-Row tool, then Cross Tab to restore the original column structure.
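The same pipeline can be sketched in pandas (illustrative names throughout; `groupby(...).ffill()` plays the role of the Multi-Row Formula tools):

```python
import pandas as pd

# Toy data with a gap on 2019-08-02.
data = pd.DataFrame({
    "Date": pd.to_datetime(["2019-08-01", "2019-08-01",
                            "2019-08-03", "2019-08-03"]),
    "Column1": ["A", "B", "A", "B"],
    "Column2": ["E", "F", "X", "Y"],
    "Column3": [123, 456, 1321, 12302],
})
dates = pd.DataFrame({
    "Date": pd.to_datetime(["2019-08-01", "2019-08-02", "2019-08-03"]),
})

# Unique tool: the list of possible Column1 values.
keys = data[["Column1"]].drop_duplicates()

# Append Fields: one row per date/Column1 combination.
scaffold = dates.merge(keys, how="cross")

# Join + Union: attach the real rows, leaving nulls where a date has no data.
joined = scaffold.merge(data, on=["Date", "Column1"], how="left")

# Sort by Column1 then Date, and fill gaps downward per Column1
# (the job of the two Multi-Row Formula tools).
joined = joined.sort_values(["Column1", "Date"])
joined[["Column2", "Column3"]] = (
    joined.groupby("Column1")[["Column2", "Column3"]].ffill())

# Final sort restores the original order.
result = joined.sort_values(["Date", "Column1"]).reset_index(drop=True)
```

Filling per Column1 group rather than globally is what keeps a whole set of rows intact for each missing date.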
Dan
Hi @danilang
Your workflow worked like a charm on my dataset.
Luckily, I only have 2 columns to fill; my dataset contains more columns, but I don't need them, so I dropped them.
It would be great to see how the workflow would be if I had a varying number of columns, just for future reference.
Many thanks!
Hi @alexankyr
Here's another approach that's very dynamic. It relies on the number of unique rows per Date (in your case, 4) and uses Generate Rows a couple of times: once to generate the right number of rows for each unmatched date, and again to generate the dates that are missing from your data.
When a date is imputed into the dataset, it points back at the rows from which to pull its data (Record ID and Match ID in my workflow), so you get a row-to-row match. This replaces the several Multi-Row tools, or the transpose/Multi-Row combination, proposed in @danilang's excellent solution.
The filled in data is unioned with the original data, and all is sorted.
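As a sketch of the same idea outside Alteryx (illustrative names; assuming each date has the same number of rows), the Match ID pairing could look like:

```python
import pandas as pd

# Toy data: 2019-08-03 and 2019-08-04 are missing, 2 rows per present date.
data = pd.DataFrame({
    "Date": pd.to_datetime(["2019-08-02", "2019-08-02",
                            "2019-08-05", "2019-08-05"]),
    "Column1": ["A", "B", "A", "B"],
})
missing = pd.DataFrame({
    "Date": pd.to_datetime(["2019-08-03", "2019-08-04"]),
})

# Number each row within its date: the role of the Match ID.
data["MatchID"] = data.groupby("Date").cumcount()
n_per_date = int(data.groupby("Date").size().max())  # unique rows per date

# Generate Rows, step 1: one row per (missing date, Match ID).
gen = missing.merge(pd.DataFrame({"MatchID": range(n_per_date)}), how="cross")

# Generate Rows, step 2: point each missing date at the last date with data.
src = (data[["Date"]].drop_duplicates()
       .rename(columns={"Date": "SourceDate"}).sort_values("SourceDate"))
gen = pd.merge_asof(gen.sort_values("Date"), src,
                    left_on="Date", right_on="SourceDate")

# Row-to-row match on (source date, Match ID), then union with the original.
src_rows = data.rename(columns={"Date": "SourceDate"})
filled = (gen.merge(src_rows, on=["SourceDate", "MatchID"])
             .drop(columns="SourceDate"))
result = (pd.concat([data, filled], ignore_index=True)
            .drop(columns="MatchID")
            .sort_values(["Date", "Column1"]).reset_index(drop=True))
```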
A bit messy because of crossing connections, but gets the job done.
Let me know if this works.
Cheers!
Esther
@alexankyr
Try out my solution with a different number of columns and see if it works for you. It should, at least in the way I've tested it (though you'll have to clean up the last Join tool to get rid of the extra columns on the Right side).
Cheers
Esther