
SOLVED

Retain a set of data (rows across several columns) for observations where [Null] exists

alexankyr
6 - Meteoroid

Hi

 

I have two datasets:

 

RecordID  Date
1         01/08/2019
2         02/08/2019
3         03/08/2019
4         04/08/2019
5         05/08/2019

 

and 

 

Date        Column1  Column2  Column3
01/08/2019  A        E        123
01/08/2019  B        F        456
01/08/2019  C        G        785
01/08/2019  D        H        7546
02/08/2019  A        X        1321
02/08/2019  B        Y        12302
02/08/2019  C        Z        1322
02/08/2019  D        AA       15152
05/08/2019  A        FF       1213
05/08/2019  B        SS       154
05/08/2019  C        GG       2189
05/08/2019  D        RW       5968

 

 

When I try to join them I get the following:

 

RecordID  Date        Column1  Column2  Column3
1         01/08/2019  A        E        123
1         01/08/2019  B        F        456
1         01/08/2019  C        G        785
1         01/08/2019  D        H        7546
2         02/08/2019  A        X        1321
2         02/08/2019  B        Y        12302
2         02/08/2019  C        Z        1322
2         02/08/2019  D        AA       15152
3         03/08/2019  [Null]   [Null]   [Null]
4         04/08/2019  [Null]   [Null]   [Null]
5         05/08/2019  A        FF       1213
5         05/08/2019  B        SS       154
5         05/08/2019  C        GG       2189
5         05/08/2019  D        RW       5968

 

What I am trying to achieve is to carry the values from the last date on which data existed (02/08/2019) into the dates for which no data exists (03/08/2019 and 04/08/2019).

 

My output should look like:

 

RecordID  Date        Column1  Column2  Column3
1         01/08/2019  A        E        123
1         01/08/2019  B        F        456
1         01/08/2019  C        G        785
1         01/08/2019  D        H        7546
2         02/08/2019  A        X        1321
2         02/08/2019  B        Y        12302
2         02/08/2019  C        Z        1322
2         02/08/2019  D        AA       15152
3         03/08/2019  A        X        1321
3         03/08/2019  B        Y        12302
3         03/08/2019  C        Z        1322
3         03/08/2019  D        AA       15152
4         04/08/2019  A        X        1321
4         04/08/2019  B        Y        12302
4         04/08/2019  C        Z        1322
4         04/08/2019  D        AA       15152
5         05/08/2019  A        FF       1213
5         05/08/2019  B        SS       154
5         05/08/2019  C        GG       2189
5         05/08/2019  D        RW       5968

 

Please note that it is not just a single row that I want to retain, but a whole set of rows. Also, even if I have 4 days without data, the workflow should carry the data from the last available date (5 days earlier) across all 4 days.

 

How could I do that? It looks like some kind of dynamic join is needed, based on a condition such as Column1 = [Null].

 

Many thanks in advance

6 REPLIES
wdavis
Alteryx

Hi @alexankyr 

 

How many missing dates do you typically have?

 

In the attached workflow I have used a Formula tool to create a new date field from the Left output anchor of the Join tool (the dates which don't have a corresponding ID).

 

This date field takes the previous day's date, which you can then use to join back to the original data set.

 

Once this process has been completed, you can union the two streams back together.
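For anyone following along outside Designer, here is a rough pandas sketch of the same pattern (the frame and column names below are mine, taken from the tables in the original post; it mirrors the workflow logic rather than the tools themselves):

```python
import pandas as pd

# The two datasets from the original post
dates = pd.DataFrame({
    "RecordID": [1, 2, 3, 4, 5],
    "Date": pd.to_datetime(
        ["01/08/2019", "02/08/2019", "03/08/2019", "04/08/2019", "05/08/2019"],
        dayfirst=True),
})
data = pd.DataFrame({
    "Date": pd.to_datetime(
        ["01/08/2019"] * 4 + ["02/08/2019"] * 4 + ["05/08/2019"] * 4,
        dayfirst=True),
    "Column1": ["A", "B", "C", "D"] * 3,
    "Column2": ["E", "F", "G", "H", "X", "Y", "Z", "AA", "FF", "SS", "GG", "RW"],
    "Column3": [123, 456, 785, 7546, 1321, 12302, 1322, 15152, 1213, 154, 2189, 5968],
})

# Left output of the Join: dates with no matching transactions
unmatched = dates[~dates["Date"].isin(data["Date"])].copy()

# Formula tool equivalent: shift each unmatched date back one day,
# then join the shifted date back to the transactions
unmatched["PrevDate"] = unmatched["Date"] - pd.Timedelta(days=1)
filled = unmatched.merge(data.rename(columns={"Date": "PrevDate"}), on="PrevDate")

# Caveat: a single one-day shift only fills gaps exactly one day wide;
# here 04/08/2019 shifts to 03/08/2019, which also has no data.
```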

 

Would this work for you?

 

Regards

Will

alexankyr
6 - Meteoroid

Hi

 

Thanks for the tip. I was thinking about a structure like this, but I quickly abandoned it because it is not really dynamic: it only works for a set number of "missing dates" (2 in your example).

 

For example, in August 2019 I had 10 missing dates, so if I employed this structure I would need 10 sequential joins.

 

In theory I can have even more than 10 missing dates if we fall into a bank holiday period (Easter or Christmas) or if some external statements (datasets) are not available.

 

Ideally, the number of sequential joins in the workflow should adapt to the number of missing dates. That's why I referred to a dynamic join earlier: one that keeps joining until I end up with a complete dataset.

 

thanks

danilang
19 - Altair

Hi @alexankyr 

 

For a dynamic solution that will handle any number of missing dates, try this:

 

[Attached workflow screenshot: WF.png]

 

1. Start with a Unique tool to pull out the list of possible Column1 values.
2. Append these to the date table to get one row for each Date/Column1 combination.
3. Join this to the original transactions and add a Union tool to include the non-matched rows.
4. Sort this list by Column1 and then Date.
5. Use a couple of Multi-Row Formula tools to fill in the missing values for Column2 and Column3.
6. A final sort brings back the original transaction order, resulting in the following.
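In pandas terms, a rough sketch of the same scaffold-and-fill logic (reusing the dates and data frames from the sketch earlier in the thread; an analogue of the tools, not the tools themselves):

```python
# Unique tool: the distinct Column1 values
keys = data[["Column1"]].drop_duplicates()

# Append Fields tool: one scaffold row per Date/Column1 combination
scaffold = dates.merge(keys, how="cross")

# Join + Union: attach transactions, keeping the non-matched scaffold rows
joined = scaffold.merge(data, on=["Date", "Column1"], how="left")

# Sort by Column1 and then Date so each Column1 group is in date order
joined = joined.sort_values(["Column1", "Date"])

# Multi-Row Formula equivalent: pull missing values down from the row above
joined[["Column2", "Column3"]] = (
    joined.groupby("Column1")[["Column2", "Column3"]].ffill())

# Final sort restores the original transaction order
result = joined.sort_values(["Date", "Column1"]).reset_index(drop=True)
```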

 

[Attached results screenshot: Results.png]

 

This solution uses two Multi-Row Formula tools since you only have 2 columns to fill. If your actual data set has more columns, or a varying number of columns, transpose those columns so that you're only working with one Multi-Row Formula tool, and then cross tab to get the original column structure back.
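The transpose/cross-tab variant maps onto melt and pivot in the same sketch (again a rough pandas analogue under the same assumptions, starting from the joined frame before the column-wise fill above):

```python
# Transpose tool: go long, so every data column becomes a Name/Value pair
long = joined.melt(id_vars=["RecordID", "Date", "Column1"],
                   var_name="Name", value_name="Value")

# One group-wise fill now handles any number of columns
long = long.sort_values(["Column1", "Name", "Date"])
long["Value"] = long.groupby(["Column1", "Name"])["Value"].ffill()

# Cross Tab tool: pivot back to the original column structure
wide = (long.pivot(index=["RecordID", "Date", "Column1"],
                   columns="Name", values="Value")
            .reset_index())
```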

 

Dan

alexankyr
6 - Meteoroid

Hi @danilang 

 

Your workflow worked like a charm in my dataset.

 

Luckily, I only have 2 columns to fill; my dataset contains additional columns that I don't need, so I dropped them.

 

It would be great to see how the workflow would be if I had a varying number of columns, just for future reference.

 

Many thanks!

 

 

estherb47
15 - Aurora

Hi @alexankyr 

Here's another approach that's very dynamic. It relies on the number of unique rows per Date (in your case, 4), and uses Generate Rows a couple of times: once to generate the right number of rows for each unmatched date, and again to generate rows for the dates that are missing from your data.

When a date is imputed into the data set, it relates back to the rows from which to pull the data (Record ID and Match ID in my workflow), so you get a row-to-row match. This replaces the multi-row (or transpose-plus-multi-row) step proposed in @danilang's excellent solution.

The filled-in data is unioned with the original data, and everything is sorted.

A bit messy because of the crossing connections, but it gets the job done.
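A loop-based pandas analogue of this generate-and-match idea (frame names reused from the earlier sketches; the actual workflow does this with Generate Rows and joins rather than a loop):

```python
import pandas as pd

# For each date with no transactions, copy the whole block of rows
# from the most recent earlier date that does have data
have = sorted(data["Date"].unique())
missing = dates[~dates["Date"].isin(have)]

blocks = []
for _, row in missing.iterrows():
    src = max(d for d in have if d < row["Date"])  # last date with data
    block = data[data["Date"] == src].copy()
    block["Date"] = row["Date"]         # re-stamp the copied rows
    block["RecordID"] = row["RecordID"]
    blocks.append(block)

# Union the generated rows with the original data (RecordID attached) and sort
result = (pd.concat([data.merge(dates, on="Date")] + blocks)
            .sort_values(["Date", "Column1"])
            .reset_index(drop=True))
```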

[Attached workflow screenshot: image.png]

Let me know if this works.

 

Cheers!

Esther

estherb47
15 - Aurora

@alexankyr 

Try out my solution with a different number of columns and see if it works for you. It should, at least for the way I've tested it (though you'll have to clean up the last Join tool to get rid of the extra columns on the right side).

 

Cheers

Esther

