Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Filtering new rows

essemMLB
7 - Meteor

Hello Alteryx community,

 

I hope everyone is doing as well as you can be during this pandemic. I have what I hope to be a relatively simple question. Here's what I'm trying to do:

 

I have a file with three columns: the first contains a unique ID, the second contains a string that categorizes the row, and the third contains a string relating to an open-text question response. For example:

 

Response IDCategoryText
A1B1SmallI loved it!
A1B2SmallIt was OK!
A1B3GiantIt was the best!

 

New rows are downloaded every morning, and there will eventually be hundreds of thousands of them, so manual coding doesn't work here.

 

What I need to do is filter the total dataset so that, at the end of the workflow, I have a file that contains only the rows that have been added since the last time I ran the workflow

 

For example, let's say I ran the workflow on July 1 at 12:00 PM and collected the three rows above. Then, I run the workflow on July 2 at 12:00 PM and I see the following:

 

Response IDCategoryText
A1B1SmallI loved it!
A1B2SmallIt was OK!
A1B3GiantIt was the best!
A1B4ModerateThe Alteryx community is the best!
A1B5GiantNo really, they're the best!

 

I would want to filter those two "new" rows into a separate table, so that they can be outputted to a file. 

 

Alternatively, I'd like to create a column that assigns a number to the row based when it was added to the master dataset. For example, using the second table in this thread as a starting point, let's say:

 

Response IDCategoryTextRun
A1B1SmallI loved it!1
A1B2SmallIt was OK!1
A1B3GiantIt was the best!1
A1B4ModerateThe Alteryx community is the best!2
A1B5GiantNo really, they're the best!3

 

In this case, I'd know the last "Run" that had been filtered out and I would be able to say: "OK, filter all rows with 2 or 3 in the Run column and output them to a separate file."

 

I'm open to other approaches as well, but the final goal is to have a file that contains ONLY newly added rows. 

3 REPLIES 3
grossal
15 - Aurora
15 - Aurora

Hi @essemMLB,

 

your text examples are lovely! Here is how I'd attempt to solve this problem:

 

 

AlteryxGui_ZWTwh0rtS8.png

I'd split it up in three files:

- one file provides all the input ('New File')

- one saves all data ('Stored files')

- one saves news lines ('New Lines output')

 

The workflows joins the data together in two ways:

- combined stored + new rows to save the for the next run (including run ID)

- only new lines

 

You are probably able to use the multi-row formula for your run ID.

 

 

I'll attach the rough sample workflow. Let me know if it helps.

 

Best

Alex

AbhilashR
15 - Aurora
15 - Aurora

@essemMLB - how about including a new timestamp column to your existing table? that way you know when the data was loaded, and can filter out records based on max dates? 

essemMLB
7 - Meteor

This worked like a charm! Thank you!

 

(I hope you are safe and healthy!)

Labels