
Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Is there an easier way to iteratively remove rows and re-query the same data set?

jdsimcock
6 - Meteoroid

Apologies if this is a little convoluted. I am optimistic that there is a simpler way, I just don't know what it is, hence the request for help...

 

I have created a workflow and related batch macro to first identify the first occurrence of a duplicate in a data set and then remove both related rows (the pair).  I then re-query the same (now updated) data set for the next possible duplicate (the next pair), and so on until all the pairs have been removed.  I am doing it this way because one row of each pair is a negative amount and the other is a positive amount, so they cancel each other out and can be removed.  There can be more than one positive match for each negative, but obviously I can't remove all the matches at once because that would not sum to zero, so I have to remove them in pairs and then re-query.
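For anyone reading along, the pair-at-a-time loop described above can be sketched in plain Python. This is only an illustration of the logic, not the actual workflow; the (key, amount) tuple layout is assumed:

```python
# Minimal sketch of the iterative contra-pair removal described above.
# Rows are (key, amount) tuples; field layout is illustrative only.

def remove_contra_pairs(rows):
    """Repeatedly find the first negative entry with a matching positive
    entry of the same key and absolute amount, remove both, and restart
    the scan (the "re-query"), until no pair remains."""
    rows = list(rows)
    while True:
        removed = False
        for i, (key_i, amt_i) in enumerate(rows):
            if amt_i >= 0:
                continue
            # look for a positive entry with the same key and equal magnitude
            for j, (key_j, amt_j) in enumerate(rows):
                if j != i and key_j == key_i and amt_j == -amt_i:
                    # remove the pair, then rescan the shortened data set
                    rows = [r for k, r in enumerate(rows) if k not in (i, j)]
                    removed = True
                    break
            if removed:
                break
        if not removed:
            return rows

data = [("A", 100), ("A", -100), ("A", 100), ("B", 50)]
print(remove_contra_pairs(data))  # [('A', 100), ('B', 50)]
```

The rescan after every removal is exactly the slow part the post is asking to avoid; the later replies replace it with joins.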

 

In order to achieve the above, I have found myself (being relatively new to Alteryx) writing/appending the pairs found to an external file, which I then read on the next iteration for items to ignore; with each pass the ignore list gets longer, until no more matches are found.  This iterative process may be inevitable, maybe not, but what troubles me most of all is that writing/appending to a temp file is very slow, and I am almost sure that there is a better, faster way to do this, hence the call for help.

 

[Image: Remove Contra Entries workflow]  [Image: Batch Macro]

12 REPLIES
danilang
19 - Altair

Hi @jdsimcock

 

Can you share some sample data for this?  Approximately 10-20 rows of input, along with the criteria you're using to determine whether rows are duplicates?

 

Dan

jdsimcock
6 - Meteoroid

Hi @danilang

 

Thank you for your interest and quick response.

 

Please find attached a data file "Mock_Data.xlsx".  This data would be input into the 'Remove Contra Entries' workflow.  The join is on 'Key_Data' and 'Absolute_Amount'.  Key_Data is a concatenation of the following fields: [WBS Element] + [Bldg Description] + [WBS Description] + [Project Name].  Absolute_Amount is the absolute value of the [Amount] field.  Both fields are added in the first Formula tool in the 'Remove Contra Entries' workflow.

 

Hope that is clear enough.  Look forward to your feedback.

 

Jonathan

jdsimcock
6 - Meteoroid

Hi,

 

I am attaching an abridged version of the workflow, the same as the one (in part) shown in the image I shared earlier, hopefully with no identifiable data still in there :-).  Fingers crossed that having this will help to find a more elegant solution.

 

Jonathan.

DavidxL
9 - Comet

This was an interesting problem to tackle. I've attached a workflow which hopefully gives you a good starting point to build off of. 

 

The general approach was:

1. Add a RecordId to keep an id for each row ("transaction")

2. Use the Tile tool to make an id for each group (key_data)

3. Join key_data on itself to essentially do a cartesian join of all transactions per group. *Note: if your data starts getting large, this may cause issues and you might have to use a different approach.

4. Do some filtering to narrow down to unique pairs (joins) of transactions

5. Add the amounts of transactions and filter those that sum to zero

6. Pick the first matched pair for a given transaction

7. Join these zero-sum transactions and take the R output of the join to result in transactions that don't sum to zero
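The seven steps above can be roughly translated into a pandas self-join for readers following along outside Designer. Column names (key_data, amount, record_id) are assumed for illustration, and step 6 is approximated with a greedy first-come pairing so no transaction is cancelled twice:

```python
import pandas as pd

# Toy data standing in for the transaction list (assumed column names)
df = pd.DataFrame({
    "key_data": ["A", "A", "A", "B"],
    "amount":   [100, -100, 100, 50],
})
df = df.reset_index().rename(columns={"index": "record_id"})   # step 1

# Steps 3-5: cartesian self-join per key, keep each unordered pair once,
# then keep only pairs whose amounts sum to zero
pairs = df.merge(df, on="key_data", suffixes=("_l", "_r"))
pairs = pairs[(pairs["record_id_l"] < pairs["record_id_r"])
              & (pairs["amount_l"] + pairs["amount_r"] == 0)]

# Step 6: greedily pick the first available pair per transaction,
# so each record is used in at most one zero-sum pair
used = set()
for l, r in zip(pairs["record_id_l"], pairs["record_id_r"]):
    if l not in used and r not in used:
        used.update((l, r))

# Step 7: anti-join - keep transactions not in any chosen zero-sum pair
result = df[~df["record_id"].isin(used)]
print(result[["key_data", "amount"]].to_dict("records"))
```

As the reply notes, the cartesian join grows quadratically within each group, so this sketch has the same scaling caveat as the workflow.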

jdsimcock
6 - Meteoroid

Hi @DavidxL

 

Very impressive.  I will see how this performs with my full dataset.  My vocabulary of Alteryx Tools now includes the Tile tool :-).

 

Many thanks.

 

Jonathan

Claje
14 - Magnetar

Hi,

Your attached workflow did not have the macro packaged with it, so I couldn't validate my output.


I played around with an Iterative Macro that I believe accomplishes what you are looking for in a fairly performant way.

It does this by taking the first positive and the first negative record for each of your matches using the following criteria:

WBS Element
Bldg Description
WBS Description
Project Name
Absolute Amount

 

Then, it joins between positive and negative on these criteria to remove exact matches, and iterates until no records are removed.
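A rough Python sketch of that iteration, for illustration only: the real macro joins on the five fields listed above, whereas here they are assumed to be concatenated into a single match key, and amounts are assumed non-zero.

```python
# Sketch of the iterative approach: each pass removes the first positive
# and first negative row per match key, and repeats until a pass removes
# nothing (capped at a maximum number of iterations, as in the macro).

def remove_contras_iteratively(rows, max_iterations=10_000):
    rows = list(rows)  # (match_key, amount) tuples; key includes abs(amount)
    for _ in range(max_iterations):
        first_pos, first_neg = {}, {}
        for idx, (key, amount) in enumerate(rows):
            side = first_pos if amount > 0 else first_neg
            side.setdefault(key, idx)  # remember only the first per key
        # "join": keys that have both a first positive and a first negative
        to_remove = {i for key, i in first_pos.items() if key in first_neg}
        to_remove |= {first_neg[key] for key in first_pos if key in first_neg}
        if not to_remove:          # nothing matched - iteration terminates
            return rows
        rows = [r for i, r in enumerate(rows) if i not in to_remove]
    return rows
```

Because every pass removes one pair per key rather than one pair total, the number of iterations is bounded by the largest pile-up on a single key, not by the total number of pairs.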


It is currently configured to run a maximum of 10,000 times, but you can adjust this number if needed.

 

Note that this example is pretty firmly tied to your use case: specific field names are used, and there is some mandatory prep (included in the workflow) that runs before the iterative macro.

 

Hope this helps!


Let me know if you have any questions.

jdsimcock
6 - Meteoroid

Hi @Claje,

 

Many thanks.  Wow, I was not expecting such quick responses, and certainly not from more than one person :-).  Feeling very lucky.  I will review with my full data and let you know how this gets on.  Thanks again.

 

Jonathan.

jdunkerley79
ACE Emeritus

Adjusting @Claje's approach.

 

I think the attached should do what you need without any iterative macro.

 

[Image: 2018-11-19_16-16-17.png]

 

It makes the assumption that Amount becomes an integer when multiplied by 100.
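The attached workflow itself isn't reproduced in the thread, but one way a non-iterative version can work is by counting: within each (key, absolute amount) group, each positive cancels exactly one negative, so only the surplus rows on the larger side survive. The sketch below is a guess at that logic, not the actual workflow, with amounts assumed to be pre-scaled to integers (e.g. times 100) so equality comparisons are exact:

```python
from collections import Counter

def remove_contras_single_pass(rows):
    """Single-pass contra removal: drop min(#positives, #negatives)
    rows from each side of every (key, abs(amount)) group."""
    # rows: (key, amount) tuples with integer amounts (e.g. cents)
    pos = Counter((k, abs(a)) for k, a in rows if a > 0)
    neg = Counter((k, abs(a)) for k, a in rows if a < 0)

    # how many rows still to drop on each side of each group
    to_skip = {}
    for group in set(pos) | set(neg):
        cancellable = min(pos[group], neg[group])
        to_skip[(group, +1)] = cancellable
        to_skip[(group, -1)] = cancellable

    kept = []
    for key, amount in rows:
        sign = 1 if amount > 0 else -1
        slot = ((key, abs(amount)), sign)
        if amount != 0 and to_skip.get(slot, 0) > 0:
            to_skip[slot] -= 1          # this row is cancelled by a contra
        else:
            kept.append((key, amount))  # surplus row survives
    return kept
```

Everything here is a single scan plus a couple of group counts, which is why a join-based Alteryx version needs no iterative macro.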

Claje
14 - Magnetar

Pretty sure @jdunkerley79's solution is the most performant of the three options, and has the added benefit of no macro.

Still recommend testing against your dataset, but I think this is the winner.
