Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Grouping into time buckets?

summarizer
9 - Comet

 

I need to review thousands of the projects and start times on the left (A and B) and assign them to the time frame where they fell during that day (D,E,F).  These values are fluid...they change daily. These data points are NOT in the same page - I just made this photo below for easy reference.

 

I did manage to work this with an APPEND tool.  However, I'm just doing 2 days of data over several employees and I have 153 million rows (which I'll bring back to normal with the following filter tool), but I need to add much more data.  Is there any other way?  

 

Thanks in advance!

 

JOIN QUESTION.jpg

13 REPLIES 13
soccertil1108
8 - Asteroid

Append is typically the methodology for ranging data; you expand the data set as a product join x * y rows and then reject the out of range records, keeping the in range records. An alternative comes to mind where by you would employ a macro that executes for each range. That only becomes practical if the time range data is fairly descrete in size. For clarification, the data set on the left [AB] is thousands of records and the data set on the right [DEF] is limited [in your example 10 records]? If that is the case then I can benchmark the performance of an APPEND and FILTER workflow versus a Macro workflow with FILTER to see if there is a significant performance difference. Let me know. I look forward to the challenge.

summarizer
9 - Comet

Thanks.  I'm up for anything.  Macros are not my strong suit though.  My current workflow is appending 40,000 records into 3,800 potential buckets and after I filter on the 153 million rows, I'm getting back 200 less records than went in, so some things are not bucketing appropriately in there.

 

 

soccertil1108
8 - Asteroid

that is a very large set of records for the Source side of the append. By any chance are the potential buckets date dependent? Meaning, they don't cross midnight [Nov 10, 2020 11:01 pm to Nov 11, 2020 1:00 am]? Just trying to free-think how to reduce processing costs and build some macros to implement that.

 

btw. Macros are a get extension to Alteryx the can get you out of some complex processing easily. check out Alteryx's Macro course and my YouTube channel video: Understanding and Using Macros when you have a chance.

https://youtu.be/-2h28xaRIcg

 

 

summarizer
9 - Comet

I can try.  I use macros rarely so I don't retain the info.  The dates do not go over midnight.  But one other issue I'm having is that the buckets are employee specific.  I have o ensure that Employee 1's project's don't accidentally get put in the bucket of Employee 2.  So, I am trying to match against the employee, the date and then determine which bucket their project time fell into.

soccertil1108
8 - Asteroid

good clarification. that'll help with a solution. give me a while to test out some thoughts.

soccertil1108
8 - Asteroid

can you provide a sample of the date buckets? with employee and range. you can redact the employee inform except for an ID.

summarizer
9 - Comet

I am just providing some dummy data.  Similar to the above, but I just put them into two different sets (one below the other) to be clearer.

JOIN QUESTION 2.jpg

 

Now, let me also mention another piece.  I do currently have these sets unioned together by the start times so I can get a holistic view of what the employee did each day.  So I know that Segment Fourth (for example) starts and then there might be dozens of projects that happen below it.  If I could do a fill down, I could fill the Work Type down the page until the next Work Type starts, but I too have tried various ways of doing that (with the Multi-Row tool) and have not been successful.  Also, I am tired and it's my day off so thank you for any assistance, but please don't work too hard :-)  I will respond when I log back in. 

soccertil1108
8 - Asteroid
Attached is a solution. While I was intrigued by your thought of using MultiRow and a related thought of creating a network hierachy, the problem you have is really the classic non-equi join. In the database environment it is resolved by first creating a product join and then keeping records that match the range condition. Peformance/resources become the limiting factor with large data sets; both within Alteryx and within an RDBMS. The solution lies in breaking down the process into manageable subsets. This is is where Alteryx macros provide an efficient alternative while keeping within the Append Fields solution which directly addresses non-equi joins. My test data contains consistent records for 100 employees across 50 days with 10 activities per day [50,000] and 100 employees across 50 days with 5 timestamped ranges per day [25,000]. I manually modified 2 records to be out of range, one before the first range and one after the last range. In my solution I processed subsets by employee. The full Append Fields took approximately 2 minutes to execute on my laptop and this included no range filtering. The full non-equi join workflow with macro took approximately 3 seconds to execute. Please feel free to reach out with any questions. Hope this helps.
soccertil1108
8 - Asteroid

had an issue with posting the solution so I've included it here.

Labels