I have a table as below with different start and end dates but few have same name. I would like to calculate the total duration for each name. For the name A there is overlapping duration among 4 rows and for B they are discontinuous. I would like to add new field and show total duration for each name. How can I calculate for any type of durations like continuous, overlapping and discontinuous.
Name | Start Date | End Date |
A | 1996-2-26 | 2002-05-30 |
A | 2001-10-27 | 2005-12-29 |
A | 1992-12-14 | 2010-03-01 |
A | 1997-3-3 | 2018-11-28 |
D | 2006-6-26 | 2009-6-19 |
B | 1985-3-22 | 1988-5-29 |
B | 1999-9-3 | 2001-11-2 |
B | 2001-11-7 | 2003-12-31 |
B | 2003-12-31 | 2006-12-11 |
B | 2009-11-24 | 2012-7-10 |
B | 2012-6-27 | 2014-7-8 |
B | 2016-6-28 | 2018-10-3 |
B | 2001-9-29 | 2014-11-15 |
C | 2012-7-31 | 2015-12-15 |
Solved! Go to Solution.
Hi @ssphv !
I think the key here is the 'Generate Rows' tool to generate each day between the start and end dates for each name. Then, with the summarize tool, group by the 'Name' and use the Count distinct functionality to get the non overlapping days. See the workflow attached.
Nailed @CarlDi
I was developing a similar solution, learned this technique with incredible @MarqueeCrew
By the way, this simple workflow solving such a complex problem that probably would demand the user to trace multiple timelines is why I feel Alteryx's powers are unimaginable! Amazing.
Cheers,
@CarlDi Thank you for the solution, it works as expected. But my actual data is really huge and also date ranges are really large. Generate rows tool is creating huge data and my workflow is running really slow. Is there any other suggestion you can give so that performance can be improved for huge data.
Thank you in advance!
Hi @ssphv
Different users, different algorithms. The ones from @Thableaus and @CarlDi use generate rows to build ranges and then count the days in them. Excellent choices and the one I would have gone with as well, had I not read your comment about the memory issues.
This workflow uses three multi row tools to calculate the running end date, running start date and number of continuous ranges in each name. it then finds the largest interval in each range and sums those up for the total.
It does this while working with the original dataset without adding any extra rows, thereby minimizing memory use.
This window shows the intermediate results before any summarizing, showing the details of the continuous ranges and their cumulative start and end dates
Dan
It's so great to get new ideas from you, @danilang
These ranges work like the timelines and you were able to bring this timeline sketch to Alteryx logic in a simple way.
Cheers,
Hey @Thableaus
Stop it. You're making me blush! Besides, I gotta spread the word! And that word is Alteryx!
Any chance you're going to Inspire? Maybe we can meet up for a beer.
Dan
Yes I am! Yeah, I'm looking forward to meet all of the legends in this Community. These are one of the reasons I'm so excited to go to Inspire.
We will for sure have a beer and celebrate the success of the Community engagement.
@Thableaus @danilang thanks for the input!
@danilang Thank you for the solution! This works perfect and runs pretty fast even for huge data. Thank you!