Hi,
I couldn't find the exact solution to my problem, so creating a new topic.
I have the following dataset:
Customer ID | Product ID | Start Date | End Date | Value |
111 | 111 | 20200131 | 20201231 | 1 |
111 | 222 | 20200131 | 20201231 | 2 |
111 | 333 | 20200131 | 20201231 | 3 |
222 | 111 | 20200131 | 20201231 | 4 |
222 | 222 | 20200131 | 20201231 | 5 |
222 | 222 | 20200401 | 20200415 | 6 |
222 | 333 | 20200131 | 20201231 | 7 |
333 | 111 | 20200131 | 20201231 | 8 |
333 | 111 | 20210131 | 20211231 | 9 |
I need to create a workflow which adds a column at the end and flags the records that overlap with other ones. So in the example above records with values 5 and 6 should be flagged.
FYI...The real dataset contains around 100k records.
Thanks,
Pedram
Hi @peddy
Have a doubt why are last 2 rows not colored. Same Customer and product ID but different time.
Can you be clear with the logic.
333 | 111 | 20200131 | 20201231 | 8 |
333 | 111 | 20210131 | 20211231 | 9 |
because one is 2020 and the other is 2021
Thanks - However, your solutions doesn't fully work. As you can see below line 2, 10 and 15 should have been true. I'm assuming the formula in the second multi-formula tool needs to be changed, but not 100% sure to what exactly.
I did try a slightly different approach by summarizing on customer and product to get the max end date which seemed to resolve one of false negatives, but that's as far as I managed to get...
First things to check, make sure Group By is selected in Multi Row formula tools
Can you post a little more sample data and your workflow then will check through?
There's a few attempts in my workflow 🙂
I should mention that the end state needs to be that the dates are changed so there are no overlaps
so for example, if we have the following overlap
20200101 | 20201231 | 8.37 |
20200301 | 20200331 | 9.21 |
20200401 | 20200430 | 11.45 |
The data set needs to change to...
20200101 | 20200229 | 8.37 |
20200301 | 20200331 | 9.21 |
20200401 | 20200430 | 11.45 |
20200501 | 20201231 | 8.37 |
So basically throughout 2020 the price is 8.37 expect in March and April where it's 9.21 and 11.45 respectively.
OK thats a little fiddlier - specifically how to split record into parts. WIll have a play and come back to you
Ok, have built a workflow but this one is complicated...
Going to go through it in bits and then you can see if it works on your full set.
So start by ordering the records Customer, Product, Start
Add a grouped record ID within each Customer, Product (using a Multi Row formula)
Next transpose so Start and End dates are on separate rows
Now sort into Customer, Product, Date (either start or end)
The next tool is where it gets complicated.
Using a multi-row formula you track which rows are still active. This involves keeping a string and adding a new entry at the end or removing one.
After this use a sample tool to eliminate any days where multiple events occur
Finally, a formula tool is used to get the last active period and add CustomerID and ProductID to make a key
Now, back to more normal processes...
Use a find and replace to get the value for the current record
A little more working out to put together a start and end date and then some filtering.
I think this is close to what you need, though I admit it is pretty complicated.
User | Count |
---|---|
17 | |
15 | |
15 | |
8 | |
6 |