A dataset that I find myself working with provides an aggregated number of times an occurance happens in a given month - say for example the number of times the 8AM morning commuter busses were late. The data may be formatted like the following:
Bus # | January | February | March | April | May | June |
13N | 5 | 2 | 5 | 3 | 2 | 0 |
19W | 2 | 4 | 2 | 4 | 1 | 4 |
22E | 2 | 3 | 1 | 2 | 2 | 3 |
27W | 6 | 3 | 5 | 8 | 7 | 3 |
34S | 1 | 2 | 1 | 3 | 1 | 2 |
I need to convert the number of times per month specified into records where each bus number will have unique dates for the following year (a pseudo-forecast of what may happen next year). Since a bus would only have one 8AM start on a given day, date uniqueness is essential. The output desired is similar to the following:
Bus # | Date |
13N | 2020-01-04 |
13N | 2020-01-06 |
13N | 2020-01-24 |
13N | 2020-01-27 |
13N | 2020-01-30 |
13N | 2020-02-05 |
13N | 2020-02-29 |
19W | 2020-01-06 |
19W | 2020-01-13 |
Since some busses may be late every day, I need to ensure that every calendar day within a month is available to choose from (not just stopping at 28 to account for non-leap years). 2020 is a leap year, so 2020-02-29 would be in play.
Has anyone done something like this before? If so/not, how would one go about doing this in Alteryx?
Thanks in advance for your suggestion(s).
Solved! Go to Solution.
Hi @A1 I mocked up a workflow that generates a row for every date and then appends the data to your dataset. Let me know what you think?
Thanks for looking at this. Perhaps I need something to come in between your process to be able to use it; though likely I did not explain well enough what I'm wanting Alteryx to do.
In the initial table I have
Bus # | Jan | Feb | Mar | Apr | May | Jun |
13N | 5 | 2 | 5 | 3 | 2 | 0 |
Given this record, I'm hoping to find a way to create a total of 17 records with unique dates:
Hi @A1 based on your description I have amended my workflow. Let me know what you think?
JS420,
This is fantastic - thanks so much. I'm now going through how you have the workflow set up and deducing the reason for placing the workflow objects where you did - this provides a great learning opportunity. I'll start testing with my real data set to see how it works, but I think this will get me very close.
Follow-up question: Currently the date ranking is static which means if every bus needed a date in January, each bus would be given the same date. The data set I'll be using has well over 100 #s. Each bus showing as late on the same day is not realistic, though on rare occasions it does happen ;-). I suspect this may hinder performance some, but do you know what tools/processes I could look into that would reset the date rankings after the records of each bus is created?
Thanks again,
A1
Hi @A1 you could wrap that workflow in a Batch Macro and from your data deduce a minimal month so in case a bus only had it's first instance of being late in March the generate rows would start from March rather than January.
JS420,
I ended up adding the Bus # to your random date approach instead of using a batch macro (as I'm not too adept at creating them). The process will create a temp table having a record for every bus and date combination between the min date and max date.
I'll need to keep in mind that as the number of busses increase, the number of records created for each bus-date combination will increase, and thus the need to set a larger RandInt () value (say from 1000 to 100,000 to 1,000,000) and potentially use a data type change (from Int16 to Int32 to Int64).
I suspect this approach may not be as processor and/or time efficient as a batch macro, but it's a way of getting it to work until I can get more familiar with macro creation.
Thanks again for your assistance with my request. I continue to be amazed at what can be accomplished with helpful communities such as this one.
A1