Euleryx Problem 14 – Longest Collatz Sequence
My workflow and macros:
Workflow:
Macro:
Answer: 837799
Last Week's Favourite Solution:
Last weeks challenge had many different solutions, some admittedly requiring more tools than others. Forward thinking is always bonus so it was great to see all the large number (big int) solutions. Last weeks favourite solution goes to @AkimasaKajitani for being the first person to post a solution which accurately computes the entire value, using string addition. Please find this solution on page one of last week's post or click here.
Definitions:
The rules for the Collatz conjecture are simple:
- If the number is odd, then multiply it by 3 and then add one.
- If the number is even, then divide it by 2.
Mathematical Theory:
This problem can be solved via a bruit force method using two generate rows, to calculate the Collatz sequence for all numbers up to 1,000,000 but it takes 18 seconds to run, and repeats a lot of calculations. If we could reduce the number of calculations we could subsequently reduce the run time.
For example, lets start by calculating the Collatz sequence for all numbers 1-5:
1: 1 (length 1)
2: 2, 1 (length 2)
3: 3, 10, 5, 16, 8, 4, 2, 1 (length 8)
4: 4, 2, 1 (length 3)
5: 5, 16, 8, 4, 2, 1 (length 6)
Now when we come to calculate the sequence for 6, we could just start from scratch, or we could do the following:
Calculate the terms of the sequence, until we dip below the starting value. The reason this is significant is because we have already calculated all the sequences (chains) for these smaller numbers. In this case, that is simply the second term as the sequence goes 6, 3
But we already know the full sequence for 3. So instead of recalculating, lets just add their lengths together to find the total length of the sequence for 6. E.g
We had a chain of length 2, for the start value of 6 and a chain of 8 for the start value of 3. As the number 3 appears in both sequences we do have to subtract one from the total chain length as we don’t want to count the same number twice. So the total length for the 6the collatz sequence is:
2 + 8 – 1 = 9
Same can be applied for a start value of 7:
7: 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5 (length 12)
5 was the first number in the sequence less than 7, se we have stopped there. Now we will join the two chains together to get a combined length of 12 + 6 – 1 = 17
So this basic principal of joining the chains together clearly works when the chain is only split in two. But what about if its got more than two splits, well that’s where the iterative element comes in.
Lets consider the number 12. Its sequence begins as follows:
12: 12, 6 (length 2)
Now we can join to the sequence of 6
6: 6, 3 (length 2)
So we have a combined length of 3, but the sequence is not yet complete, it only goes 12, 6, 3.
Well by converting the above chain joining principal into an iterative process, we can then join this new chain, back to the chain for 3.
3: 3: 3, 10, 5, 16, 8, 4, 2, 1 (length 8)
To get a total chain length of 10.
By joining shorter chains together instead of recalculating each time, the run time goes from 18 down to just 5 seconds. (I have included both solutions in the file below so feel free to test them yourself).
Method:
1) Generate records from 2 – 1,000,000 (no need to include 1 as it's already a complete chain).
2) Use another generate rows to implement the sequence rules; however, when the value dips below the “Input” value, stop computing. Instead, we will use the joining logic discussed earlier.
3) For each starting record, find the number of terms in the chain (count) and the last value calculated in the sequence so far. This should be the first value that dips below the “input”
4) Now we need to join the sequences together until we have the complete chains. To do this, I have used an iterative macro.
a) The first step removed any records which have a “last chain” value of 2. These records have effectively completed the cain (the final steps are 2 and then 1) so we can successfully output them.
b) For all remaining records, we can join back to our original macro input. This will now connect an incomplete sequence with the next section of said sequence. You will notice I have also updated the “Last_chain” value. As we have just joined two chains together, we need to take the “Last_chain” value from the new part of the chain.
c) Add the counts together. Again, we have just joined two chains together so their lengths (counts) must be combined.
d) Simply loop round until all the records have a “Last_Chain” value of 2.
5) Final steps are to sort the chain lengths in descending order, then take the top record.
6) Submit your answer to the Project Euler Website!
Summary:
Even though it may seem like extra work, by creating an iterative macro, we have removed the need to calculate the same section of a chain over and over again, reducing the run time by over 60%.
Want to find out more, follow this link to our introduction post - Euleryx: Let The Games Begin.
I see your iterative macros @Hub119 ... and I raise you a DCloud with no macros for the win!
The second Generate Rows formula was where I applied the iteration logic - I forget sometimes that Generate Rows isn't only for adding 1 or generating dates in between a start and end date. Fun!
Following the problem statement, I built a simple iterative macro that took 14 seconds to run.
I take the easy way, the Brutal Force way.
My solutions are Brute force(31.8sec) with no macro and Iterative Macro solution(23.9sec).
Mine is 'a bit' wise brute-force that drops the hopeless numbers in advance. This approach enabled me to just examine a half of 1million.
I'm 90% certain about my logic but if you find some flaw, please let me know😂
i try to records the old chains join with the current chains (as you mentions) but it work well in python. but not in Alteryx. the Join take so much time in compare the current due to 1million join X million. even after cancel some data.
and interesting that when go through the non-macro ways. it was slower than the macro ways. I thought is always faster than macro method.
i also test non-macro, macro, multiple-macro
2 macro
since it divide 2, so I look for binary also. where it interesting to where it have 0 between the gap.
