Euleryx Problem 14 – Longest Collatz Sequence

My workflow and macros:
Spoiler
Workflow:

Macro:

Answer: 837799
Last Week's Favourite Solution:
Last weeks challenge had many different solutions, some admittedly requiring more tools than others. Forward thinking is always bonus so it was great to see all the large number (big int) solutions. Last weeks favourite solution goes to @AkimasaKajitani for being the first person to post a solution which accurately computes the entire value, using string addition. Please find this solution on page one of last week's post or click here.
Definitions:
- Conjecture: A conjecture is a mathematical rule/principal that is assumed to be true. They will typically have been tested on a large range of numbers but because there is no rigours mathematical proof, it cannot be stated as a fact.
- Collatz Conjecture: The Collatz conjecture states that you can choose any starting number, and by following the rules listed below, you will always, eventually, get back down to the value “1” (the end of the sequence).
The rules for the Collatz conjecture are simple:
- If the number is odd, then multiply it by 3 and then add one.
- If the number is even, then divide it by 2.
Mathematical Theory:
This problem can be solved via a bruit force method using two generate rows, to calculate the Collatz sequence for all numbers up to 1,000,000 but it takes 18 seconds to run, and repeats a lot of calculations. If we could reduce the number of calculations we could subsequently reduce the run time.
For example, lets start by calculating the Collatz sequence for all numbers 1-5:
1: 1 (length 1)
2: 2, 1 (length 2)
3: 3, 10, 5, 16, 8, 4, 2, 1 (length 8)
4: 4, 2, 1 (length 3)
5: 5, 16, 8, 4, 2, 1 (length 6)
Now when we come to calculate the sequence for 6, we could just start from scratch, or we could do the following:
Calculate the terms of the sequence, until we dip below the starting value. The reason this is significant is because we have already calculated all the sequences (chains) for these smaller numbers. In this case, that is simply the second term as the sequence goes 6, 3
But we already know the full sequence for 3. So instead of recalculating, lets just add their lengths together to find the total length of the sequence for 6. E.g
We had a chain of length 2, for the start value of 6 and a chain of 8 for the start value of 3. As the number 3 appears in both sequences we do have to subtract one from the total chain length as we don’t want to count the same number twice. So the total length for the 6the collatz sequence is:
2 + 8 – 1 = 9
Same can be applied for a start value of 7:
7: 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5 (length 12)
5 was the first number in the sequence less than 7, se we have stopped there. Now we will join the two chains together to get a combined length of 12 + 6 – 1 = 17
So this basic principal of joining the chains together clearly works when the chain is only split in two. But what about if its got more than two splits, well that’s where the iterative element comes in.

Lets consider the number 12. Its sequence begins as follows:
12: 12, 6 (length 2)
Now we can join to the sequence of 6
6: 6, 3 (length 2)
So we have a combined length of 3, but the sequence is not yet complete, it only goes 12, 6, 3.
Well by converting the above chain joining principal into an iterative process, we can then join this new chain, back to the chain for 3.
3: 3: 3, 10, 5, 16, 8, 4, 2, 1 (length 8)
To get a total chain length of 10.
By joining shorter chains together instead of recalculating each time, the run time goes from 18 down to just 5 seconds. (I have included both solutions in the file below so feel free to test them yourself).
Method:
1) Generate records from 2 – 1,000,000 (no need to include 1 as it's already a complete chain).

2) Use another generate rows to implement the sequence rules; however, when the value dips below the “Input” value, stop computing. Instead, we will use the joining logic discussed earlier.

3) For each starting record, find the number of terms in the chain (count) and the last value calculated in the sequence so far. This should be the first value that dips below the “input”

4) Now we need to join the sequences together until we have the complete chains. To do this, I have used an iterative macro.
a) The first step removed any records which have a “last chain” value of 2. These records have effectively completed the cain (the final steps are 2 and then 1) so we can successfully output them.

b) For all remaining records, we can join back to our original macro input. This will now connect an incomplete sequence with the next section of said sequence. You will notice I have also updated the “Last_chain” value. As we have just joined two chains together, we need to take the “Last_chain” value from the new part of the chain.

c) Add the counts together. Again, we have just joined two chains together so their lengths (counts) must be combined.

d) Simply loop round until all the records have a “Last_Chain” value of 2.

5) Final steps are to sort the chain lengths in descending order, then take the top record.

6) Submit your answer to the Project Euler Website!

Summary:
Even though it may seem like extra work, by creating an iterative macro, we have removed the need to calculate the same section of a chain over and over again, reducing the run time by over 60%.
Want to find out more, follow this link to our introduction post - Euleryx: Let The Games Begin.