Welcome back to Project Euleryx, this time: Largest Prime Number.
This week's post is going to be a little longer than previous posts, as we'd like to cover some of the mathematical theory behind our solution. That being said, just because there is some theory behind our approach, doesn’t mean it is the only approach that works, so please feel free to share your thoughts below.
Euleryx Problem 3 - Largest Prime Number
The challenge is to find the largest prime factor of the number 600851475143.
Here's my full workflow and answer:
Last Weeks Favourite Solution:
Another tough decision for our favourite solution so we have gone with a tie. Both @CoG and @gawa provided excellent solutions. Calculating the neccesary terms only and storing variables within a single column in a generate rows was genius. Checkout there solutions on page one of last weeks challenge here.
Definitions:
Mathematical Theory:
During this problem, we exploited what’s called the Fundamental Theorem of Arithmetic (FTA), which states:
“Every positive integer (whole number) can be written as a product of prime numbers, or the number itself is prime.”
e.g.
If we take the number 30, we can write this as 2 x 3 x 5
The Method to our Madness (Factor Trees):
We decided that to tackle this problem, we would list all the prime factors of the number, then simply take the largest. The tricky part here was creating that list of prime numbers.
Using the Fundamental Theorem of Arithmetic, we know that 600851475143, can be written as a product of its prime factors. But how do we find these prime factors? We decided to use what's called a factor tree.
A factor tree works as follows:
1) Identify any 2 factors of your number.
2) If the number is prime, circle it.
3) If the number is not prime, thanks to the FTA, we know that it too can be expressed as a product of its primes, so, repeat steps 1 and 2 for each uncircled number.
4) Stop when there are no more uncircled numbers at the bottom of the tree.
e.g. for the number 30, the tree would grow as follows:
First step:
Second step:
Therefore for we can see, 30 can be written as 5 x 2 x 3
In case you’re worried that you didn’t think of 5 and 6 to begin with, rest assured, it doesn’t matter which two numbers you choose, as long as they make up a factor pair. Here are some other examples of possible factor trees for the number 30:
All of which lead to the same answer. With this in mind, the only thing left was to create a workflow in Alteryx, that can create these factor trees.
Setting up the Workflow:
Whilst all this theory is great, we still need a way to get the computer to do the hard part. If it wasn’t apparent from the above, we will be using an iterative macro to solve this problem. So, the first step is to decide what information we need to be passed between each iteration.
For this question, we fed 3 columns into the macro, as shown in the image below. You will notice we have two starting numbers to help illustrate the workflows process, as we step through it:
The Iterative Macro:
Yes, it's time for the tough part, but fear not, let's break the steps down together. On each iteration, our goal is:
i.e. at the start of the first iteration:
vs the end of the first iteration (on the loop anchor). Notice [Number] * [List of Prime Factors] = [Starting Number]
This is achieved as follows:
Without this step, each iteration would be passing the same number through, so it's vital that we remember this part.
Final Steps:
Running this macro should give you the following output.
As you can see, all the prime factors are listed in the “List of Prime Factors” column, with a comma separating them. It's important to note that this list may include duplicates, for example, if you started with the number 4, the “List of all prime Factors” should be “2,2” as 2 x 2 = 4.
To find the largest prime factor for each starting value, simply parse out the “List of Prime Factors” columns, sort the results descending, and select the top record per “Starting Number”:
Now, all that's left to do is submit your answer to the Project Euler website!
Summary:
By using Alteryx to Iteratively create a factor tree, one branch at a time, we have been able to reduce any number down to its prime factors. Thanks to the Fundamental Theorem of Arithmetic, we know that the largest prime number from our list, is indeed the largest prime factor of 600851475143.
Want to find out more, follow this link to our introduction post - Euleryx: Let The Games Begin.
Still committed to String Coding!
There are two optimizations that we can make:
1. For a given number, pull all copies of a factor out immediately.
2. If we remove prime factors from smallest to largest, then we know that the next factor has to be greater than the last (which reduces our search domain)
The other useful constraint is that we do not need to know all factors, only the largest prime factor. Our logic can thankfully be simplified then since we don't need to keep track of every factor we identify.
Happy Solving, Y'all!
This is definitely the case where I didn't get the fewest tools but I'm happy with mine.
I used a Generate Rows to find all possible factors of an input number from 1 to the square root of the number. The reason that I did the square root is because this will give me half of all the factors. I can then take the factors, divide them into the starting number, and get the matching factor.
For ex, if I'm trying to find all the factors of 99, I only need to find the factors from 1 to sqrt(99). Once I find 3, I can use it to find 33 by going 99/3. Once I find 9, I can do 99/9 = 11 to get its matching factor.
---
I built a macro that could identify if a number is prime or not. I have a hunch that I'll be happy to have that as the challenge continues.
The way the macro works is:
1. Generate Rows for all numbers from 1 to square root of the number
2. Find if those numbers are factors of the starting number by using the mod function
3. Count the number of factors
4. If the count is 1, then the number is prime. If it's greater than 1, then it's not prime
---
Then what I did was find all the factors of the giant number, and evaluate if they're prime or not. Then I found the highest.
Similar approach to @CoG .
For reference, RSA encryption relies on the mathematical properties of prime numbers to ensure secure data transmission over the internet. https://en.wikipedia.org/wiki/RSA_cryptosystem
While multiplying two prime numbers is easy and fast, factoring a large number back into its prime components is extremely difficult.
If you're feeling brave, try factoring this number: 10142789312725007.
(My computer gave up after running for 3 minutes.)😂
For anyone interested in my workflow, here is what I did. Took me awhile to consolidate my thinking process...
SPOILER tag!
[Divider] <= CEIL(SQRT([Largest Prime Number]))
This condition ensures the divisor doesn’t exceed the current number being factored, and we loop it by:
[Divider] + 1
This loops until we get 775,146 records of data. We have these conditions to avoid generating unnecessary rows and improve efficiency, we only need to test divisors up to the square root of the number.
Step 3
After having all those rows, we can now check for divisibility. We do this with Formula (6)...
We create two new fields:
[IsFactor] to check if the divisor is a factor
IF MOD([Largest Prime Number],[Divider]) = 0
THEN 1
ELSE 0
ENDIF
[New_Value] to update the number if a factor is found
IF [IsFactor]
THEN [Largest Prime Number] / [Divider]
ELSE [Largest Prime Number]
ENDIF
Step 4
Now that we have the IsFactor ready and the New_Value calculated... we can simply just filter out the [IsFactor] = 1 (True rows). This keeps only the rows where Divisor divides the largest prime number evenly. We do this with Filter (7).
Step 5
Now we need to check for PRIME factors! So... we need to verify that each factor is not divisible by any number less than itself (other than 1 of course). So we add another generate rows!
We create a new field called [IsReallyFactor] again with an initialization expression of 2. Then we set the condition.. similar to before:
[IsReallyFactor] <= SQRT([Divider])
...and we loop it similar to before:
[IsReallyFactor] + 1
Once we have looped it, we will get 1,413 records for us to check against. We then use Formula (9) to perform the check with a new field called [IsReallyReallyFactor] with the expression:
IF MOD([Divider],[IsReallyFactor]) = 0
THEN 1
ELSE 0
ENDIF
So we then have all the real factors lined up with 1s and 0s.
Step 6
We now just need to group by the Dividers and sum up the [IsReallyReallyFactor] field! A prime factor will have Sum_IsReallyReallyFactor = 0 (since it’s only divisible by 1 and itself, and we start testing from 2).
Then from there, we just filter out Sum_IsReallyReallyFactor = 0 to get all our PRIME FACTORS!
Final notes
The naming convention was easier for me to follow since I was doing it quickly... but I hope the above helps.
This was really fun @Pilsner !
These solutions are brilliant, loved reading through the write-ups, and I still can’t quite believe the 3-tool solutions!
I never knew about RSA; that’s really helped my understanding of encrypted connections, the "handshake" certainly makes more sence now. I’ll be reading up on this a bit more, thanks @gawa
@Carolyn - you might be right, I'm pretty certain going to be useful to have a prime number macro. 😄
@caltang - Your field names 🤣 That looks like my AoC fields. "ActualRowCheckThisOneISwear" 🤣
My solution.
Main workflow
Macro
Output from the macro
a |
1471 |
6857 |
839 |
71 |
1471 * 6857 * 839 * 71 = 600851475143. OK!
Explanation
The idea is to output the prime factors as soon as it is found, and pass the remaining factors to the next iteration.
For example, say the input is 24, set the input data as below.
a | b | c | isPrime |
24 | 24 | 1 |
|
Generate rows for each row with [b] = sqrt([a]) ~ 2.
a | b | c | isPrime |
24 | 5 | 1 |
|
24 | 4 | 1 |
|
24 | 3 | 1 |
|
24 | 2 | 1 |
|
Fill in [c] with the quotient of [a] divided by [b] only if it is divisible. Otherwise set 0.
If [c] in all rows are 0 in a group of [a], it is a prime factor and the number is sent out of the macro.
The rows successfully divided are sent to the next iteration.
At the end of 1st iteration, the iteration output looks like below;
a | b | c | isPrime |
4 | 4 | 1 |
|
6 | 6 | 1 |
|
At the end of 2nd Iteration;
a | b | c | isPrime |
2 | 2 | 1 |
|
2 | 2 | 1 |
|
2 | 2 | 1 |
|
3 | 3 | 1 |
|
They are sent out of the macro at the end of 3rd iteration, because they are all prime factors.