Discussion thread for day 2 of the Advent of Code - https://adventofcode.com/2025/day/2
Great puzzle today! I will admit that I was very challenged by part 2, but in the end I learned a lot of new skills! Excited for the next challenge!
Ran into a myriad of issues no thanks to my reading comprehension and improper data types... Nonetheless a fun challenge.
Oh boy do I have a lot to say about this one so buckle in if you actually read these, its mind-dump time.
First, treating RegEx as part of BaseA is something akin to pretending alcohol is not a drug: If it quacks like a duck it is probably a social construct created by a polite fiction of additional embedding.
So, no RegEx for me today, a stance which I will definitely reverse once things get hard.
That said, this is a neat problem to solve in Alteryx, as evidence by the attached file where I do Part 1 two different ways and Part 2 3x.
There are three parts to the problem(s)
1. Get the data into a usable form. Minimally, parse the ranges, maybe turn them into start/end or start/length pairs.
2. Find all the IDs to check.
3. Check them and throw out all the invalids (Hospital privatization joke removed.)
4. Profit. Also, sum things and consider duplicates.
Spoilery discussion of the First Pass, or how to solve it quickly.
The first pass through each is usually the bluntest. If I'm trying to get the thing done fast, I'm going to do each step in isolation and, where possible, just bash through problems. This is Part1 Initial and Part2 Ugly in the file. For Part 1 I just went through each text parsing step until I reached the end. Cut Start and End out of each range, create all the IDs from Start to End, Throw out the odd ones because why not minimally optimize, Split the remainder in half, compare left side to right side and keep the matches and you're done with part 1. Part 2, taken as a brute force problem, is really just part 2 done for every factor between 2 and Max(ID.Length). Once that showed itself to be 10, I just built the other 8 cases and bing (not Bing. Never Bing.) you have something you can sum into an answer after you make it not a string again because you forgot to do that like, every time.
I don't like stopping at the brute force solution so after checking in the time, back we go to put on our optimization hats.
Start by expanding the optimization from part1. Instead of odd lengths, we can throw out any combination of Length and UnitSize (the size of duplicated string) where Mod(Length, UnitSize) !=0. Can't find a pattern of 3 in a string of 8.
Then, instead of doing 10 different cases, we can treat every case as just having 10 substrings of length UnitSize, noting that we create a bunch of blanks when we exceed N=Length/UnitSize. That's fine. We can deal with blanks using the magic of cross tabulation.
Once we've Cross-Tabbed to leave only the non-empties, we can use Count Distinct and grouping to count the number of different patterns of length UnitSize for each ID & UnitSize combo. Count=1 implies a full pattern match, so we keep those, Unique on ID, because 1111 will be a match on both UnitSize=1 and UnitSize=2 and we have it. Cross tab, Filter, Summarize is a useful pattern to keep in your head for I've got this set of columns and I need to find the most/best/number of, etc.
So that's good, but we can do better. Creating all 10 fields and having to Crosstab suggests we should be testing some other way.
Final (both parts) version or how to solve it pretty:
Enter the Replace function and the glory of the void (the part of the void will be played by IsEmpty())
First half stays the same as all versions. Text to Columns the rows, generate rows twice. First one for the IDs, second one for the possible UnitSize combinations. Embedding the maximum unit size as Length/2 into the Generate rows saves us a step and some records. Records are the worst. Throw them out as soon as you can.
Note that we are no longer calculating Start or End, we're just inserting their definitions in the couple of places we use them. If you're making a workflow for reasonable people, do not do this thing. Put your variables in fields, name them things. This is not for reasonable people though; this is for the subset of maniacs inclined to do AoC. So we embed.
After creating all our possibilities, we test in a single filter tool. Same note as above, if this was for public consumption, define the thing in a formula and filter on the thing. Here however, we filter on IsEmpty(Replace(tostring([ID]),left(tostring([ID]),[UnitSize]), ""))
ToStrings because we just leave ID as a number all the way through instead of swapping to strings and back in the brute force version.
The filter gives us all the invalid IDs. From there, part 1 is just take the ones where UnitSize=Length/2 and Part 2 is, once again, taking the Unique values.
I wanted to use Regex_match but couldn't get it to work correctly. Instead, I took a brute-force approach by creating and checking every possible pattern. Reviewing other people's Regex solutions was very insightful.
Ah today was a breath of fresh air 😀
