OK, really glad to have this dataset. I visited London for the first time this past May, and while we were on a riverboat on the Thames, Tower Bridge opened to let through a cruise ship coming upriver. A tour guide on our boat was very loud and adamant that this was an incredibly rare event: it only happens a couple dozen times per year, she said. Some simple googling in the moment proved that wasn't true, though maybe we just didn't understand what exactly she was saying was so rare. Glad to see the data shows it isn't very rare at all!
This did take me a while, though, mainly because of the tricky interpretation issue with the bonus question. Finally got around to it.
Solution attached. I didn't see the logic of the bonus question at first. I understand the method proposed in the solution, but I don't agree that it gives us the most likely time to see a lift.
I think that either the bonus answer as provided is not correct or, if that is the desired answer, the bonus question ought to be reworded. The given solution suggests that the 2pm hour on Sundays in August is a good time to see a lift, but really the bridge was just very busy during the 2pm hour on a single August Sunday (2015-08-30) and never lifted during that hour/month/day combination at any other point in the dataset, which gives a high “average”. The solution penalizes hour/month/day combinations where the bridge regularly lifts a few times, and it ignores the majority of August Sundays on which the bridge lifts ZERO times during the 2pm hour.
Consider the combination of hour 9 on Saturdays in July. There are a whopping 24 lifts on 2016-07-30, along with a single lift during that window on each of two other days. That gives hour 9 on Saturdays in July a total of 26 lifts in the dataset, more than twice the 12 lifts behind the provided solution. The solution suggests that because 12/1 is greater than 26/3, you’re more likely to see the bridge lift at 2pm on August Sundays. This doesn’t make sense.
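To make the comparison concrete, here is a rough Python sketch using only the figures quoted above. The dictionary layout and variable names are mine, not anything from the provided solution, and hour 14 is just my label for the 2pm hour.

```python
# Lifts per day, listing ONLY the days on which the bridge lifted at all
# during that hour. Figures are the ones quoted in this post.
lifts_on_active_days = {
    ("Sunday", "August", 14): [12],        # one busy day, 2015-08-30
    ("Saturday", "July", 9): [24, 1, 1],   # 2016-07-30 plus two single-lift days
}

# Total lifts per combination.
totals = {combo: sum(days) for combo, days in lifts_on_active_days.items()}

# Average over active days only, which is how I read the provided solution.
avg_over_active_days = {
    combo: sum(days) / len(days) for combo, days in lifts_on_active_days.items()
}

best_by_average = max(avg_over_active_days, key=avg_over_active_days.get)
best_by_total = max(totals, key=totals.get)

print("Best by average over active days:", best_by_average)
print("Best by total lifts:", best_by_total)
```

The two rankings disagree: averaging over active days only rewards the combination with a single busy day, even though the other one has more lifts spread over more days.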
A better solution would be to just find the hour/month/day combination which occurs the most in the dataset (hour 15 on Saturdays in September occurs 29 times).
The most correct solution probably requires summing the number of lifts in each hour/day/month grouping and dividing by the number of Saturdays, Sundays, etc. that fall in each month during 2015 – 2018, so that days with zero lifts count toward the average.
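Here is a minimal sketch of that idea in plain Python. It is not the provided solution. I am assuming the data covers all of 2015-01-01 through 2018-12-31, which may not match the real dataset, and the function names are my own. The sample timestamps only reproduce the lift counts mentioned above; the two single-lift dates and all the minute values are made up for illustration.

```python
from collections import Counter
from datetime import date, datetime, timedelta

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

def weekday_month_days(start, end):
    """Count how often each (weekday, month) pair occurs from start to end, inclusive."""
    counts = Counter()
    day = start
    while day <= end:
        counts[(WEEKDAYS[day.weekday()], day.month)] += 1
        day += timedelta(days=1)
    return counts

def lifts_per_calendar_day(lift_times, start, end):
    """Total lifts per (weekday, month, hour), divided by how many such
    weekdays fall in that month over the period, so zero-lift days count."""
    totals = Counter((WEEKDAYS[t.weekday()], t.month, t.hour) for t in lift_times)
    days = weekday_month_days(start, end)
    return {key: n / days[key[:2]] for key, n in totals.items()}

# ASSUMPTION: the dataset spans these full calendar years.
START, END = date(2015, 1, 1), date(2018, 12, 31)

# Illustrative timestamps matching the counts in this post.
sample_lifts = (
    [datetime(2015, 8, 30, 14, m) for m in range(12)]   # 12 lifts, 2pm hour
    + [datetime(2016, 7, 30, 9, m) for m in range(24)]  # 24 lifts, hour 9
    + [datetime(2015, 7, 4, 9, 0), datetime(2017, 7, 8, 9, 0)]  # hypothetical dates
)

calendar_days = weekday_month_days(START, END)
rates = lifts_per_calendar_day(sample_lifts, START, END)
for key, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(key, round(rate, 3))
```

Under those assumptions hour 9 on July Saturdays comes out ahead of the 2pm hour on August Sundays, because every Saturday or Sunday in the period counts in the denominator, not just the days with at least one lift.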