On this page
A Statistically Good Solution to Advent of Code 2025/08
Advent of Code 2025/08: Playground
Tip: Check out the problem here: https://adventofcode.com/2025/day/8. Give it a go before reading the rest of this article. I havenβt written out the whole solution explicitly, but there are some spoilers.
My original solution is here: src/year2025/day08.rs.
This dayβs problem reduces nicely to finding the Minimum Spanning Tree (MST), where the weight of a given edge is its Euclidean length .
Unfortunately, the specific features of the MST required to solve the problem restrict us to using Kruskalβs algorithm, so an solution is the best we can do.
The Proper Solution
To solve this problem in the general case, one must consider each of the Euclidean distances between the points provided. It is possible to do this somewhat efficiently, but it still limits performance.
If an oracle could tell us the longest edge in the MST, we could ignore any edges longer than that and dramatically reduce the amount of computation we have to perform.
So⦠can we do that?
A Statistical Approach
Assumptions
Having a quick look at the input, we can see that the points lie roughly uniformly in an approximate cube with side length . For my input, the actual dimensions are . To be on the safe side, weβll pick the longest side length as the side length of the cube.
Prior Results
Dette and HenzeΒ [1] showed that the random variable , representing the length of the longest edge in the nearest-neighbour graph on points in the unit -cube, follows the limit law
where is the Gumbel distribution.
In other words, that function of converges to a particular probability distribution that we can more easily analyse.
Another important result is that of PenroseΒ [2], who showed that as , , where is the random variable representing the longest edge in the MST on the same set of points.
Using these results, we can begin to work towards a more concrete solution.
Derivation
Let
from above.
By Dette and HenzeΒ [1] and the CDF of the Gumbel distribution, we know
We want to solve for , given some and such that .
Soβ¦
Substituting :
Solving for , we see:
Since we are looking for , we can take the strictest bound on and assign equality.
Interpretation
Recall the setup we outlined earlier: We have random points, uniformly distributed in the unit cube .
Weβve just shown that the longest edge in the MST on those points does not exceed with confidence .
Since increases only slowly with increased confidence levels, as shown by FigureΒ 1, we can compute with high confidence without incurring a huge computational cost.
Application to The Problem
We can now use this result to speed up our solution to the problem!
Finding the Cube
First, we must determine the size of the cube our points lie within. To do this, we iterate over every point and keep track of the smallest and largest values found for the , and components.
fn min_max(points: &[(u32, u32, u32)]) -> ((u32, u32, u32), (u32, u32, u32)) {
points.iter().fold(
((u32::MAX, u32::MAX, u32::MAX), (u32::MIN, u32::MIN, u32::MIN)),
|(min, max), p| {
let min = (min.0.min(p.0), min.1.min(p.1), min.2.min(p.2));
let max = (max.0.max(p.0), max.1.max(p.1), max.2.max(p.2));
(min, max)
},
)
}Then, we can pick the longest edge to use as the side length of our cube.
let point_extent = (
point_max.0 - point_min.0,
point_max.1 - point_min.1,
point_max.2 - point_min.2,
);
let max_dim = point_extent.0.max(point_extent.1).max(point_extent.2);There may be better ways of doing this, especially if you could guarantee that the points lie within a cube, but I doubt it matters in practice.
Using the Result
Calculating with what weβve found is quite easy:
const DELTA: f64 = 0.1;
// Obviously f64::cbrt() is not const. Why would it be?
// const K: f64 = 6.0 / std::f64::consts::PI.cbrt();
const K: f64 = 4.09670437953;
const K_ON_DELTA: f64 = K / DELTA;
// This obviously isn't const either.
// const LN_K_ON_DELTA: f64 = K_ON_DELTA.ln();
const LN_K_ON_DELTA: f64 = 3.71276793360;
fn expected_max_mst_edge_len(n: f64) -> f64 {
let ln_n = n.ln();
let num = 1.5 * LN_K_ON_DELTA + ln_n - ln_n.ln();
let den = n * std::f64::consts::PI;
(num / den).cbrt()
}That function gives us for a unit cube, but we want the equivalent value for our non-unit cube. Fortunately, we can simply scale the length by the side length we computed earlier.
let cutoff = expected_max_mst_edge_len(num_points as f64) * max_dim;Now, any edges longer than cutoff can be completely ignored in our MST construction!
Benchmarks
Before: 485.74 Β΅s
After: 297.41 Β΅s
A 38.77% performance improvement! Not too bad.
Limitations
Since the threshold is only a statistical estimate, not a guarantee, this is not guaranteed to give the correct answer in every case. The most obvious degenerate case is having one point at and the remaining points at .
If the points were placed with some known distribution, it may be possible to re-derive something similar to what Dette and HenzeΒ [1] show, but that would be extremely tedious.
src/year2025/simd/day08.rs.Bibliography
- [1] H. Dette and N. Henze, βThe Limit Distribution of the Largest Nearest-Neighbour Link in the Unit -Cubeβ, Journal of Applied Probability, vol. 26, no. 1, pp. 67β80, Mar. 1989, doi: 10.2307/3214317.
- [2] M. D. Penrose, βOn K-Connectivity for a Geometric Random Graph,β Random Structures & Algorithms, vol. 15, no. 2, pp. 145β164, Sep. 1999.