On this page

A Statistically Good Solution to Advent of Code 2025/08

It's not perfect, but I can say with high probability that it's good.

Advent of Code 2025/08: Playground

Tip: Check out the problem here: https://adventofcode.com/2025/day/8. Give it a go before reading the rest of this article. I haven’t written out the whole solution explicitly, but there are some spoilers.

My original solution is here: src/year2025/day08.rs.

This day’s problem reduces nicely to finding the Minimum Spanning Tree (MST), where the weight of a given edge (𝑒,𝑣) is its Euclidean length β€–π‘’βˆ’π‘£β€–.

Unfortunately, the specific features of the MST required to solve the problem restrict us to using Kruskal’s algorithm, so an π’ͺοΈ€(|𝐸|log|𝑉|) solution is the best we can do.

The Proper Solution

To solve this problem in the general case, one must consider each of the (𝑛2) Euclidean distances between the points provided. It is possible to do this somewhat efficiently, but it still limits performance.

If an oracle could tell us the longest edge in the MST, we could ignore any edges longer than that and dramatically reduce the amount of computation we have to perform.

So… can we do that?

A Statistical Approach

Assumptions

Having a quick look at the input, we can see that the points lie roughly uniformly in an approximate cube with side length 100000. For my input, the actual dimensions are (99859,99496,99777). To be on the safe side, we’ll pick the longest side length as the side length of the cube.

Prior Results

Dette and HenzeΒ [1] showed that the random variable 𝐷𝑛, representing the length of the longest edge in the nearest-neighbour graph on 𝑛 points in the unit 𝑑-cube, follows the limit law

23πœ‹π‘›π·π‘›3βˆ’23ln𝑛+23lnlnπ‘›βˆ’ln6+13lnπœ‹β†’π‘›βŸΆβˆžπ’ŸοΈ€π’’οΈ€

where 𝒒︀ is the Gumbel distribution.

In other words, that function of 𝐷𝑛 converges to a particular probability distribution that we can more easily analyse.

Another important result is that of PenroseΒ [2], who showed that as π‘›βŸΆβˆž, β„™(𝐷𝑛=𝑀𝑛′)⟢1, where 𝑀𝑛′ is the random variable representing the longest edge in the MST on the same set of points.

Using these results, we can begin to work towards a more concrete solution.

Derivation

Let

π‘Šπ‘›=23(πœ‹π‘›π·π‘›3βˆ’ln𝑛+lnln𝑛)βˆ’ln(6πœ‹1/3)

from above.

By Dette and HenzeΒ [1] and the CDF of the Gumbel distribution, we know

β„™(π‘€π‘›β€²β‰€π‘š)=βˆΌβ„™(π‘Šπ‘›β‰€π‘€(π‘š))=∼exp(βˆ’π‘’βˆ’π‘€(π‘š)).

We want to solve for π‘š, given some 𝑛 and 𝛿 such that β„™(π‘€π‘›β€²β‰€π‘š)β‰₯1βˆ’π›Ώ.

So…

β„™(π‘€π‘›β€²β‰€π‘š)β‰₯1βˆ’π›Ώexp(βˆ’π‘’βˆ’π‘€(π‘š))β‰₯1βˆ’π›Ώπ‘’βˆ’π‘€(π‘š)β‰€βˆ’ln(1βˆ’π›Ώ)β‰ˆπ›Ώπ‘€(π‘š)β‰₯ln(π›Ώβˆ’1).

Substituting 𝑀:

23(πœ‹π‘›π‘š3βˆ’ln𝑛+lnln𝑛)βˆ’ln(6πœ‹1/3)β‰₯ln(π›Ώβˆ’1).

Solving for π‘š, we see:

π‘šβ‰₯(32ln(6π›Ώπœ‹1/3)+lnπ‘›βˆ’lnlnπ‘›πœ‹π‘›)13.

Since we are looking for β„™(π‘€π‘›β€²β‰€π‘š), we can take the strictest bound on π‘š and assign equality.

Interpretation

Recall the setup we outlined earlier: We have 𝑛 random points, uniformly distributed in the unit cube [0,1]3.

We’ve just shown that the longest edge in the MST on those points does not exceed π‘š with confidence 1βˆ’π›Ώ.

Since π‘š increases only slowly with increased confidence levels, as shown by FigureΒ 1, we can compute π‘š with high confidence without incurring a huge computational cost.

Values of π‘š plotted for different 𝛿 on 1000 points.

Application to The Problem

We can now use this result to speed up our solution to the problem!

Finding the Cube

First, we must determine the size of the cube our points lie within. To do this, we iterate over every point and keep track of the smallest and largest values found for the π‘₯, 𝑦 and 𝑧 components.

fn min_max(points: &[(u32, u32, u32)]) -> ((u32, u32, u32), (u32, u32, u32)) {
    points.iter().fold(
        ((u32::MAX, u32::MAX, u32::MAX), (u32::MIN, u32::MIN, u32::MIN)),
        |(min, max), p| {
            let min = (min.0.min(p.0), min.1.min(p.1), min.2.min(p.2));
            let max = (max.0.max(p.0), max.1.max(p.1), max.2.max(p.2));

            (min, max)
        },
    )
}

Then, we can pick the longest edge to use as the side length of our cube.

let point_extent = (
    point_max.0 - point_min.0,
    point_max.1 - point_min.1,
    point_max.2 - point_min.2,
);

let max_dim = point_extent.0.max(point_extent.1).max(point_extent.2);

There may be better ways of doing this, especially if you could guarantee that the points lie within a cube, but I doubt it matters in practice.

Using the Result

Calculating π‘š with what we’ve found is quite easy:

const DELTA: f64 = 0.1;

// Obviously f64::cbrt() is not const. Why would it be?
// const K: f64 = 6.0 / std::f64::consts::PI.cbrt();
const K: f64 = 4.09670437953;

const K_ON_DELTA: f64 = K / DELTA;

// This obviously isn't const either.
// const LN_K_ON_DELTA: f64 = K_ON_DELTA.ln();
const LN_K_ON_DELTA: f64 = 3.71276793360;

fn expected_max_mst_edge_len(n: f64) -> f64 {
    let ln_n = n.ln();
    let num = 1.5 * LN_K_ON_DELTA + ln_n - ln_n.ln();
    let den = n * std::f64::consts::PI;
    (num / den).cbrt()
}

That function gives us π‘š for a unit cube, but we want the equivalent value for our non-unit cube. Fortunately, we can simply scale the length by the side length we computed earlier.

let cutoff = expected_max_mst_edge_len(num_points as f64) * max_dim;

Now, any edges longer than cutoff can be completely ignored in our MST construction!

Benchmarks

Before: 485.74 Β΅s
After: 297.41 Β΅s

A 38.77% performance improvement! Not too bad.

Warning: I can only test this on my input, so run the code on your own input to verify these results if you want.

Limitations

Since the threshold is only a statistical estimate, not a guarantee, this is not guaranteed to give the correct answer in every case. The most obvious degenerate case is having one point at (0,0,0) and the remaining π‘›βˆ’1 points at (𝑙,𝑙,𝑙).

If the points were placed with some known distribution, it may be possible to re-derive something similar to what Dette and HenzeΒ [1] show, but that would be extremely tedious.

Tip: Check out the code at src/year2025/simd/day08.rs.

Bibliography

  • [1] H. Dette and N. Henze, β€œThe Limit Distribution of the Largest Nearest-Neighbour Link in the Unit 𝑑-Cube”, Journal of Applied Probability, vol. 26, no. 1, pp. 67–80, Mar. 1989, doi: 10.2307/3214317.
  • [2] M. D. Penrose, β€œOn K-Connectivity for a Geometric Random Graph,” Random Structures & Algorithms, vol. 15, no. 2, pp. 145–164, Sep. 1999.