Computing distance on road network is not as straightforward as on a map. The Earth is actually an ellipsoid and so distance should be computed on a sphere rather than on a plane. Accounting for all the ways the Earth is neither a plane nor a sphere can be computationally expensive. I recently implemented a choice few geodesic distance function in warthog, a pathfinding library.

Searching on a (irregular) sphere

In pathfinding, one often need to use distance heuristics. That is functions able to (quickly) evaluate distances. Classic examples include Euclidean distance, for searching on a plane; or, Manhattan distance, for searching on grids. However, these functions are not accurate when it comes to estimating distance in road networks.

The Earth is an ellipsoid,¹ an oblong object resembling an elongated sphere. For example, it is not technically correct to refer to “the radius of the Earth;” it depends where you measure.² Thus, measuring using the Euclidean distance quickly accumulates error.

**Figure 1:** Euclidean distance (red) is much shorter than great-circle (blue).

When computing distances, it is easier to consider a geodetic system. That is, use spherical coordinate. In this post, I will look at three distance functions, with different degrees of accuracy:

The spherical projected distance.
The great-circle distance.
The Vincenty distance.

Different distance functions

There is more than one way to compute distances on the globe. The main question boils down to: how accurate do you want it? Obviously, the more accurate the function the more expensive it is.

Some notation:

$\Delta\phi$ is the difference between the latitudes $\phi_2 - \phi_1$ , in radians, sign does not matter.
$\Delta\lambda$ is the difference between the longitudes.
The mean latitude is: $\phi_m = \frac{\phi_1 + \phi_2}{2}$ .
$R$ is the radius of the Earth.

The code is mainly in geography.cpp, with the cosine-based haversine in haversine_heuristic.cpp and Euclidean distance in euclidean_heuristic.cpp.

Spherical projected distance

The idea is to project a sphere (the Earth) onto a plane. This function takes into account the variation in distance depending on latitude. $\begin{equation} D = R \sqrt{(\Delta\phi)^2 + (\cos(\phi_m)\Delta\lambda)^2} \end{equation}$

Great-circle distance

The shortest distance on a plane is a straight line, on a sphere it’s a great-circle. An example of a great-circle on the globe is the equator. A great-circle divides a sphere into two (equal) hemisphere. The problem of finding the distance between two points on a sphere is thus to compute the arc between these points on the great-circle joining them.³ The distance is thus given by the spherical law of cosines: $\begin{equation} D = R \arccos(\sin \phi_1 \sin \phi_2 + \cos \phi_1 \cos \phi_2 \cos |D\lambda|) \end{equation}$

Haversine distance

For small distances and small float precision, the great-circle function can have large rounding errors. ⁴ In this case, we can use the haversine (half of a versine) formula: $\begin{equation*} \text{hav}(\theta) = \sin^2 \left(\frac{\theta}{2} \right) = \frac{1 - \cos \theta}{2} \end{equation*}$

Historically, this function was the de facto standard due to the availability of pre-computed tables, making it efficient to use by hand. Computing a distance on a sphere is then: $\begin{align} D =\ & 2 R \arcsin \left( \sqrt{\sin^2 \left(\frac{D\phi}{2} \right) + \cos \phi_1 \cos \phi_2 \sin^2 \left(\frac{D\lambda}{2} \right)} \right) \\ \iff & 2 R \arcsin \left( \sqrt{\cos \left(\frac{1 - D\phi}{2} \right) + \cos \phi_1 \cos \phi_2 \cos \left(\frac{1 - D\lambda}{2} \right)} \right) \end{align}$

The second form uses the cosine form of the haversine to avoid the expensive square operation.

The Vincenty formula

Finally, if you want a precision down to 0.6 mm, you can call on the Vincenty formula. This is a procedure that computes distances on an ellipsoid. But, first, there is a special case when the ellipsoid is a sphere which we can use: $\begin{equation} D = R \arctan \frac{\sqrt{(\cos \phi_2 sin \Delta \lambda)^2 + (\cos \phi_1 \sin \phi_2 - \sin \phi_1 \cos \phi_2 cos\Delta\phi)^2}}{\sin \phi_1 \sin \phi_2 + \cos \phi_1 \cos \phi_2 \cos\Delta\lambda} \end{equation}$

It’s already a more involved formula. I won’t go into the details on the actual Vincenty formula, the most important thing is that it is an iterative procedure. It means that, as opposed to everything before, it’s not a single operation. The (direct) Vincenty formula finds the azimuths and ellipsoidal distance between two points. It refines its approximation up to a constant – except a few cases that cannot converge.

Benchmarking

Test case

We will start by validating the code. No point in measuring performance if the results are bogus. I took the test case from movable-type.⁵ We will use two points in Australia: Flinders Peak and Buninyong.

Name	Long	Lat
Flinders Peak	144°25′29.52440″E	37°57′03.72030″S
Buninyong	143°55′35.38390″E	37°39′10.15610″S

First, let us convert the degree-minute-second notation to decimal degrees. The formula to convert is: $\begin{equation*} ddd°\ mm'\ SS.ssss'' = ddd + mm / 60 + SS.ssss / 3600 \end{equation*}$

Name	Lon	Lat
Flinders Peak	144.424868	-37.951033
Buninyong	143.926496	-37.652821

Now we can look at the results of calling these functions:

Function	Result (km)	Difference (%)
base	54.972271	0.000
spherical	54.925547725412486	0.085
great-circle	54.925395854762677	0.085
vincenty	54.925395854763295	0.085
vincenty-exact	54.972227314037639	0.000
haversine	54.925395854761305	0.085
haversine-direct	54.925395854762719	0.085
fast-haversine	54.925395854762833	0.085
haversine-approx	54.925717823812342	0.085
euclidean	58.077968914900702	-5.650

I think one interesting result from this is that geodesic functions underestimate the distance while Euclidean distance overestimates. The other is that, between these two points, all functions seems to have the same error – with the exception of the exact Vincenty procedure.

Using a database

I ran some tests by collecting known cases online. This is more of a sanity check rather than some actual tests.

The main result is that the approximate functions all seem to return the exact some values. The iterative Vincenty procedure is more accurate and for more points.

Benchmarking

Now, one thing I like a bit too much is looking at assembly. I tried to engineer the code so the distance function themselves are not too tightly bound to the search code. I had a quick look at how the compiler would optimise the functions:

spherical: projection onto the plane, less operations.
haversine: optimised to minimise power operations⁶ and has a direct deg_to_rad().
fast haversine: implemented using sincospi to reduce trigonometric operations.
approximate haversine: omit some trigonometric operations (sin, mainly) at the cost of precision.
great-circle: similar to haversine but with the classic formula.
vincenty: special case where the Earth is a sphere, resembles the two above.

I will not the exact Vincenty – not quite exact in this case⁷ – because it’s not a direct computation.

I counted the number of lines in the main function and its branches. This means that it is an upper bound on the number of operations executed as most are behind conditional jumps – especially for the fast haversine. I also defined a few constants to shave off one operation here and there.⁸

Function	# lines
Spherical	36
Great-circle	38
Haversine	51
Haversine (sin)	40
Haversine (fast)	400+
Haversine (approx)	53
Vincenty	70

It is interesting that the sin version compiles down to less instructions in the end. I tried fiddling with the code a bit but could not get the cos version to bo shorter. Let’s see how they fare.

I decided to run benchmarks on a decent dataset: 24M nodes of the road network of the USA. I ran some preliminary testing on small test sets and could not get reliable results.

Function	size	Total time (ns)	Time per query (ns)
spherical	23947347	5.14306e+08	21.477
great-circle	23947347	2.60185e+09	108.649
vincenty	23947347	3.1358e+09	130.946
vincenty-exact	23947347	2.22148e+10	927.652
haversine	23947347	1.87201e+09	78.172
haversine-sin	23947347	1.68385e+09	70.315
fast-haversine	23947347	4.58581e+09	191.496
haversine-approx	23947347	1.60163e+09	66.881
euclidean	23947347	1.08925e+08	4.549

Euclidean, having no trigonometric functions, is orders of magnitude faster than the rest. Then comes the spherical projection which is much faster than geodetic functions. I am actually surprised by how slow the great-circle is, I was expecting something competitive with the haversine.

Finally, the last interesting bit of information from that table: tweaking the haversine one way or the other is actually slower. Simply having a well-written implement ion gets optimised better under the hood. It’s even more true for the fast version, which may good on GPUs but lags behind on CPUs. As for the approximate form, I don’t think a 5% improvement is worth it.

I had a quick look at the perf report to see where we were spending time. Not surprisingly, we get the same distribution. Below is a summary of the data collected, stopping at internal functions. The way to read it is by comparing children’s time with their parents. E.g., if warthog::geo::exact_distance took 26.35% of the total running time and its parent, run_distance took 44.39%, it means exact_distance took 59.36% of the profiling time.

Note: the distance functions are so fast that running the profiling code only takes 40% of total execution time, the rest is parsing the data.

--44.39%--run_distance
    |
    |--26.35%--warthog::geo::exact_distance
    |    |--11.07%--__sincos
    |    |--6.63%--__atan2
    |    |--1.27%--__ieee754_asin_fma
    |    |--1.23%--__atan_fma
    |     --0.72%--__tan_fma
    |
    |--4.46%--warthog::geo::fast_haversine
    |    |--0.79%--__fma_fma3
    |    |--0.62%--fma@plt
    |     --0.50%--__ieee754_asin_fma
    |
    |--3.55%--warthog::geo::vincenty_distance
    |    |--2.09%--__sincos
    |     --0.99%--__atan_fma
    |
    |--2.87%--warthog::geo::great_circle_distance
    |    |--1.40%--__sincos
    |     --1.04%--__ieee754_acos_fma
    |
    |--2.13%--warthog::haversine_heuristic::h
    |    |--1.24%--__cos_fma
    |     --0.55%--__ieee754_asin_fma
    |
    |--1.82%--warthog::geo::haversine_approx
    |     --1.30%--__atan2
    |
     --1.72%--warthog::geo::haversine
          --0.62%--__cos_fma

**Figure 2:** Flamegraph of the callstack, the actual profiling takes less than half the time. (You can open the image to zoom in on callstacks.)

As a comparison, the Euclidean distance ran for 0.05% of the total time.

Now, let’s have a look at their precision. I did not manage to plot it because there are 58M datapoints. The five-number summary is below.

Function	mean	std	min	25%	50%	75%	max
spherical	-6.32e-04	1.68e-03	-4.97e-01	-7.59e-04	-6.29e-04	-4.94e-04	2.60e-01
great-circle	-6.31e-04	2.33e-03	-8.99e-01	-7.59e-04	-6.29e-04	-4.94e-04	1.00e+00
haversine	-6.28e-04	1.42e-03	-4.09e-01	-7.59e-04	-6.29e-04	-4.94e-04	3.68e-01
haversine-sin	-6.32e-04	1.68e-03	-4.97e-01	-7.59e-04	-6.29e-04	-4.94e-04	2.60e-01
fast-haversine	-6.32e-04	1.68e-03	-4.97e-01	-7.59e-04	-6.29e-04	-4.94e-04	2.60e-01
haversine-approx	-6.32e-04	1.68e-03	-4.97e-01	-7.59e-04	-6.29e-04	-4.94e-04	2.60e-01
vincenty	-6.32e-04	1.68e-03	-4.97e-01	-7.59e-04	-6.29e-04	-4.94e-04	2.60e-01
vincenty-exact	-9.39e-04	2.48e-03	-4.95e-01	-2.70e-03	-9.86e-04	6.53e-04	2.57e-01

Interestingly, the exact procedure seems to have the same results as the approximate ones. I think this is due to the format:

DIMACS coordinates are truncated to six significant digits then converted to int.
We need to convert them back to double again.
The distances are also encoded as integers with a 0.1 m resolution.

So we may be using functions with too high a precision to matter. Plus, all the rounding and converting may already be enough to shoot down the precision.

Conclusion

Right now, I feel like I want a better dataset to try the functions. The precision results are underwhelming and point to the format. However, we got some good insights on the speed of the different distance functions.

There is still one thing we do not take into account: elevation. A PhD candidate in my lab has very detailed maps which he uses to route electric vehicles. In this case, elevation matters. For example, electric vehicles can use regenerative breaking downhill. The longest path achievable on one charge depends on elevation changes.

Footnotes

An “oblate spheroid.” ↩
The Earth’s physical radius varies between 6,378 and 6,356 km, a 22 km difference. ↩
Which is unique if the points are not antipodal. ↩
But at most 0.5% with 64 bit floating points. ↩
I also followed their implementation. ↩
Although, from looking at the output, most modern compilers can inline power operations well. ↩
Or rather, exact in a single datum: WSG84; and missing some convergence/reciprocal checks. ↩
For example: 2 * EARTH_RADIUS. ↩

Profiling Distance Functions for Geodesic Search

Searching on a (irregular) sphere

Different distance functions

Spherical projected distance

Great-circle distance

Haversine distance

The Vincenty formula

Benchmarking

Test case

Using a database

Benchmarking

Conclusion

Footnotes