Blather, Rinse, Repeat: Contraction Hierarchies vs A*

You may recognize the above map as a crude highway map covering (parts of) California, Nevada, Utah, Wyoming, and Colorado.

You might even have heard of a project that I've got on my back burner, a branching, choose your own path, adventure book and/or computer game, which I might be calling "Sparks and Rusty" or "Sparks and the Wheelman" or some other goofy pairing that evokes 1980s action road adventures.

In this adventure, you have a semi tractor, and are asked to use it to tow a semi trailer from Mountain View, CA to Colorado Springs, CO. The interesting (hopefully) bit is that the semi trailer houses a supercomputer, on which is running a super-intelligent, sentient AI. That's "Sparks".

So, yeah, get from point A to point B, with your newfound buddy / damsel, across presumably hostile territory.

Also, as this is a Cars With Guns joint, the hostile territory will have lots of violent folks on the highways. Murderous cycle gangs. Folks with machine guns mounted on the hood of their cars. Ambushes. Gorse bushes. Perhaps not the last one.

If you're familiar with the Steve Jackson Games gamebook "Convoy", you're thinking along the right tracks - it's an inspiration, to be sure.

So, you've got a map, and you've got a destination. It should be easy to pathfind to the destination, right?

Well, sure. Let's first of all get out of the way that my map is fairly small - 70 cities, some of which I added just to make the highways bend in approximately the right places (I'm looking at you, Muddy Gap, WY). So, it's not too hard to exhaustively search that; Dijkstra is actually pretty good for this sort of thing.

Also, let's concede that my city-to-city distance is an approximation, using Pythagoras and a Mercator projection onto a flat plane. Real highways are wigglier, and this is coming up on big enough to actually care about the curvature of the Earth. All of that can be refined later, if I need to.
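For what it's worth, here's a rough sketch of the two flavours of distance I'm talking about. This is not my actual map code: the function names and the coordinates are made up for illustration, and the flat version uses a cosine-of-latitude squash (an equirectangular approximation) as a stand-in for the Mercator math.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def flat_distance(a, b):
    """Pythagoras on a locally flat plane (equirectangular approximation)."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    # Shrink the east-west separation by the cosine of the mean latitude.
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return EARTH_RADIUS_MILES * math.hypot(x, y)


def haversine_distance(a, b):
    """Great-circle distance, which accounts for the Earth's curvature."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


# Approximate (latitude, longitude) pairs, for illustration only.
MOUNTAIN_VIEW = (37.39, -122.08)
COLORADO_SPRINGS = (38.83, -104.82)

d_flat = flat_distance(MOUNTAIN_VIEW, COLORADO_SPRINGS)
d_curved = haversine_distance(MOUNTAIN_VIEW, COLORADO_SPRINGS)
```

Both are straight-line figures, so they come out shorter than any real highway route. Over a single city-to-city hop the two functions should agree closely, which is why the flat version is good enough for now.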

So hey, A* (pronounced "ay-star"), that's a thing, too, right? Yep, sure is. Can be faster than Dijkstra, and is fewer characters, though it might be problematic as a filename, or a class name, depending on your OS and your language.

So, I wrote a quick little A* implementation - I've got "AI Engineer" on my resume for a reason, I should be able to knock this out in a page or less of Python, right?

Turns out, sure, something like that. And A* tells me that the way to navigate my map to get from Mountain View to Colorado Springs is:

['Mountain View, CA', 'Pleasanton, CA', 'Walnut Creek, CA', 'Vacaville, CA', 'Sacramento, CA', 'Placerville, CA', 'Reno, NV', 'Ely, NV', 'Cove Fort, UT', 'Green River, UT', 'Grand Junction, CO', 'Rifle, CO', 'Lawson, CO', 'Denver, CO', 'Colorado Springs, CO']

Which is maybe not what I would have chosen. If I were just wandering the highways of the Western US without a map by my side, I might have gone through Reno, stayed on I-80 through West Wendover, turned right at Fort Collins, CO, and headed down to Colorado Springs. But I trust this is shorter. It doesn't take cycle gangs or refuelling into account, which players will need to think about.
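In case "a page or less of Python" sounds like bragging, here's roughly the shape of it. This is a minimal sketch, not my actual game code: the graph format, the `a_star` name, and the four-city toy map are all assumptions made for the example. The heuristic is straight-line distance to the goal, which is safe as long as every road is at least as long as the straight line between its endpoints.

```python
import heapq
import math


def a_star(graph, coords, start, goal):
    """Return (distance, path) from start to goal, or (inf, []) if unreachable.

    graph:  {city: {neighbor: road_length}}
    coords: {city: (x, y)} on the same flat plane the road lengths come from
    """
    def heuristic(city):
        # Straight-line distance to the goal; never an overestimate here.
        (x1, y1), (x2, y2) = coords[city], coords[goal]
        return math.hypot(x2 - x1, y2 - y1)

    best = {start: 0.0}          # best known distance from start
    came_from = {}               # back-pointers for rebuilding the path
    frontier = [(heuristic(start), start)]
    done = set()
    while frontier:
        _, city = heapq.heappop(frontier)
        if city == goal:
            path = [city]
            while path[-1] in came_from:
                path.append(came_from[path[-1]])
            return best[goal], path[::-1]
        if city in done:
            continue
        done.add(city)
        for neighbor, length in graph[city].items():
            candidate = best[city] + length
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                came_from[neighbor] = city
                heapq.heappush(frontier, (candidate + heuristic(neighbor), neighbor))
    return math.inf, []


# Toy map, invented for this example.
toy_coords = {"A": (0, 0), "B": (3, 4), "C": (6, 0), "D": (6, 8)}
toy_graph = {
    "A": {"B": 5, "C": 7},
    "B": {"A": 5, "C": 5, "D": 5},
    "C": {"A": 7, "B": 5, "D": 8},
    "D": {"B": 5, "C": 8},
}
result = a_star(toy_graph, toy_coords, "A", "D")
```

Swap the heuristic for a function that always returns zero and you've got Dijkstra back, which is a handy way to sanity-check the answers.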

Sidebar anecdote: years ago, my father asked me how Google Maps can do continent-scale navigation so quickly. I shrugged and said I didn't have access to the Google Maps source code, but I imagined that if somebody asked Google Maps for turn-by-turn navigation from, let's say Bremerton, WA to Boston, MA, Google Maps would be smart enough to know that you can get from Seattle, WA to Boston, MA, by taking I-90 (pretty much) without turns, and so that's a subproblem that can be "shortcut". Not saying that you're taking a shorter physical path, but not considering all segments of I-90 incrementally; that's thousands of potential exits off the interstate that you might not care about.

And, in fact, you actually can do a little better than staying on I-90, if for no better reason than to detour around Chicago and Minneapolis. At least, that's what Google Maps says right now. So, it has some knowledge of times when the interstate isn't the best answer.

I've taken some online courses to keep my brain full, and one of the AI courses I took had a short section on highway navigation, with a visiting guest from Microsoft/Bing Maps, talking about "Contraction Hierarchies" / "Node Labels".

The short version of the "Node Labels" technology is that you store at every location a "label", which I'll just call a "dictionary" of intermediate locations, along with the shortest distance to that intermediate location. The amazing bit is that this dictionary ends up averaging something like the log of the number of points in your map. So, I look up Bremerton, and find a bunch of cities including St Louis, New Orleans, and Chicago. I look up Boston, and find a different bunch of cities, including New Orleans, Memphis, Miami, and Chicago. And so I find the intersection of these dictionaries, and find that I can route from Bremerton to Boston via Chicago in 3096 miles, or Bremerton to New Orleans to Boston in 4250 miles. I'll take Chicago today, thank you.
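The lookup itself is tiny. Here's a sketch, assuming each label is just a Python dictionary from hub city to distance. The individual leg distances below are invented so that the totals match the 3096 and 4250 miles in my example; only the totals mean anything.

```python
def best_hub(label_from, label_to):
    """Return (total_distance, hub) for the cheapest hub found in both labels."""
    shared = label_from.keys() & label_to.keys()
    if not shared:
        return None
    return min((label_from[hub] + label_to[hub], hub) for hub in shared)


# Hypothetical labels: hub -> distance in miles (legs are made up).
bremerton_label = {"St Louis": 2150, "New Orleans": 2700, "Chicago": 2100}
boston_label = {"New Orleans": 1550, "Memphis": 1350, "Miami": 1500, "Chicago": 996}

answer = best_hub(bremerton_label, boston_label)
```

St Louis, Memphis, and Miami each appear in only one label, so they never get considered. That's the whole trick: the work per query is proportional to the size of the two labels, not the size of the map.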

If you've got these labels preprocessed for a bunch of locations, you can do this midpoint thing very fast.

I'm a little fuzzy on what the Microsoft guy said (if he did) about how you turn this into navigation - maybe he was just presenting a "distance oracle", which is still something.

Maybe you recursively do this, and find a midpoint between Bremerton and Chicago, and so on and so on. I'd be concerned, though - in my example, Chicago is in my "label" for Bremerton, so there might be edge-case issues going on, where I can't get a good divide-and-conquer solution.

Or, maybe inside your label, you store not only distances, but a path. That's a lot more data.

Or, instead of storing all of this label information, you store some small information in the cities on your map that helps you reconstruct the label quickly. That seems good, especially since I'm not actually in the business of doing server-side computation of this stuff for lots of queries per second.

And it's this small information approach that Contraction Hierarchies uses to give you a solution.

Imagine I assign each city in my map a unique integer index. Could be assigned randomly, could be alphabetical, it doesn't matter at this point how I come up with these indices. Now, in that order, I'm going to start simplifying my map, removing one city at a time, which should make things incrementally simpler. But I still want the remaining map to contain correct information about shortest paths. This is where "shortcuts" come in. Again, shortening the computation work, not shortening the physical, map, distance.

So, let's say I'm looking at my map, and I decide to simplify it by removing Bozeman, MT. My map says that I can go from Belgrade, MT to Bozeman, MT, and from Bozeman, MT to Livingston, MT. I would add in a shortcut from Belgrade to Livingston, and add a note on the shortcut saying "also, to take this shortcut, you'll pass through Bozeman". And we do this for each city in our graph, adding shortcuts where the city is required for shortest path calculations to remain correct.
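Here's a sketch of contracting a single city, under some assumptions: the map is undirected, stored as a dictionary of dictionaries, and I'm working on a scratch copy that shrinks as cities are removed. The function names, the `via` dictionary, and the mileages are all made up for the example. The "is this city required?" test is a small Dijkstra search that is forbidden from entering the city being removed.

```python
import heapq
import math


def shortest_without(graph, source, target, banned, limit):
    """Dijkstra from source to target that never enters `banned`.

    Stops expanding once distances exceed `limit`, since longer detours
    cannot beat the route through the banned city anyway.
    """
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            return d
        if d > dist.get(node, math.inf) or d > limit:
            continue
        for nxt, length in graph[node].items():
            if nxt == banned:
                continue
            nd = d + length
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(queue, (nd, nxt))
    return math.inf


def contract(graph, via, city):
    """Remove `city` from the scratch graph, adding shortcuts where needed."""
    neighbours = graph[city]
    for a in list(neighbours):
        for b in list(neighbours):
            if a >= b:  # undirected map: visit each pair of neighbours once
                continue
            through = neighbours[a] + neighbours[b]
            # Only add a shortcut if no equally good route avoids the city.
            if shortest_without(graph, a, b, city, through) > through:
                graph[a][b] = graph[b][a] = through
                via[(a, b)] = via[(b, a)] = city  # "you'll pass through ..."
    for n in list(neighbours):
        del graph[n][city]
    del graph[city]


# Illustrative mileages, not taken from my real map.
scratch = {
    "Belgrade, MT": {"Bozeman, MT": 11},
    "Bozeman, MT": {"Belgrade, MT": 11, "Livingston, MT": 26},
    "Livingston, MT": {"Bozeman, MT": 26},
}
via = {}
contract(scratch, via, "Bozeman, MT")
```

After the call, the scratch graph has a direct Belgrade to Livingston edge of 37 miles, and `via` remembers that it runs through Bozeman. The shortcuts collected this way get added to the original map afterwards; the scratch copy is only there so later contractions see a simpler graph.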

So, we make a new graph, which we'll call G* which has all the original cities and all the original edges, but also these new shortcuts. Depending on the connectedness of your graph, a lot of cities can go away and not have any shortcuts at all.

The neat trick at this point is to navigate G* by only going "up" in indices, including using our new shortcuts. So, for a known good path from Bremerton to Boston, we will have contracted the various cities along the way in some order, which you can visualize as a jagged mountain range - each contraction yields a shortcut, allowing us to "fill in" a little valley between neighboring points (that are later in the contraction order, so therefore are "higher" on our mountain visualization). So, the points late in the contraction order become "hubs" that traffic wants to go through.

So, we figure out these best hubs, reachable from our start location and our destination location, each going up. In the Bremerton and Boston example, we can imagine that Chicago is a good hub (O'Hare is maybe not a great experience for air travellers, but as a highway crossroads, it's useful), contracted late in the preprocessing, and so "up" from both Bremerton and Boston.
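The "going up" search can be sketched as an ordinary Dijkstra with one extra rule: never step to a city that was contracted earlier than the one you're standing on. Everything here is an assumption made for the example: the function names, the `rank` dictionary (position in the contraction order), and the five-city toy road A-B-C-D-E, which I contracted by hand in the order B, D, A, E, C to get two shortcuts.

```python
import heapq
import math


def upward_search(g_star, rank, source):
    """Dijkstra from source that only follows edges to higher-ranked cities.

    Returns {city: distance} for every city reachable by going "up".
    """
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, city = heapq.heappop(queue)
        if d > dist[city]:
            continue  # stale queue entry
        for nxt, length in g_star[city].items():
            if rank[nxt] <= rank[city]:
                continue  # that would be going "down"
            nd = d + length
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(queue, (nd, nxt))
    return dist


def ch_distance(g_star, rank, start, goal):
    """Return (distance, hub) using one upward search from each end."""
    up_start = upward_search(g_star, rank, start)
    up_goal = upward_search(g_star, rank, goal)
    shared = up_start.keys() & up_goal.keys()
    if not shared:
        return math.inf, None
    return min((up_start[hub] + up_goal[hub], hub) for hub in shared)


# Toy road A-B-C-D-E with lengths 1, 2, 3, 4, plus the shortcuts
# A-C (3, through B) and C-E (7, through D) from contracting B and D first.
toy_rank = {"B": 0, "D": 1, "A": 2, "E": 3, "C": 4}
toy_g_star = {
    "A": {"B": 1, "C": 3},
    "B": {"A": 1, "C": 2},
    "C": {"B": 2, "D": 3, "A": 3, "E": 7},
    "D": {"C": 3, "E": 4},
    "E": {"D": 4, "C": 7},
}
meeting = ch_distance(toy_g_star, toy_rank, "A", "E")
```

From A the only way up is the shortcut to C, and the same goes for E, so both searches touch two cities each and meet at C, the last city contracted. Note that each search includes its own starting city at distance zero.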

And the path to get from Bremerton to Chicago has a few shortcuts along the way, probably including when we contracted Bozeman, so when we come to unroll our directions, we replace shortcuts with paths through their contracted cities.
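Unrolling is a small recursive job, because a shortcut can itself be built out of other shortcuts. Here's a sketch that uses the shortcutted route my game map produces (it shows up further down in this post). The `via` entries are a guess at which city each shortcut passes through: I picked them so the expansion matches the real route, and they are not a dump of my actual data. The function names are made up too.

```python
def expand_edge(a, b, via):
    """Return the full list of cities from a to b, expanding nested shortcuts."""
    middle = via.get((a, b))
    if middle is None:
        return [a, b]  # an original road, nothing to expand
    # Drop the duplicated middle city when joining the two halves.
    return expand_edge(a, middle, via)[:-1] + expand_edge(middle, b, via)


def unpack(path, via):
    """Replace every shortcut in a non-empty path with the cities it skips."""
    full = [path[0]]
    for a, b in zip(path, path[1:]):
        full.extend(expand_edge(a, b, via)[1:])
    return full


# Hypothetical shortcut notes: (from, to) -> city the shortcut passes through.
via = {
    ("Vacaville, CA", "Reno, NV"): "Placerville, CA",
    ("Vacaville, CA", "Placerville, CA"): "Sacramento, CA",
    ("Reno, NV", "Cove Fort, UT"): "Ely, NV",
    ("Cove Fort, UT", "Denver, CO"): "Grand Junction, CO",
    ("Cove Fort, UT", "Grand Junction, CO"): "Green River, UT",
    ("Grand Junction, CO", "Denver, CO"): "Rifle, CO",
    ("Rifle, CO", "Denver, CO"): "Lawson, CO",
}
shortcut_path = ["Mountain View, CA", "Pleasanton, CA", "Walnut Creek, CA",
                 "Vacaville, CA", "Reno, NV", "Cove Fort, UT", "Denver, CO",
                 "Colorado Springs, CO"]
full_path = unpack(shortcut_path, via)
```

Storing one "passes through" city per shortcut is all the extra data you need; the recursion does the rest.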

And that's just about it.

So, I wrote a Contraction Hierarchy solution for my game map (Mountain View to Colorado Springs, if you recall). I put in Mountain View, and asked it where I could get to going "up" my graph, and it said:

St. George, UT
Walnut Creek, CA
Bakersfield, CA
Modesto, CA
Salinas, CA
Vacaville, CA
Paso Robles, CA
Fort Collins, CO
Mountain View, CA
Reno, NV
Pleasanton, CA
Tonopah, NV

I asked it where I could get to by going "up" from Colorado Springs, and the list was:

Denver, CO
Cove Fort, UT
Pueblo, CO
Colorado Springs, CO
Walsenburg, CO
Rock Springs, WY
Salt Lake City, UT
Fort Collins, CO
Reno, NV
St. George, UT
Tonopah, NV

For each of these destinations, I got the city, a real highway distance, and a path with shortcuts to get to that city.

So, I found the places I could get to from both my start and my destination, which narrowed things down to:

(1092.933032328154, 'Reno, NV')
(1187.1179961380471, 'St. George, UT')
(1329.8134461278964, 'Fort Collins, CO')
(1570.1329202911515, 'Tonopah, NV')

That's the cities along with their combined distance (start to city to destination), so it looks like our travellers are going through Reno. My two graph searches give me shortcutted paths, which combine to look like:

['Mountain View, CA', 'Pleasanton, CA', 'Walnut Creek, CA', 'Vacaville, CA', 'Reno, NV', 'Cove Fort, UT', 'Denver, CO', 'Colorado Springs, CO']

which seems about right, though Vacaville to Denver is a pretty short distance on the page, and a pretty big chunk of our travel distance. That's the shortcuts, so let's unwrap those:

['Mountain View, CA', 'Pleasanton, CA', 'Walnut Creek, CA', 'Vacaville, CA', 'Sacramento, CA', 'Placerville, CA', 'Reno, NV', 'Ely, NV', 'Cove Fort, UT', 'Green River, UT', 'Grand Junction, CO', 'Rifle, CO', 'Lawson, CO', 'Denver, CO', 'Colorado Springs, CO']

Which is the same solution that we got from A*. This is reassuring. If they were different, I'd go back and try to figure out what's going on. I can imagine that if we were trying to find shortest paths across Manhattan, there might be many routes with similar (map) distances, and in that case, maybe A* and CH would give different answers. One thing about A* is that there's a heuristic function that is used to prioritize expanding paths based on expected distance remaining, and if I got that wrong (say, by letting it overestimate the distance remaining), I could see A* giving a slightly wrong answer.

But they're the same, and they're both lightning fast on my map of 70 cities. I would expect that if I had a much bigger map, I might start caring, but for my game, it probably doesn't matter.

One thing that might make me go for A* in my game is that my map might be dynamic enough that I won't want to redo the processing - let's say our heroes hear that a bridge is out, and the road from Reno to Rachel, NV is blocked. (Or, maybe there's some other reason why the U. S. Army is rerouting traffic around Rachel?) In that case, using A* on the (dynamic) map data might be more convenient.

Spoiler: it's probably aliens at Area 51. Maybe some got out, maybe the army is bringing new ones in, maybe there's a shipment of alien materials that overturned. I've calculated a path from Rachel, NV, to Devil's Tower, WY, just in case.

This stuff isn't super hard, but it does take a little work to get right. I watched (and re-watched, and read and reread the slides from) a German AI course on navigation:

http://ad-wiki.informatik.uni-freiburg.de/teaching/EfficientRoutePlanningSS2012

Particularly, lectures 6 and 7.

So, Dad, if I wanted to go from Bremerton to Boston, or vice versa, I'd use shortcuts.

1 comment:

BigDiceDave, October 6, 2019 at 12:09 PM
Postscript to the above, or intended sidebar - during a (games) job interview, one of the people on the other side of the desk asked me how Google Maps did continent-level pathfinding. I honestly answered that I didn't know, nor could I share the algorithms if I did know. I vaguely described Contraction Hierarchies, which was perhaps too vague for the interviewer. He said "So, it's a hierarchical pathfinding solution". Erm, sort of. But not really. There are a lot of edge cases in hierarchical pathfinding. This is hierarchical at the granularity of individual cities, which smooths stuff out and you get less weirdness.

I got that job, and didn't end up using CH or hierarchical pathfinding; A* was a really good fit for what we needed to do.

Sunday, October 6, 2019
