Blather, Rinse, Repeat: October 2019

Sunday, October 13, 2019

Python to Google Sheets to Unity to JSON resource to runtime - nothing but net

I just uploaded a Unity app at http://bigdicegames.com/CWGSparks/index.html that draws this picture:

Which, if you've been following along, and are a little forgiving with my projection, could look like a highway map of the US. It's incomplete, and it's low res - the only vertices are at cities, and I have just shy of 500 cities in my database. Also, I've been manually connecting up cities based on my interest in them, so there's a lot more cities in Texas, Florida, and California, and a lot more highways that will someday be coming in to this picture.

Previous posts have talked about doing pathfinding from city to city, and I've been using Python for that part of the project. To help me visualise what's going on, I use Reportlab to draw PDFs, which sometimes I print out.

This is all intended to be in service of a game. (A computer game? Maybe playable on the web? I also have ideas of a choose-your-own-branching path gamebook, but that doesn't require as much pathfinding.) So, it makes sense to start porting the data into Unity.

In talking to my friends about this project, I talk about having a highway database. That's clearly putting on airs, as really what I have is two Python arrays of tuples. One array is the city array, and each line in the file has a city name and lat/lon coordinates. The other array is the road array, which links up cities by name. Simple as that.

Well, yeah, but a little gross. There's no error checking, and it's super easy to get duplicates in there. Greeley, Colorado was in my "database" twice, with two different spellings. That seems like it's cleaned up now.

Python's great, but it's not the best data repository. I've been kind of itching to move my data to a Google Sheets spreadsheet. This ties in with some desire I have to try to embrace a data-driven, "Entity/Component/System" (ECS) style for a personal project of some size.

Aside: http://bigdicegames.com/ECSteroids/index.html is a small game I made, trying to play around with ECS maybe a year and a half, two years ago. arrow keys steer, CTRL shoots.

So, anyway, I wanted to port my Python-based data to Google Sheets. That's super easy, I just wrote out a CSV file by hand for each of my tables. If I actually had to think about commas inside my data values, I'd probably go so far as to import csv from Python and use a writer provided. But no time for that.

So, I imported my CSV into a Google Sheets, um, sheet. Looks great, all my data is there, and even some fancier stuff, where I generate a unique key for city and roads, as if I was using a real database.

For what it's worth, a lot of those cities were scraped from Wikipedia's article on most populous cities in the US. So that got me started, but I've been adding extra cities to give my roads useful places to turn. The original Wikipedia article had 314 cities, and I've been adding extra places so that now the spreadsheet has 498 lines. Anchorage, Alaska and Honolulu, Hawaii are problematic, but I'll get to that later.

So, the data's in Google's hands now. Fine. I want to be able to use it inside Unity. I poked around a number of similar tools, and ended up using Google Sheets for Unity Lite, which does the job I need right now. There's a non-lite version on itch.io, but it's unclear to me what the value added is.

Watching and rewatching A YouTube video where a guy is using GSFUL to import data, I managed to pull the data out of Google Sheets, and down into JSON data in my app. Cool, so far.

Except I don't really want the users to be hammering Google Sheets to pull data at runtime. I'd much prefer to bundle that data into the app itself.

Unity has several ways to make this sort of thing work, but what I chose to do is that my Google Sheets pull happens as an editor extension, pulling the data as JSON, and then I write it into my Assets/Resources/JSON folder. "Resources" are data that get bundled up inside the app through Unity magic.

A little dead end banging around, trying to read from the Resources folder directly, I ended up using Resources.Load<TextAsset>, like so:

And the JSON is now in a CityArray and RoadArray object, each of which is a simple (if annoying) wrapper around the array of JSON objects for cities and roads. Boom.

At that point, I just used Vectrosity to draw the lines. There was a little bit of math to jam the map onto the screen rectangle, but that's just a bounding box and some scaling. Nothing fancy.

I sort of want to put some virtual cars on the roads driving from state to state. (No papers, Vasili!) That can come later.

Also, maybe a little more Texas, it looks lonely.

Wednesday, October 9, 2019

I agree with Google Maps on Cannonball Route

As a test of my contraction hierarchy code, and of an expanding database I'm building of US highways, I asked both Google Maps and my code to find a path from New York, NY to Los Angeles, CA.

Looks like Google recommends going through Indianapolis, to Oklahoma City, then due West.

My map isn't as extensive as Google Maps, but let's see what it decides to do:

New York, NY -> Indianapolis, IN -> Oklahoma City, OK, and on west to Los Angeles, CA.

Better than I expected. I'll take it!

Sunday, October 6, 2019

Contraction Hierarchies vs A*

You may recognize the above map as a crude highway map covering (parts of) California, Nevada, Utah, Wyoming, and Colorado.

You might even have heard of a project that I've got on my back burner, a branching, choose your own path, adventure book and/or computer game, which I might be calling "Sparks and Rusty" or "Sparks and the Wheelman" or some other goofy pairing that evokes 1980s action road adventures.

In this adventure, you have a semi tractor, and are asked to use it to tow a semi trailer from Mountain View, CA to Colorado Springs, CO. The interesting (hopefully) bit is that the semi trailer houses a supercomputer, on which is running a super-intelligent, sentient AI. That's "Sparks".

So, yeah, get from point A to point B, with your newfound buddy / damsel, across presumably hostile territory.

Also, as this is a Cars With Guns joint, the hostile territory will have lots of violent folks on the highways. Murderous cycle gangs. Folks with machine guns mounted on the hood of their cars. Ambushes. Gorse bushes. Perhaps not the last one.

If you're familiar with the Steve Jackson Games gamebook "Convoy", you're thinking along the right tracks - it's an inspiration, to be sure.

So, you've got a map, and you've got a destination. It should be easy to pathfind to the destination, right?

Well, sure. Let's first of all get out of the way that my map is fairly small - 70 cities, some of which I added, just to make the highways bend in approximately the right places (I'm looking at you, Muddy Gap, WY). So, it's not too hard to exhaustively search that, Dijkstra is actually pretty good for this sort of thing.

Also, let's concede that my city-to-city distance is an approximation, using pythagoras and a mercator projection onto a flat plane. Real highways are wigglier, and this is coming up on big enough to actually care about the curvature of the Earth. All of that can be refined later, if I need to.

So hey, A* (pronounced "ay-star"), that's a thing, too, right? Yep, sure is. Can be faster than Dijkstra, and is fewer characters, though it might be problematic as a filename, or a class name, depending on your OS and your language.

So, I wrote a quick little A* implementation - I've got "AI Engineer" on my resume for a reason, I should be able to knock this out in a page or less of Python, right?

Turns out, sure, something like that. And A* tells me that the way to navigate my map to get from Mountain View to Colorado Springs is:

['Mountain View, CA', 'Pleasanton, CA', 'Walnut Creek, CA', 'Vacaville, CA', 'Sacramento, CA', 'Placerville, CA', 'Reno, NV', 'Ely, NV', 'Cove Fort, UT', 'Green River, UT', 'Grand Junction, CO', 'Rifle, CO', 'Lawson, CO', 'Denver, CO', 'Colorado Springs, CO']

Which is maybe not what I would have chosen, if I was just wandering the highways of the Western US without a map by my side, I might have gone through Reno, staying on I-80 through West Endover, and turn right at Fort Collins, CO, and down to Colorado Springs. But I trust this is shorter. It doesn't take cycle gangs or refuelling into account, which players will need to think about.

Sidebar anecdote: years ago, my father asked me how Google Maps can do continent-scale navigation so quickly. I shrugged and said I didn't have access to the Google Maps source code, but I imagined that if somebody asked Google Maps for turn-by-turn navigation from, let's say Bremerton, WA to Boston, MA, Google Maps would be smart enough to know that you can get from Seattle, WA to Boston, MA, by taking I-90 (pretty much) without turns, and so that's a subproblem that can be "shortcut". Not saying that you're taking a shorter physical path, but not considering all segments of I-90 incrementally; that's thousands of potential exits off the interstate that you might not care about.

And, in fact, you actually can do a little better than saying on I-90, if for no better reason than to detour around Chicago and Minneapolis. At least, that's what Google Maps says right now. So, it has some knowledge of times when the interstate isn't the best answer.

I've taken some online courses to keep my brain full, and one of the AI courses I took had a short section on highway navigation, with a visiting guest from Microsoft/Bing Maps, talking about "Contraction Hierarchies" / Node Labels.

The short version of the "Node Labels" technology is that you store at every location a "label", which I'll just call a "dictionary" of intermediate locations, along with the shortest distance to that intermediate location. The amazing bit is that this dictionary ends up averaging something like the log of the number of points in your map. So, I look up Bremerton, and find a bunch of cities including St Louis, New Orleans, and Chicago. I look up Boston, and find a different bunch of cities, including New Orleans, Memphis, Miami, and Chicago. And so I find the intersection of these dictionaries, and find that I can route from Bremerton to Boston via Chicago in 3096 miles, or Bremerton to New Orleans to Boston in 4250 miles. I'll take Chicago today, thank you.

If you've got these labels preprocessed for a bunch of locations, you can do this midpoint thing very fast.

I'm a little fuzzy on what the Microsoft guy said (if he did) about how you turn this in to navigation - maybe he was just presenting a "distance oracle", which is still something.

Maybe you recursively do this, and find a midpoint between Bremerton and Chicago, and so on and so on. I'd be concerned, though - in my example, Chicago is in my "label" for Bremerton, so there might be edge-case issues going on, where I can't get a good divide-and-conquer solution.

Or, maybe inside your label, you store not only distances, but a path. That's a lot more data.

Or, instead of storing all of this label information, you store some small information in the cities on your map that helps you reconstruct the label quickly. That seems good, especially since I'm not actually in the business of doing server-side computation of this stuff for lots of queries per second.

And it's this small information approach that Contraction Hierarchies uses to give you a solution.

Imagine I assign each city in my map a unique integer index. Could be assigned randomly, could be alphabetical, it doesn't matter at this point how I come up with these indices. Now, in that order, I'm going to start simplifying my map, removing one city at a time, which should make things incrementally simpler. But I still want the remaining map to contain correct information about shortest paths. This is where "shortcuts" come in. Again, shortening the computation work, not shortening the physical, map, distance.

So, let's say I decide that I'm looking at my map, and I decide to simplify it to remove Bozeman, MT.
May map says that I can go from Belgrade, MT to Bozeman, MT, and from Bozeman, MT to Livingston, MT. I would add in a shortcut from Belgrade to Livingston, and add a note on the shortcut saying "also, to take this shortcut, you'll pass through Bozeman". And we do this for each city in our graph, adding shortcuts where the city is required for shortest path calculations to remain correct.

So, we make a new graph, which we'll call G* which has all the original cities and all the original edges, but also these new shortcuts. Depending on the connectedness of your graph, a lot of cities can go away and not have any shortcuts at all.

The neat trick at this point is to navigate G* by only going "up" in indices, including using our new shortcuts. So, for a known good path from Bremerton to Boston, we will have contracted the various cities along the way in some order, which you can visualize as a jagged mountain range - each contraction yields a shortcut, allowing us to "fill in" a little valley between neighboring points (that are later in the contraction order, so therefore are "higher" on our mountain visualization). So, the points late in the contraction order become "hubs" that traffic want to go through.

So, we figure out these best hubs, reachable from our start location and our destination location, each going up. In the Bremerton and Boston example, we can imagine that Chicago is a good hub (O'Hare is maybe not a great experience for air travellers, but as a highway crossroads, it's useful), contracted late in the preprocessing, and so "up" from both Bremerton and Boston.

And the path to get from Bremerton to Chicago has a few shortcuts along the way, probably including when we contracted Bozeman, so when we come to unroll our directions, we replace shortcuts with paths through their contracted cities.

And that's just about it.

So, I wrote a Contraction Hierarchy solution for my game map (Mountain View to Colorado Springs, if you recall). I put in Mountain View, and asked it where I could get to going "up" my graph, and it said:

St. George, UT
Walnut Creek, CA
Bakersfield, CA
Modesto, CA
Salinas, CA
Vacaville, CA
Paso Robles, CA
Fort Collins, CO
Mountain View, CA
Reno, NV
Pleasanton, CA
Tonopah, NV

I asked it where I could get to by going "up" from Colorado Springs, and the list was:
Denver, CO
Cove Fort, UT
Pueblo, CO
Colorado Springs, CO
Walsenburg, CO
Rock Springs, WY
Salt Lake City, UT
Fort Collins, CO
Reno, NV
St. George, UT
Tonopah, NV

For each of these destinations, I got the city, a real highway distance, and a path with shortcuts to get to that city.

So, I found the places I could get to from my start and from my destination, which narrowed things down to:

(1092.933032328154, 'Reno, NV')
(1187.1179961380471, 'St. George, UT')
(1329.8134461278964, 'Fort Collins, CO')
(1570.1329202911515, 'Tonopah, NV')

That's the cities along with their combined distance (start to city to destination), so it looks like our travellers are going through Reno. My two graph searches give me shortcutted paths, which combine to look like:

['Mountain View, CA', 'Pleasanton, CA', 'Walnut Creek, CA', 'Vacaville, CA', 'Reno, NV', 'Cove Fort, UT', 'Denver, CO', 'Colorado Springs, CO']

which seems about right, though Vacaville to Denver is a pretty short distance on the page, and a pretty big chunk of our travel distance. That's the shortcuts, so let's unwrap those:

['Mountain View, CA', 'Pleasanton, CA', 'Walnut Creek, CA', 'Vacaville, CA', 'Sacramento, CA', 'Placerville, CA', 'Reno, NV', 'Ely, NV', 'Cove Fort, UT', 'Green River, UT', 'Grand Junction, CO', 'Rifle, CO', 'Lawson, CO', 'Denver, CO', 'Colorado Springs, CO']

Which is the same solution that we got from A*. This is reassuring. If they were different, I'd go back and try to figure out what's going on. I can imagine that if we were trying to find shortest paths across Manhattan, there might be many routes with similar (map) distances, and in that case, maybe A* and CH would give different answers. One thing about A* is that there's a heuristic function that is used to prioritize expanding paths based on expected distance remaining, and if I got that wrong, I could see A* giving a slightly wrong answer.

But they're the same, and they're both lightning fast on my map of 70 cities. I would expect that if I had a lot bigger of a map, I might start caring, but for my game, it probably doesn't matter.

One thing that might make me go for A* in my game is that my map might be dynamic enough that I won't want to redo the processing - let's say our heroes hear that a bridge is out, and the road from Reno to Rachel, NV is blocked. (Or, maybe there's some other reason why the U. S. Army is rerouting traffic around Rachel?) In that case, using A* on the (dynamic) map data might be more convenient.

Spoiler: it's probably aliens at Area 51. Maybe some got out, maybe the army is bringing new ones in, maybe there's a shipment of alien materials that overturned. I've calculated a path from Rachel, NV, to Devil's Tower, WY, just in case.

This stuff isn't super hard, but it does take a little work to get right. I watched (and re-watched, and read and reread the slides from) a German AI course on navigation:

http://ad-wiki.informatik.uni-freiburg.de/teaching/EfficientRoutePlanningSS2012

Particularly, lectures 6 and 7.

So, Dad, if I wanted to go from Bremerton to Boston, or vice versa, I'd use shortcuts.