Git for the World

I had another good long discussion with @bmann this morning. We talked about automated vehicles, augmented reality and the implications for the geospatial industry. Which, FYI, will be profound.

A few minutes later he tweets this:

That guy… Anyway.

Geo git is far from being a “new thing”. The concept of applying the git code-versioning process to geospatial data has been realised as GeoGig and shepherded by Boundless to the point it is at now. I am not going to comment on the state of the project; I honestly don’t know. But I will comment on the concept. This idea will become a key supporting structure of our future augmented world.

Take a moment and consider. After Pokemon, there will be more augmented applications; connected vehicles are a great example. There have been tragic Pokemon events, but considering the number of users (peaking at 25 million daily active users), the number of problems appears to be small. Obviously every tragic event is hugely distressing to all concerned, but imagine if the augmented application were driving your vehicle at 100kph (yeah, that’s in Canadian). Lots could go wrong, much faster.

Augmented experiences need to be based on real-life data. They need to know where the streets are, where the medians are, where the stop lines are. They will need to know a great deal more than that. Now, of course the car is also a sensor, but the base data needs to exist before live capture; in the car example, we have stopping distances to consider. Detecting a stop line 10 meters ahead when travelling at 100kph is a bad scene. Data points like that need to be known in advance of the first vehicle detecting the stop line. I have talked about data’s influence on AR recently; here I want to consider further the temporal nature of geospatial data.
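
To put a rough number on that, here is a back-of-the-envelope sketch in Python. The deceleration and latency figures are assumptions I picked for illustration, not measured values:

```python
# How far ahead must a vehicle "know" about a stop line?
# Deceleration and latency below are illustrative assumptions.

def required_detection_distance(speed_kph: float,
                                decel_ms2: float = 7.0,     # assumed hard braking on dry asphalt
                                latency_s: float = 0.5) -> float:  # assumed sensing + decision lag
    """Distance travelled during latency, plus braking distance (v^2 / 2a)."""
    v = speed_kph / 3.6  # convert km/h to m/s
    return v * latency_s + v ** 2 / (2 * decel_ms2)

if __name__ == "__main__":
    d = required_detection_distance(100.0)
    print(f"At 100kph you need ~{d:.0f} m of warning; 10 m is nowhere near enough.")
```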

Data, my good friends, does not remain stationary. Like the world around us, data changes. It’s a simple thing to consider: data is a representation of our habitat, our habitat is always changing, so the data representing it must always change too.

Conceptually the solution might also appear simple. We just apply any changes to our base datasets as we go. Hell, we could even pull in new snapshots of a data product. Well, yes and no. The thing about data is that it has personality; open data in particular is opinionated and messy; data is very human, somewhat reflective of the organisation that created it. This means that we are left with a veritable Tower of Babel. Not only are there many different shapes and sizes of data products, but each product can also be of significant size and somewhat dirty in and of itself. Indeed, we now have alternative sources like hyper-temporal remote sensing feeding this live firehose of change.

As such, geospatial data management goes from expert mode to insane mode very quickly. This is precisely why Google Maps et al. will never be done, and why Uber continues to invest in geospatial.

We need to be looking hard at technologies like GeoGig again. If we can apply a git-style methodology to our geospatial data, we will be very much closer to some kind of reasonable, live, vector mapping fabric. Again, the sticking point will be adoption. Custodians of those data products, which are becoming so valuable, will themselves have to see value in adopting a common schema and a common API.
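
To make the idea concrete, here is a minimal sketch of git-style versioning of vector features, assuming features are GeoJSON-like dicts keyed by a stable id. It is an illustration of the concept, not GeoGig’s actual storage model:

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


def content_hash(obj) -> str:
    """Hash any JSON-serialisable object, like a git blob id."""
    return hashlib.sha1(json.dumps(obj, sort_keys=True).encode()).hexdigest()


@dataclass
class Commit:
    message: str
    parent: Optional[str]
    tree: dict  # feature id -> blob hash


class FeatureRepo:
    """Content-addressed store of vector features with a linear commit history."""

    def __init__(self):
        self.blobs = {}    # blob hash -> feature dict
        self.commits = {}  # commit hash -> Commit
        self.head = None

    def commit(self, features: dict, message: str) -> str:
        tree = {}
        for fid, feat in features.items():
            h = content_hash(feat)
            self.blobs[h] = feat  # unchanged features dedupe automatically
            tree[fid] = h
        commit_id = content_hash({"msg": message, "parent": self.head, "tree": tree})
        self.commits[commit_id] = Commit(message, self.head, tree)
        self.head = commit_id
        return commit_id

    def diff(self, a: str, b: str) -> dict:
        """Feature ids added, removed, or modified between two commits."""
        ta, tb = self.commits[a].tree, self.commits[b].tree
        return {
            "added": [f for f in tb if f not in ta],
            "removed": [f for f in ta if f not in tb],
            "modified": [f for f in ta if f in tb and ta[f] != tb[f]],
        }


repo = FeatureRepo()
v1 = repo.commit({"road-1": {"lanes": 2, "surface": "asphalt"}}, "initial import")
v2 = repo.commit({"road-1": {"lanes": 3, "surface": "asphalt"}}, "road widened")
print(repo.diff(v1, v2))  # {'added': [], 'removed': [], 'modified': ['road-1']}
```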

A common schema because generalisation must and will happen. A road in one US county is much the same as a road in another US county, except that, if one were to just look at the data, they appear to be quite different.
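
As a toy illustration, imagine two county road schemas being normalised into one common shape. Every field name below is invented; real open data products differ far more than this:

```python
# Two hypothetical county road schemas normalised into one common shape.
# All field names here are invented for illustration.

def from_county_a(rec: dict) -> dict:
    return {
        "road_id": rec["RD_ID"],
        "name": rec["RD_NAME"],
        "surface": rec["SURF_TYPE"].lower(),
        "lanes": int(rec["NUM_LANES"]),  # county A publishes lane counts as strings
    }

def from_county_b(rec: dict) -> dict:
    return {
        "road_id": rec["roadId"],
        "name": rec["roadName"],
        "surface": rec["pavement"].lower(),
        "lanes": rec["laneCount"],
    }

# The same kind of real-world thing, two very different shapes of data:
a = {"RD_ID": "17-0042", "RD_NAME": "Main St", "SURF_TYPE": "ASPHALT", "NUM_LANES": "2"}
b = {"roadId": "B-913", "roadName": "Main St", "pavement": "Asphalt", "laneCount": 2}

assert from_county_a(a)["surface"] == from_county_b(b)["surface"] == "asphalt"
```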

An API because data updates need to trigger external events. Unless we are to see a doubling of web traffic caused by bots polling for new public data products, the custodians of those modern open data products will need to have an API of change: a trigger indicating to the community that there is new or changed data to harvest. We absolutely need to move away from the idea of “quarterly releases” and towards the idea of a persistently live fabric where small updates might be published daily or hourly.
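
A sketch of what that trigger could look like, assuming custodians let consumers register a webhook and push a small event on change; the payload shape and URLs here are my assumptions, not an existing standard:

```python
# An "API of change": consumers register a webhook URL and receive a
# small POST when a dataset changes, instead of polling for releases.
import json
import urllib.request

SUBSCRIBERS = []  # webhook URLs registered by data consumers


def subscribe(webhook_url: str) -> None:
    SUBSCRIBERS.append(webhook_url)


def publish_change(dataset: str, changed_ids: list) -> None:
    """Notify every subscriber that there is new or changed data to harvest."""
    event = json.dumps({
        "dataset": dataset,
        "changed": changed_ids,  # ids to harvest, not the data itself
        "fetch": f"https://data.example.org/{dataset}/changes",  # hypothetical URL
    }).encode()
    for url in SUBSCRIBERS:
        req = urllib.request.Request(
            url, data=event, headers={"Content-Type": "application/json"}
        )
        urllib.request.urlopen(req)  # fire the webhook; real code would retry
```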

So, our augmented future will change how we look at open geospatial data. Indeed, we will have to start thinking very hard about how temporal data management can be reasonably achieved.

Before you say it can’t be done, pause. You know it will be, it’s just a matter of time. Everything is impossible, right up to the day someone does it.