Sparkgeo @ STAC Sprint #8


STAC Sprint #8 took place last week at Element 84’s office in Philadelphia. The sprint focused on identifying and fixing STAC edge cases, creating tutorials and educational materials, moving STAC and STAC API extensions to updated versions, and holding a healthy discussion about using vector data in the STAC ecosystem.

James Banting presenting at STAC sprint

Bands RFC

Matthias Mohr identified an inconsistency in how STAC refers to data in multi-dimensional datasets. Different disciplines use different terminology for the same concept: terms like “Bands,” “Channels,” and “Layers” are used interchangeably to refer to discrete slices of data in multi-dimensional datasets. The discussion centred on reconciling these terms across the STAC specification and its extensions. Initially, eo:bands was introduced to describe the spectral characteristics of a band. However, this approach did not fit other multi-dimensional datasets, such as climate data. As a result, raster:bands was proposed to cover raster bands, or layers, in general.

After extensive discussion, ‘bands’ became a new field in common metadata, replacing eo:bands and raster:bands. The fields data_type, nodata, statistics, and unit have also been added to common metadata.
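To make the change concrete, here is a minimal Python sketch of how an asset written with the older eo:bands and raster:bands arrays could be folded into a single bands array. The helper name merge_legacy_bands is hypothetical, and the field mapping is an assumption: the four fields named above, plus name and description, become unprefixed band fields, while every other field keeps its extension prefix (so common_name becomes eo:common_name and scale becomes raster:scale). It also assumes both legacy arrays list the same bands in the same order. Check the STAC 1.1 specification and the extension changelogs for the authoritative mapping.

```python
from typing import Any

# Assumption: fields that live unprefixed on a band object under common metadata.
COMMON_FIELDS = {"name", "description", "data_type", "nodata", "statistics", "unit"}


def merge_legacy_bands(asset: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: fold eo:bands and raster:bands into one `bands` list.

    Assumes both legacy arrays describe the same bands in the same order.
    Fields outside COMMON_FIELDS keep their extension prefix (eo: or raster:).
    """
    eo = asset.get("eo:bands", [])
    raster = asset.get("raster:bands", [])
    merged = []
    for i in range(max(len(eo), len(raster))):
        band: dict[str, Any] = {}
        for prefix, source in (("eo", eo), ("raster", raster)):
            if i >= len(source):
                continue  # one legacy array may be shorter than the other
            for key, value in source[i].items():
                # Common-metadata fields are unprefixed; everything else is namespaced.
                band[key if key in COMMON_FIELDS else f"{prefix}:{key}"] = value
        merged.append(band)
    # Copy every other asset field unchanged and drop the legacy arrays.
    out = {k: v for k, v in asset.items() if k not in ("eo:bands", "raster:bands")}
    if merged:
        out["bands"] = merged
    return out


legacy = {
    "href": "https://example.com/scene.tif",
    "eo:bands": [{"name": "B04", "common_name": "red"}],
    "raster:bands": [{"data_type": "uint16", "nodata": 0, "scale": 0.0001}],
}
print(merge_legacy_bands(legacy)["bands"])
# [{'name': 'B04', 'eo:common_name': 'red', 'data_type': 'uint16', 'nodata': 0, 'raster:scale': 0.0001}]
```

Merging by position keeps the sketch short; a production migration would also need to decide what to do when the two arrays disagree in length or order.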

OGC Alignment

While the STAC sprint was under way, OGC members were meeting in Singapore for their annual conference. Holding both events in the same week provided ample reason to discuss STAC’s heritage and its connection to the OGC API suite of standards. STAC and STAC API were presented to OGC members as candidate Community Standards for OGC adoption. The STAC API builds on OGC API – Features, and STAC is implemented by companies large and small as well as by governments. Having OGC endorse STAC means OGC members can more easily follow FAIR data practices and align with the needs of the broader geospatial community.

Janitorial Work

STAC has a healthy ecosystem of extensions created and maintained by community members. This reflects one of STAC’s core tenets, extensibility, but it also means some extensions need proper upkeep before other groups can use them with confidence. To that end, many extensions were updated to new versions, and their compatibility with the STAC specification and community practices was reviewed.
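For data providers wondering whether their catalogs reference older extension releases, a small stdlib-only Python sketch can scan an item’s stac_extensions list. It assumes the schema URI layout used by extensions hosted under stac-extensions.github.io (https://stac-extensions.github.io/&lt;name&gt;/v&lt;major&gt;.&lt;minor&gt;.&lt;patch&gt;/schema.json) and skips any URI that does not match. The function name and the minimum versions passed in are hypothetical; substitute the releases you actually target.

```python
import re
from typing import Iterable

# Assumed layout of extension schema URIs hosted on stac-extensions.github.io.
EXT_URI = re.compile(
    r"https://stac-extensions\.github\.io/([^/]+)/v(\d+)\.(\d+)\.(\d+)/schema\.json$"
)


def outdated_extensions(
    uris: Iterable[str], minimums: dict[str, tuple[int, int, int]]
) -> list[str]:
    """Hypothetical helper: report extensions older than a chosen minimum version."""
    messages = []
    for uri in uris:
        match = EXT_URI.match(uri)
        if not match:
            continue  # unknown layout (e.g. a self-hosted schema): skip it
        name = match.group(1)
        version = tuple(int(g) for g in match.groups()[1:])
        minimum = minimums.get(name)
        # Tuples compare element-wise, so (1, 0, 0) < (2, 0, 0).
        if minimum is not None and version < minimum:
            found = ".".join(map(str, version))
            wanted = ".".join(map(str, minimum))
            messages.append(f"{name}: v{found} < v{wanted}")
    return messages


item_extensions = [
    "https://stac-extensions.github.io/eo/v1.0.0/schema.json",
    "https://stac-extensions.github.io/raster/v1.1.0/schema.json",
]
# Hypothetical minimum: only eo is checked here.
print(outdated_extensions(item_extensions, {"eo": (2, 0, 0)}))
# ['eo: v1.0.0 < v2.0.0']
```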

PySTAC

PySTAC received a facelift to preserve backward compatibility with extensions and to make the extension interface simpler for developers to work with. In the same vein of making libraries easier for developers to use, there is an RFC to deprecate STAC-pydantic. The rationale is that PySTAC already meets developers’ needs, and more. Having fewer libraries that do the same job lightens the load on developers implementing STAC.

STAC GeoParquet

Work on optimizing STAC-GeoParquet conversion made significant progress, especially with the adoption of newline-delimited JSON as STAC input. This shift fits the pre-sprint consensus of using native Parquet structures rather than simply embedding JSON. Schema resolution was a major challenge, particularly for massive STAC collections: writing a Parquet file requires a single schema across all items, and the sprint made clear how important it is to establish one.
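Why a single schema matters can be illustrated with a small, stdlib-only Python sketch. It streams newline-delimited STAC items one line at a time, so a large collection never has to sit in memory at once, infers a column type for each key under properties, widens int to float, and fails loudly on any other conflict, which is the kind of mismatch that would stop a Parquet writer. The function names and the coarse type system are hypothetical simplifications, not how stac-geoparquet itself resolves schemas; the real conversion also has to handle nested fields, assets, and geometry.

```python
import io
import json
from typing import Any, Iterable


def _type_name(value: Any) -> str:
    """Map a JSON value to a coarse column type; "null" is compatible with anything."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "list"
    return "struct"


def infer_property_schema(lines: Iterable[str]) -> dict[str, str]:
    """Hypothetical sketch: one pass over NDJSON STAC items yielding a single
    type per top-level `properties` key, as a Parquet writer requires."""
    schema: dict[str, str] = {}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        item = json.loads(line)
        for key, value in item.get("properties", {}).items():
            new = _type_name(value)
            old = schema.get(key, "null")  # keys missing so far are treated as null
            if old == "null" or new == "null" or old == new:
                schema[key] = new if old == "null" else old
            elif {old, new} == {"int", "float"}:
                schema[key] = "float"  # widen ints to floats
            else:
                raise TypeError(f"{key!r}: conflicting types {old} and {new}")
    return schema


ndjson = io.StringIO(
    '{"id": "a", "properties": {"datetime": "2023-11-01T00:00:00Z", "eo:cloud_cover": 5}}\n'
    '{"id": "b", "properties": {"datetime": "2023-11-02T00:00:00Z", '
    '"eo:cloud_cover": 12.5, "platform": "sentinel-2a"}}\n'
)
print(infer_property_schema(ndjson))
# {'datetime': 'string', 'eo:cloud_cover': 'float', 'platform': 'string'}
```

Columns absent from some items simply become nullable; only a genuine type clash, such as a number in one item and a string in another, has no single-schema answer without an explicit decision by the data provider.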

Earlier discussions about user-friendly exports from STAC-GeoParquet carried over well into the sprint. pandas proved to be a noticeable bottleneck when trying to speed up the round trip from GeoJSON to Parquet and back. Apache Arrow and pyarrow emerged as the best tools for working with STAC metadata and Parquet.

STAC Education

A specification without adoption doesn’t get very far, and one of the main goals of STAC is to make geospatial data accessible and interoperable. New educational materials were created during the sprint, including material for R users and a new STAC FAQ. The sprint achieved many of its initially proposed goals, and with STAC 1.1 nearing release, adoption and contributions from new experts continue to grow.

STAC Sprint Attendees at dinner.

Thanks to Element 84 for graciously hosting this event, to Cloud-Native Geo for bringing us all together to discuss and collaborate, and to our sponsors (Microsoft, Hydrosat, Terradue, and Sparkgeo) for their invaluable support. Thank you all for making this event possible.