Coral Reef Dataset Pipeline

Facing climate change and severe bleaching, which causes coral reefs to die off, coral reef scientists have been embracing new technologies to help diagnose overall reef health. Satellite remote sensing imagery can measure sea surface temperature as an early warning signal for coral bleaching. Underwater, teams of scientists record live coral cover, reef fish biomass, and the diversity of coral and fish species.

However, data collected manually in the field by the scientists themselves has traditionally been difficult to analyze. Combining remote sensing information with underwater monitoring surveys is critical for coral reef conservation. For example, the combined data is used to identify climate ‘cool spots’ or refuges that can buy time for coral reefs in a warming climate, evaluate threats from overfishing, monitor pollution or unsustainable development, and track the effectiveness of conservation interventions such as marine protected areas (MPAs) or ridge-to-reef conservation.

Over the last 3 years, the Wildlife Conservation Society, World Wildlife Fund and Sparkgeo have developed MERMAID – the first online/offline open-source platform for coral reef monitoring data. Currently, MERMAID holds over 13,000 coral reef transects from 1,500 sites in 12 countries, and has more than 700 registered users from the academic, NGO and government sectors.

But pairing MERMAID monitoring surveys with remote sensing datasets was difficult. This year, the team at Sparkgeo decided to build a data pipeline that centralizes data on the multiple covariates that impact coral reefs and integrates it with MERMAID indicators of coral reef health. The goal was to ingest various remote sensing datasets, normalize the data, centralize it in one place, and make it available to the public through an API, for free.

What is MERMAID?

MERMAID stands for Marine Ecological Research Management AID. Coral reef scientists spend hundreds of hours underwater studying the health of coral reef ecosystems around the world – using clipboards, pencils and waterproof paper to collect data. The data they collect by hand is then traditionally transferred from underwater paper to Excel spreadsheets. This information quantifies reef health and function, such as the amount of living coral on a reef (% hard coral cover), reef fish productivity (biomass, kg/ha) and coral reef diversity (the number of hard coral and reef fish taxa). It can be further disaggregated into more complex indicators, such as the micronutrients available to coastal fisheries, the functioning of coral reef food web pyramids, or the susceptibility of coral communities to climate change and bleaching.


MERMAID modernizes this process: instead of manually transferring recorded data into spreadsheets, scientists enter it into a web application that standardizes the inputs, normalizes the data and stores it in the cloud.

It dramatically simplifies the painstaking job of recording coral reef data, cleaning it and checking it for errors, and allows scientists to scale up their information and collaborate globally to find new strategies and solutions for coral reef conservation. More importantly, the open-source web application has made this data open and readily accessible to coral reef scientists, managers, and governments to inform international policy commitments to save coral reefs.


Bringing in all the data with the MERMAID Pipeline

A large barrier for coral reef scientists is the difficulty of integrating datasets from remote sensing and underwater monitoring methods. A normalized and centralized database of coral reef environmental, social, and climate covariates greatly reduces this barrier and accelerates coral reef science.

Remote sensing datasets help quantify the exogenous factors that impact coral reef ecosystems, such as fishing pressure, water pollution, tourism, or coastal and industrial development. While there are organizations and scientists who develop and track these metrics regularly, the data has remained siloed and is time-consuming to integrate with underwater monitoring data when building accurate models to track and predict coral reef health.

The goal of the MERMAID covariates pipeline is to create an automated system that regularly fetches and normalizes remote sensing datasets from various sources. These normalized datasets will be publicly accessible via a MERMAID covariate API, allowing MERMAID users to query and visualize them alongside their underwater monitoring data.
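To make the "fetch and normalize" idea concrete, here is a minimal Python sketch of what normalizing records from two differently shaped sources into one common form might look like. It is an illustration only, not the actual MERMAID pipeline: the `CovariateRecord` schema, the raw field names (`site`, `temp_k`, `dhw`, `year`) and the helper functions are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CovariateRecord:
    """Common shape every source is normalized into (hypothetical schema)."""
    site_id: str
    name: str       # covariate name, e.g. "sst" or "dhw"
    value: float
    unit: str
    observed: date


def normalize_sst(raw: dict) -> CovariateRecord:
    """Normalize a made-up daily SST record that reports kelvin."""
    return CovariateRecord(
        site_id=raw["site"],
        name="sst",
        value=round(raw["temp_k"] - 273.15, 2),  # kelvin -> degrees Celsius
        unit="degC",
        observed=date.fromisoformat(raw["date"]),
    )


def normalize_dhw(raw: dict) -> CovariateRecord:
    """Normalize a made-up annual DHW record keyed by year."""
    return CovariateRecord(
        site_id=raw["site"],
        name="dhw",
        value=float(raw["dhw"]),
        unit="degC-weeks",
        observed=date(raw["year"], 12, 31),  # assumption: stamp annual values at year end
    )


def covariates_for_site(records, site_id):
    """Group normalized records by covariate name for a single site."""
    grouped = {}
    for record in records:
        if record.site_id == site_id:
            grouped.setdefault(record.name, []).append(record)
    return grouped


records = [
    normalize_sst({"site": "S1", "temp_k": 301.15, "date": "2020-03-01"}),
    normalize_dhw({"site": "S1", "dhw": "4.2", "year": 2020}),
    normalize_sst({"site": "S2", "temp_k": 300.15, "date": "2020-03-01"}),
]
site_covariates = covariates_for_site(records, "S1")
```

Once every source is reduced to the same record shape, pairing covariates with a monitoring site becomes a simple lookup by site identifier, which is the kind of query a covariate API could serve.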

For the first phase of the MERMAID pipeline project, the following datasets have been identified for collection:

coral reef habitat and geomorphology: Allen Coral Atlas

annual composite degree heating weeks (DHW)

fishing pressure: market gravity

water pollution: nutrients

daily sea surface temperature (SST)

human population (number of people) within 10, 50, 100, 250, 500 km

fishing pressure: total gravity

water pollution: sedimentation
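As an illustration of one covariate in the list above, human population within 10, 50, 100, 250 and 500 km of a site, the following Python sketch sums population over a set of populated grid cells using great-circle (haversine) distance. It is a simplified assumption of how such a metric could be computed: the cell list, the function names and the spherical-Earth radius are hypothetical choices, and a real pipeline would more likely work from a gridded population raster.

```python
import math

EARTH_RADIUS_KM = 6371.0            # mean Earth radius (spherical approximation)
RADII_KM = (10, 50, 100, 250, 500)  # buffer distances from the dataset list


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def population_within(site, cells, radii=RADII_KM):
    """Sum population of cells whose centre lies within each radius of the site.

    site  -- (lat, lon) of the reef site
    cells -- iterable of (lat, lon, population) tuples (hypothetical input)
    """
    totals = {radius: 0 for radius in radii}
    for lat, lon, population in cells:
        distance = haversine_km(site[0], site[1], lat, lon)
        for radius in radii:
            if distance <= radius:
                totals[radius] += population
    return totals


# Made-up cells along the equator, roughly 5.6, 33, 222 and 1112 km from the site.
cells = [(0.0, 0.05, 100), (0.0, 0.3, 200), (0.0, 2.0, 300), (0.0, 10.0, 400)]
totals = population_within((0.0, 0.0), cells)
```

The totals are cumulative by design: a cell 5 km away counts toward every radius from 10 km upward, matching the "within X km" wording of the dataset.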

The MERMAID Pipeline is still under development and will be made available in the near future.

Saving the world’s coral reefs – faster

Data is critical to inform evidence-based decision making that tackles top threats, protects climate refuges, and ensures conservation and management meet both the social and ecological pillars of sustainability. By pairing MERMAID and remote sensing datasets in a reproducible and automated API pipeline, Sparkgeo is giving coral reef scientists fast and easy access to the datasets they need, on demand. This provides hope not only for coral reef scientists but, most importantly, for coral reefs themselves, with 90% of coral reefs threatened by 2050.