
Deploying up to 10,000 parallel containers with AWS CDK and AWS Batch to quickly process satellite images and help farmers innovate toward sustainable food production.

Written by Daniel Koops | Nov 15, 2024 12:25:35 PM

eLEAF, an Amersfoort-based company, is one of the few companies able to process satellite data for agriculture, water, and climate risk management on a global scale.

Their satellite-based solutions help customers worldwide use water sustainably, increase food production, and protect farmers against climate shocks.

To decide whether extra water spraying is necessary, farmers pay close attention to the weather forecast and constantly assess the humidity of their land.

To help farmers decide where and when to spray water, and prevent unnecessary spraying, eLEAF offers highly accurate satellite data on EvapoTranspiration (ET): the sum of all processes by which water moves from the land surface to the atmosphere via evaporation and transpiration. 

Two Sentinel satellites provide images of our planet every five days, which are continuously processed to update ET-information. 
Every picture is a tile of 100 x 100 km, with a resolution of up to 10 meters. It is enriched with metadata to create ‘TileDates’, so that ET development can be tracked over time.
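To get a feel for the data volume, the figures above can be turned into a rough pixel count. This is a back-of-the-envelope sketch that takes the 100 x 100 km tile size and the 10-meter resolution at face value; the function name is purely illustrative, and real tiles may carry several bands and some overlap.

```python
def pixels_per_tile(tile_size_km: float = 100, resolution_m: float = 10) -> int:
    """Pixels in one square tile for a single band (illustrative figures)."""
    pixels_per_side = int(tile_size_km * 1000 / resolution_m)
    return pixels_per_side * pixels_per_side


print(pixels_per_tile())  # 100000000 pixels per band at 10 m resolution
```

At roughly 100 million pixels per band per tile, repeated every five days, the storage and compute demands add up quickly.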

Image-based solutions generate an immense amount of data. The success of eLEAF thus led to high costs for infrastructure and data management.

To keep the associated costs under control and to ensure high performance for their customers, eLEAF asked Itility for help.

The Itility Data Factory was the perfect answer: a flexible, highly scalable, and cost-efficient solution for managing data.
This architecture framework contains Big Tech solution blocks to support large-scale processes and process enormous amounts of data. It is part of the Itility IT Foundation.

 

Redesigning for efficiency

Satellite pictures become valuable when they are enriched with other data types, such as meteorological data.
This enrichment used to be a step-by-step process, managed by an orchestrator that continuously checked the status of several million separate jobs. This made the process slow, costly, and complex to scale.

Therefore, our first step was to redesign the data processing pipeline and to restructure existing code.

In the new process, we take advantage of the scalability of AWS Batch as a container platform. Each container runs identical code to process exactly one tile.
Every time a new cycle starts, we deploy up to 10,000 parallel containers that each pick up a picture and process it.
As soon as the cycle is complete, the containers are immediately shut down to avoid unnecessary cost.
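One way to let identical containers each pick a different tile is an AWS Batch array job, which can fan out to at most 10,000 child jobs and passes each child its position in the AWS_BATCH_JOB_ARRAY_INDEX environment variable. The sketch below assumes that setup; the tile identifiers and the function name are hypothetical, and this is not necessarily how the eLEAF pipeline assigns tiles to containers.

```python
import os


def select_tile(tile_ids: list[str], env=os.environ) -> str:
    """Return the tile this container should process.

    AWS Batch sets AWS_BATCH_JOB_ARRAY_INDEX (0-based) in each child of an
    array job; outside Batch we fall back to index 0 for local testing.
    """
    index = int(env.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    if not 0 <= index < len(tile_ids):
        raise IndexError(f"array index {index} has no matching tile")
    return tile_ids[index]


# Hypothetical tile identifiers; a real cycle would read them from a manifest.
tiles = ["31UFT", "31UFU", "31UGT"]
print(select_tile(tiles, {"AWS_BATCH_JOB_ARRAY_INDEX": "2"}))  # 31UGT
```

Because every container derives its work item from its own index, no central orchestrator has to hand out tiles or poll millions of individual job statuses.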

The platform runs on AWS Graviton CPUs with spot instances, which process jobs at significantly lower cost. The downside is that spot capacity is not guaranteed, so instances can be reclaimed at any time. When spot capacity is too constrained, AWS Step Functions automatically starts a new container on on-demand compute.
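The fallback can be pictured as a small state machine: a first task submits the job to a spot-backed queue, and a Catch clause routes failures to a second task that targets an on-demand queue. The sketch below builds such a definition in Amazon States Language; the queue names, job name, and job definition name are placeholders, and the actual eLEAF workflow may differ, for example in how it detects constrained spot capacity.

```python
import json


def batch_task(queue: str, job_definition: str) -> dict:
    """A Step Functions task that submits an AWS Batch job and waits for it."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::batch:submitJob.sync",
        "Parameters": {
            "JobName": "process-tile",  # placeholder name
            "JobQueue": queue,
            "JobDefinition": job_definition,
        },
        "End": True,
    }


def build_definition(spot_queue: str, on_demand_queue: str, job_definition: str) -> dict:
    """Try the spot queue first; on failure, rerun on the on-demand queue."""
    spot = batch_task(spot_queue, job_definition)
    # Simplification: any task failure triggers the fallback, not only
    # failures caused by reclaimed or unavailable spot capacity.
    spot["Catch"] = [{"ErrorEquals": ["States.TaskFailed"], "Next": "RunOnDemand"}]
    return {
        "StartAt": "RunOnSpot",
        "States": {
            "RunOnSpot": spot,
            "RunOnDemand": batch_task(on_demand_queue, job_definition),
        },
    }


# Placeholder queue and job definition names for illustration only.
definition = build_definition("tiles-spot", "tiles-on-demand", "tile-processor")
print(json.dumps(definition, indent=2))
```

Keeping the fallback in the workflow rather than in the container code means the processing code itself stays unaware of which kind of compute it runs on.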

 

Professionally managed

To manage access and handle the passwords properly, we used AWS Secrets Manager. 
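As an illustration of what this looks like from inside a container, the sketch below reads a JSON-formatted secret at startup instead of baking a password into the image. It assumes the secret is stored as a JSON string; the secret name and its fields are hypothetical, and the fake client exists only so the example runs without AWS access.

```python
import json


def load_credentials(client, secret_id: str) -> dict:
    """Fetch a JSON secret and return it as a dict.

    `client` is expected to behave like boto3.client("secretsmanager").
    """
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


class FakeSecretsClient:
    """Stand-in for local runs so the example works without AWS access."""

    def get_secret_value(self, SecretId: str) -> dict:
        payload = {"username": "demo", "password": "not-a-real-secret"}
        return {"SecretString": json.dumps(payload)}


# Hypothetical secret name; no real credentials are involved here.
creds = load_credentials(FakeSecretsClient(), "example/database")
print(creds["username"])  # demo
```

In a real container, the fake client would be replaced by boto3.client("secretsmanager"), with access to the secret granted through the job's IAM role.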

A GitLab repository and CI/CD pipeline are used to manage the code effectively, ensuring it is automatically tested and deployed. This promotes a structured approach to code development, and it reduces the developers’ workload. 

To monitor the process quality, all containers send logs to a central monitoring environment in CloudWatch, which we enhanced with Grafana because CloudWatch dashboards lack the interactivity we needed.  

The eLEAF front-end application, FruitLook, is monitored with Amazon Managed Grafana, which includes hosting and authentication.

 

High-quality information at lower cost

After migrating to our Data Factory solution, the recurring cost of high-quality ET information was reduced by 50%, making ET information affordable for farmers worldwide.