Mapping the environmental footprints of crop production through machine learning

Agricultural sustainability currently suffers from a significant data gap. This early-stage study suggests that machine learning could reduce the need for exhaustive primary farm-level audits by identifying the variables most critical to understanding the environmental footprints of crop production.

Note: This article is based on a preprint. The research has not yet been peer-reviewed and results should be interpreted as preliminary.

Measuring the ecological impact of farming is traditionally slow and data-intensive. This research tested five machine learning techniques for predicting biodiversity, water and climate impacts from aggregated farming information. Drawing on data for 121 crops in 57 countries, the study aimed to find the most efficient way to fill knowledge gaps in global food supplies.
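The summary does not describe how the five techniques were compared, but scoring candidate models on held-out data is a standard way to decide which one is "most efficient". The sketch below is a minimal standard-library illustration of k-fold cross-validation under stated assumptions: one hypothetical predictor, synthetic data, and two toy stand-in models (a mean baseline and least-squares regression) in place of the study's Random Forest, boosting and neural-network models. All function names are invented for illustration.

```python
import math
import random
from statistics import mean


def kfold_indices(n, k, seed=0):
    """Shuffle row indices once, then deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Toy baseline model: always predict the training-set mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Toy model: ordinary least squares on a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def cv_rmse(fit, xs, ys, k=5):
    """Average out-of-fold root-mean-square error for one model-fitting function."""
    scores = []
    for test in kfold_indices(len(xs), k):
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_errs = [(model(xs[i]) - ys[i]) ** 2 for i in test]
        scores.append(math.sqrt(mean(sq_errs)))
    return mean(scores)


# Hypothetical synthetic data: one predictor (e.g. log-yield) vs. a log-scale impact
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2.0 - 0.8 * x + rng.gauss(0, 0.1) for x in xs]

results = {name: cv_rmse(fit, xs, ys) for name, fit in [("mean", fit_mean), ("linear", fit_linear)]}
print(results)  # the model with the lower out-of-fold error is preferred
```

Because every model is scored only on rows it never saw during fitting, the comparison rewards generalisation rather than memorisation, which matters when the goal is to fill gaps for crops and countries without primary data.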

Optimising the environmental footprints of crop production

The study indicates that Random Forest models are highly effective for biodiversity predictions, while Generalised Boosting Methods excel at water footprinting. These initial findings highlight the key predictors of impact:

Biodiversity and climate impacts are largely predicted by yield, fertiliser use, electricity, and climatic region.
Water footprints are most sensitive to irrigation, fertiliser, and pesticide levels.
Artificial Neural Networks (ANN) predicted climate impacts well using just four variables.
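The summary does not say how these key predictors were ranked. One common, model-agnostic option is permutation importance: shuffle a single input column and measure how much the prediction error grows. The sketch below assumes a hypothetical, already-trained model and three made-up features with invented effect sizes; none of the names or numbers come from the study.

```python
import random
from statistics import mean


def mse(model, rows, ys):
    """Mean squared error of a model over feature rows."""
    return mean((model(r) - y) ** 2 for r, y in zip(rows, ys))


def permutation_importance(model, rows, ys, seed=0):
    """Error increase when each feature column is shuffled, breaking its link to the target."""
    rng = random.Random(seed)
    baseline = mse(model, rows, ys)
    scores = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        scores.append(mse(model, shuffled, ys) - baseline)
    return scores


# Hypothetical data: three made-up features with invented effect sizes
names = ["strong", "weak", "irrelevant"]
rng = random.Random(1)
rows = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
ys = [3 * a + 1 * b for a, b, _ in rows]


def model(r):
    """Stand-in for an already-trained model; it ignores the third feature."""
    return 3 * r[0] + 1 * r[1]


scores = permutation_importance(model, rows, ys)
ranking = sorted(zip(names, scores), key=lambda p: p[1], reverse=True)
print(ranking)
```

Variables whose shuffling barely changes the error are candidates to leave out of future data collection, which is the reasoning behind the 'parsimonious' approach.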

While these models currently show a prediction uncertainty of up to a factor of five in some categories, they represent a promising step toward scalable auditing. If the results hold up as the models move from the lab to the field, they point toward 'parsimonious' data collection: rather than requiring every data point from every farm, analysts could use these high-leverage predictors to fill knowledge gaps at scale.
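A "factor of five" describes a multiplicative error band: a prediction falls within it if it is no more than five times, and no less than one-fifth of, the reference value. The source does not define the metric precisely, so this sketch assumes that symmetric fold-error reading; the function names are hypothetical.

```python
def fold_error(predicted, observed):
    """Multiplicative error: 1.0 is a perfect match, 5.0 means off by a factor of five either way."""
    if predicted <= 0 or observed <= 0:
        raise ValueError("footprint values must be positive")
    return max(predicted / observed, observed / predicted)


def within_factor(predicted, observed, factor=5.0):
    """True if the prediction lies between observed / factor and observed * factor."""
    return fold_error(predicted, observed) <= factor


print(fold_error(10.0, 2.0))     # 5.0 (over-prediction)
print(fold_error(2.0, 10.0))     # 5.0 (under-prediction, same band)
print(within_factor(12.0, 2.0))  # False: off by a factor of six
```

Equivalently, the check is |ln(predicted / observed)| ≤ ln 5, a band that is symmetric on the log scale, so over- and under-predictions of the same magnitude are treated alike.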

Potential downstream impacts include:

More efficient sustainability auditing for global food commodities.
Targeted interventions in water-stressed regions based on predictive data.
Prioritising data collection on the ground by focusing on the variables that matter most for nature loss.

By using these models to guide low-carbon sourcing, the industry could move toward a more efficient global food system grounded in predictive data.
