AI Boosts Maize Yield Prediction Through Integrated G×E Modeling

Accurately predicting maize yield is difficult, largely because of the interplay between a plant's genetic makeup (genotype) and the conditions it grows in (environment), known as genotype-by-environment (G×E) interactions. This research addresses that difficulty with a deep-learning approach designed specifically to integrate and model G×E effects with improved precision.

The framework combines three techniques: Long Short-Term Memory (LSTM) networks, Graph Neural Networks (GNNs), and transformer-style attention mechanisms. An LSTM summarizes the weather data for a growing season into a 21-dimensional embedding, which serves as the environment node feature. Similarly, 437,214 genetic markers (single-nucleotide polymorphisms, SNPs) are reduced to 548 principal components, which form the genotype nodes. Multi-head attention dynamically weights the edges during message passing. The study compared three architectures, culminating in 'Architecture C', which features a single learnable supernode readout that attends over all nodes after message passing.
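To make the supernode readout concrete, here is a minimal sketch of one query vector attending over all node features with scaled dot-product attention. It is an illustration under simplifying assumptions, not the paper's implementation: it uses a single head, omits learned key and value projections, and the names `supernode_readout`, `query`, and `nodes`, along with the toy values, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def supernode_readout(query, node_features):
    """Pool all node features into one vector via attention.

    Assumption: single head, no learned key/value projections.
    Returns (pooled_vector, attention_weights).
    """
    dim = len(query)
    scale = math.sqrt(dim)
    # One score per node: scaled dot product between the query and the node.
    scores = [
        sum(q * x for q, x in zip(query, node)) / scale
        for node in node_features
    ]
    weights = softmax(scores)
    # Weighted sum of node features, computed one dimension at a time.
    pooled = [
        sum(w * node[d] for w, node in zip(weights, node_features))
        for d in range(dim)
    ]
    return pooled, weights


# Toy graph: three nodes with 4-dimensional features (hypothetical values).
nodes = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
query = [2.0, 0.0, 0.0, 0.0]  # stands in for the learnable supernode vector

pooled, weights = supernode_readout(query, nodes)
```

Because the weights depend on how well each node matches the query, the pooled vector is content-adaptive: nodes that align with the query contribute more than the others, whichever position they occupy in the graph.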

The framework was evaluated with a forward-time split, training on data from 2014–2021 and testing on 2022 data with completely unseen genotypes and environments. Performance improved monotonically across the three architectures, with Architecture C the most accurate. Compared with the baseline Architecture A, Architecture C reduced Root Mean Squared Error (RMSE) by approximately 20.3% (0.5629) and increased the Pearson Correlation Coefficient (PCC) by about 68.8% (0.283). These gains suggest that the supernode's global, content-adaptive aggregation improves how local G×E information is propagated.
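The evaluation protocol and the two reported metrics can be sketched in a few lines. This is a hedged illustration, not the study's code: the record layout (`year`, `yield`) and the function names are assumptions, the toy numbers are invented for the example, and the split below only separates years (it does not enforce the additional requirement that test genotypes and environments be unseen).

```python
import math


def forward_time_split(records, test_year):
    """Train on years strictly before test_year; test on test_year only."""
    train = [r for r in records if r["year"] < test_year]
    test = [r for r in records if r["year"] == test_year]
    return train, test


def rmse(y_true, y_pred):
    """Root Mean Squared Error: lower is better."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(y_true, y_pred):
    """Pearson Correlation Coefficient: closer to 1 is better."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def relative_change(baseline, new):
    """Signed fractional change of `new` relative to `baseline`."""
    return (new - baseline) / baseline


# Hypothetical toy records; the values are illustrative only.
records = [
    {"year": 2020, "yield": 9.1},
    {"year": 2021, "yield": 8.7},
    {"year": 2022, "yield": 9.4},
    {"year": 2022, "yield": 8.9},
]
train, test = forward_time_split(records, test_year=2022)
```

A percentage improvement such as those quoted above is `relative_change(baseline_metric, new_metric) * 100`; it is negative for a reduction in RMSE and positive for an increase in PCC.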

Beyond accuracy, a key strength of this deep-learning approach is its robustness. As lead author Morshedian notes in the paper, "Performance of our approach remains consistent regardless of the number of genotypes per environment and has strong performance under variable or unbalanced genotype sampling expression across environments." In other words, predictive performance holds up whether an environment contains many genotypes or few, and whether or not genotypes are sampled evenly across environments.
