AI Eyes: New Optimisation Technique Boosts Image Captioning for the Visually Impaired
Source Publication: Scientific Reports
Primary Authors: Alkhaldi, Asiri, Alzahrani et al.

For those living with visual disabilities, navigating a visual world often means relying on other senses such as touch and hearing. A new study aims to close part of that gap with an advanced image captioning system designed to describe scenes aloud. The model, termed FDTLGO-AICSVD, fuses three deep transfer learning models (DenseNet121, VGG19, and MobileNetV2) to extract detailed features from images.
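The paper does not publish reference code, but the fusion idea can be sketched in a few lines of Keras. The sketch below is illustrative only: the 224x224 input size, the frozen ImageNet-pretrained backbones, and the simple concatenation of pooled features are assumptions for the example, not details confirmed by the study.

```python
# A minimal sketch of fusing three pretrained backbones into one feature
# extractor. Input size and pooling choices are illustrative assumptions.
import numpy as np
from tensorflow.keras.applications import DenseNet121, VGG19, MobileNetV2
from tensorflow.keras.layers import Input, Concatenate
from tensorflow.keras.models import Model

inputs = Input(shape=(224, 224, 3))

# Three ImageNet-pretrained backbones, used here as frozen feature extractors.
backbones = [
    DenseNet121(include_top=False, pooling="avg", weights="imagenet"),
    VGG19(include_top=False, pooling="avg", weights="imagenet"),
    MobileNetV2(include_top=False, pooling="avg", weights="imagenet"),
]
for b in backbones:
    b.trainable = False

# Concatenate the pooled feature vectors into one fused representation
# (1024 + 512 + 1280 = 2816 dimensions) that a caption decoder can consume.
fused = Concatenate()([b(inputs) for b in backbones])
extractor = Model(inputs, fused)

features = extractor.predict(np.zeros((1, 224, 224, 3)))
print(features.shape)  # (1, 2816)
```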
To improve input quality, the system first processes images through noise removal and contrast enhancement. Crucially, the researchers employed the Gannet Optimisation Algorithm (GOA) to fine-tune the system's hyperparameters, essentially the settings that control how the AI learns. This optimisation helps make the generated captions both precise and context-aware.
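The article does not say which denoising or contrast methods were used. A common pairing, shown below purely as an assumption, is non-local means denoising followed by CLAHE (contrast-limited adaptive histogram equalisation) in OpenCV; all parameter values are illustrative defaults.

```python
# A minimal sketch of the two preprocessing steps, assuming OpenCV is the
# toolkit; the specific algorithms and parameters are not from the paper.
import cv2

def preprocess(path: str):
    img = cv2.imread(path)

    # Step 1: noise removal with non-local means denoising.
    denoised = cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 21)

    # Step 2: contrast enhancement with CLAHE applied to the lightness
    # channel in LAB colour space, so colours are not distorted.
    lab = cv2.cvtColor(denoised, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(enhanced, cv2.COLOR_LAB2BGR)
```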
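As for the hyperparameter search itself, a faithful GOA implementation is beyond this summary, but the general shape of such a population-based optimiser can be sketched. The code below is a deliberately simplified stand-in: it keeps a candidate population, a pull toward the best solution, and greedy replacement, but not GOA's actual U-shaped and V-shaped dive update rules. The `evaluate` callback, the two hyperparameters, and their ranges are all hypothetical.

```python
# Simplified population-based hyperparameter search, standing in for GOA.
import random

# Search space: each hyperparameter gets a (low, high) range. These two
# parameters and their bounds are illustrative, not the paper's search space.
BOUNDS = {"learning_rate": (1e-5, 1e-2), "dropout": (0.1, 0.5)}

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def optimise(evaluate, pop_size=8, iterations=15):
    # Random initial population of candidate hyperparameter sets.
    pop = [{k: random.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}
           for _ in range(pop_size)]
    scores = [evaluate(c) for c in pop]
    best_score = max(scores)
    best = dict(pop[scores.index(best_score)])

    for _ in range(iterations):
        for i in range(pop_size):
            # Pull each candidate toward the current best with a random step,
            # plus Gaussian noise (a crude analogue of the gannet's dive).
            new = {k: clip(pop[i][k]
                           + random.random() * (best[k] - pop[i][k])
                           + random.gauss(0, 0.05 * (hi - lo)), lo, hi)
                   for k, (lo, hi) in BOUNDS.items()}
            score = evaluate(new)
            if score > scores[i]:  # greedy replacement
                pop[i], scores[i] = new, score
                if score > best_score:
                    best, best_score = dict(new), score
    return best, best_score

# Toy usage: pretend the best validation score occurs at lr = 1e-3.
best, score = optimise(lambda hp: -abs(hp["learning_rate"] - 1e-3))
print(best, score)
```

In practice `evaluate` would train the captioning model with the candidate settings and return a validation metric such as BLEU-4, which is what makes metaheuristic tuning expensive but broadly applicable.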
When tested on the standard Flickr8k and Flickr30k datasets, the method demonstrated superior language generation capabilities. It achieved a BLEU-4 score of 58.91% on Flickr30k, outperforming previous benchmarks (BLEU-4, a metric borrowed from machine translation evaluation, measures how closely generated text matches human-written reference sentences). The technology promises to improve quality of life for visually impaired people by helping them recognise objects and events in their immediate surroundings.
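Scores of this kind can be computed with NLTK's standard BLEU implementation. In the sketch below the tokenised captions are invented toy data; only the metric call reflects common practice.

```python
# Corpus-level BLEU-4 with NLTK; the captions here are invented examples.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Each image has a list of tokenised reference captions and one hypothesis.
references = [
    [["a", "dog", "runs", "across", "the", "grass"],
     ["a", "brown", "dog", "is", "running", "on", "grass"]],
]
hypotheses = [["a", "dog", "is", "running", "on", "the", "grass"]]

# BLEU-4 uses uniform weights over 1- to 4-gram precisions; smoothing
# avoids zero scores on short sentences.
bleu4 = corpus_bleu(references, hypotheses,
                    weights=(0.25, 0.25, 0.25, 0.25),
                    smoothing_function=SmoothingFunction().method1)
print(f"BLEU-4: {bleu4:.4f}")
```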