Deep Learning Model Detects Depression via Speech Patterns with 99% Accuracy
Source Publication: Journal of Voice
Primary Authors: Ashok Kumar, Domala, Sajjan et al.

Depression is a treatable condition, yet high costs and long waiting times often deter individuals from seeking professional help. To bridge this gap, researchers have proposed a new voice-based classification method known as AVA-TIPNN-DD, which uses deep learning to identify mental health states from speech patterns.
The process begins by gathering voice recordings and cleaning them with a ‘Koopman Kalman particle filter’ to remove background noise. The system then extracts detailed spectral features, such as power spectral density and spectral flatness, using a complex transform technique. These acoustic fingerprints are fed into a Temporal Inductive Path Neural Network (TIPNN), a machine learning model that classifies each recording as depressed or non-depressed.
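The paper's own feature-extraction code is not published, but the two descriptors it names are standard signal-processing quantities. A minimal NumPy sketch of how such features could be computed per audio frame (the function name, frame length, and sample rate here are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def spectral_features(frame, sr=16000):
    """Compute two standard spectral descriptors for one audio frame:
    the power spectral density (power per frequency bin) and the
    spectral flatness (geometric mean / arithmetic mean of the power
    spectrum; near 1.0 for noise-like frames, near 0 for tonal ones)."""
    n = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2   # power spectrum
    psd = power / (sr * n)                    # periodogram estimate of PSD
    eps = 1e-12                               # guard against log(0)
    flatness = np.exp(np.mean(np.log(power + eps))) / (np.mean(power) + eps)
    return psd, float(flatness)

# A pure tone is highly tonal (flatness near 0); white noise is flat.
t = np.arange(512) / 16000
tone_psd, tone_flat = spectral_features(np.sin(2 * np.pi * 440 * t))
noise_psd, noise_flat = spectral_features(
    np.random.default_rng(0).standard_normal(512))
```

In a full pipeline, descriptors like these would be computed over many short frames of the denoised recording and stacked into the feature vectors the classifier consumes.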
Crucially, the team utilised a ‘binary battle royale optimizer’ to fine-tune the network’s internal parameters for maximum precision. In testing, this approach demonstrated exceptional performance, achieving 99.26 per cent accuracy and 99.6 per cent sensitivity, significantly outperforming several existing deep learning techniques for depression diagnosis.
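The article does not describe the optimizer's internals, but the general idea behind battle-royale-style metaheuristics is that a population of candidate solutions competes each round and losers move toward the current best. A toy sketch of that scheme on binary strings, applied to a simple bit-maximisation objective (all names, parameters, and the update rule are illustrative assumptions, not the authors' algorithm):

```python
import random

def binary_battle_optimize(fitness, n_bits, pop_size=20, iters=100, seed=0):
    """Toy binary battle-royale-style optimizer: each round, candidates
    that lose to the current best drift toward it (copying its bits with
    probability 0.5), with occasional bit-flip mutation for exploration."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)[:]
    for _ in range(iters):
        for cand in pop:
            if fitness(cand) < fitness(best):
                for i in range(n_bits):
                    if rng.random() < 0.5:
                        cand[i] = best[i]    # loser moves toward the best
            if rng.random() < 0.2:
                j = rng.randrange(n_bits)
                cand[j] ^= 1                 # random bit-flip mutation
        best = max(pop + [best], key=fitness)[:]
    return best

# Toy objective: maximise the number of set bits ("OneMax").
best = binary_battle_optimize(sum, n_bits=16)
```

In the paper's setting, the fitness function would instead score a candidate encoding of the TIPNN's parameters by its classification performance on validation data.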