AI Insight
Researchers developed an AI system called DSTA-LSTM that analyzes video recordings to screen for depression in prison inmates by examining facial expressions and subtle facial movements. Evaluated on 1,216 video segments from 76 incarcerated participants, the system achieved an AUC of 0.934 (sensitivity 0.913, specificity 0.886), using two LSTM branches to track both visual features and specific facial muscle movements (Action Units). The most influential features were AU4 and AU15, which involve the eyebrows and lip corners, consistent with how depression presents clinically in this high-risk population.
Why it matters
This technology could provide objective, automated mental health screening in correctional facilities where depression rates are high but traditional psychiatric assessments are subjective, time-intensive, and often unavailable. The non-invasive video-based approach could enable earlier detection and intervention for at-risk inmates, potentially improving both individual outcomes and facility safety.
Background/objective
Incarcerated individuals face a high risk of depression because of the stressful, closed correctional environment. Facial expressions closely reflect emotional and psychological states, but traditional screening methods are subjective and fail to capture their temporal dynamics. This study therefore proposes an automated screening method based on visual temporal modeling.
Methods
A Dual-Stream Temporal Attention LSTM (DSTA-LSTM) model was developed for depression screening. Seventy-six incarcerated participants provided 1,216 valid video segments, which were preprocessed via semantic chunking and optical flow enhancement to capture subtle micro-expression dynamics. Visual features were extracted with MobileNetV2 augmented by an Efficient Local Attention (ELA) module; physiological features were derived from 20 facial Action Units (AUs) and valence-arousal vectors. Two LSTM branches modeled the temporal features of the two streams, and an attention mechanism fused their outputs to produce the screening result. Model performance was evaluated via 5-fold cross-validation, ablation experiments, and SHAP analysis.
Results
The DSTA-LSTM significantly outperformed five baseline models, including ResNet50, 3D-CNN, and TimeSformer, achieving an AUC of 0.934, an F1-score of 0.892, a sensitivity of 0.913, and a specificity of 0.886. Ablation experiments showed that removing the ELA module, the AU stream, or the temporal attention reduced the AUC by 5.1%, 2.6%, and 1.7%, respectively. SHAP analysis identified AU4 and AU15 as the most influential features, consistent with clinical manifestations of depression.
Conclusion
The DSTA-LSTM shows strong capability in detecting facial dynamics and offers a reliable automated approach to depression screening in incarcerated populations, with implications for public health and security. Future research will integrate multimodal data to further improve model performance.
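The attention-based fusion of the two LSTM streams described in the Methods can be illustrated with a minimal sketch. The abstract does not specify the fusion formula, so everything here is an assumption for illustration: the function name `attention_fuse`, the scoring vector `w`, the tanh-based score, and the toy four-dimensional vectors standing in for the final states of the visual and AU branches. It shows one common form of attention fusion, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(h_visual, h_au, w):
    """Fuse two equal-length stream summaries with softmax attention.

    h_visual, h_au: hypothetical final hidden states of the two LSTM branches.
    w: hypothetical scoring vector (learned in a real model).
    Returns the fused vector and the per-stream attention weights.
    """
    streams = [h_visual, h_au]
    # Score each stream as the dot product of w with tanh(stream vector).
    scores = [sum(wi * math.tanh(x) for wi, x in zip(w, h)) for h in streams]
    alphas = softmax(scores)
    # The fused representation is the attention-weighted sum of the streams.
    fused = [
        sum(a * h[i] for a, h in zip(alphas, streams))
        for i in range(len(w))
    ]
    return fused, alphas


# Toy inputs (assumed values, not data from the study).
h_visual = [0.2, -0.5, 0.9, 0.1]
h_au = [0.7, 0.3, -0.2, 0.4]
w = [0.5, -0.25, 1.0, 0.75]

fused, alphas = attention_fuse(h_visual, h_au, w)
print("attention weights:", alphas)
print("fused vector:", fused)
```

Because the weights come from a softmax, they sum to one, and each component of the fused vector lies between the corresponding components of the two streams. In a full model, the fused vector would feed a classifier head that outputs the depression screening score.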