Medicine

Evaluating OCT Device-Reported Image Quality Score: Towards a Task-Specific Quality Gate for Deep Learning-based Outer-Retina and Choroid Boundary Segmentation

AI Insight

This study evaluated whether the Heidelberg Spectralis Q-score, a manufacturer-defined signal quality metric, can reliably predict deep learning segmentation accuracy for three retinal boundaries across 5,047 B-scans from 103 eyes. Results showed that the Q-score explains less than 1.4% of variance in segmentation error for all three boundaries, indicating it is a poor proxy for U-Net model performance. A consistent increase in segmentation error with anatomical depth was observed, and at the choroidal outer boundary, higher Q-scores paradoxically correlated with greater segmentation error, mediated by greater choroidal thickness rather than image quality itself.
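To put the explained-variance figure in perspective, the short sketch below converts between R² and the Pearson correlation r. It assumes a simple linear regression with a single predictor, where R² equals r²; the study may have computed R² differently. The helper names `r_from_r2` and `r2_from_r` are illustrative and not taken from the paper. The numeric inputs are the values reported in the abstract.

```python
import math


def r_from_r2(r2: float) -> float:
    """Magnitude of Pearson r implied by R^2 in a one-predictor linear regression."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R^2 must lie in [0, 1]")
    return math.sqrt(r2)


def r2_from_r(r: float) -> float:
    """Share of variance explained by a single predictor with Pearson correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    return r * r


# Reported ceiling for Q-score vs. segmentation error: R^2 below 1.4%.
max_abs_r = r_from_r2(0.014)

# Reported mediation-chain correlations at the choroid outer boundary.
share_q_thickness = r2_from_r(0.113)      # Q-score vs. choroidal thickness
share_thickness_error = r2_from_r(0.165)  # choroidal thickness vs. COB error

print(f"|r| implied by R^2 = 1.4%: {max_abs_r:.3f}")
print(f"variance shared, Q-score and thickness: {share_q_thickness:.1%}")
print(f"variance shared, thickness and COB error: {share_thickness_error:.1%}")
```

Under that assumption, an R² ceiling of 1.4% corresponds to a correlation magnitude of roughly 0.12, which is why the Q-score carries almost no information about segmentation error.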


Clinical imaging pipelines that use manufacturer quality scores as automated gatekeeping criteria for AI-based retinal analysis may be systematically misdirected, accepting scans that challenge deep learning models while potentially rejecting scans those models handle well. This work motivates the development of task-specific, model-calibrated quality thresholds to improve the reliability of automated retinal diagnostics.


⚠️ Preprint: not yet peer-reviewed

This article has not yet been reviewed by independent experts. The results are preliminary and should be interpreted with caution.

Manufacturer-defined signal-strength indices are frequently used as quality benchmarks for automated optical coherence tomography (OCT) analysis, yet their empirical relationship with deep learning segmentation accuracy remains unclear. These metrics were originally developed for conventional image-processing pipelines, and their ability to predict the accuracy of modern model-based segmentation has not been empirically validated. To address this gap, we evaluated the Heidelberg Spectralis Q-score against U-Net segmentation performance across 5,047 B-scans from 103 eyes for three anatomical boundaries of the posterior segment of the eye: the Ellipsoid Zone (EZ), Bruch’s Membrane (BM), and the Choroid Outer Boundary (COB). Alongside standard boundary agreement metrics (mean absolute error, MAE; mean squared error, MSE; Dice Similarity Coefficient), we adapted the Earth Mover’s Distance (EMD) from optimal transport theory as a boundary evaluation metric. Unlike column-wise averages, EMD quantifies boundary agreement as a 2-D geometric displacement, directly measuring the residual spatial displacement between the model-segmented boundary and the ground-truth boundary. Our results demonstrate that the Q-score, originally designed to gate image-processing-based automated analysis, is a poor predictor of deep learning boundary segmentation accuracy: explained variance (R²) did not exceed 1.4% for any of the three boundaries. We further observed an error hierarchy that increases monotonically with anatomical depth (EZ < BM < COB), consistent across metrics and not explained by signal strength. At the COB, correlations were paradoxically positive, which is explained by a B-scan-level mediation chain: higher Q-scores correspond to greater choroidal thickness (r = 0.113, ρ = 0.158), which in turn predicts higher COB segmentation error (r = 0.165, ρ = 0.191). This is a localization difficulty that global signal strength cannot capture.
Collectively, these findings challenge the implicit assumption that signal-strength-based quality thresholds are a reliable proxy for deep learning model performance, and motivate a shift toward task-specific acquisition quality criteria calibrated to model performance rather than signal interpretability.
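The abstract does not spell out how EMD was adapted to boundaries, so the following is only a minimal sketch of one common discrete formulation, not the authors' implementation. It assumes each boundary is given as one depth value per A-scan column, treats the boundary as a set of 2-D points with equal mass, and uses the fact that for two equal-sized, uniformly weighted point sets the optimal transport plan can be found as an optimal assignment. The function name `boundary_emd` and the parameters `x_spacing` and `y_spacing` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def boundary_emd(pred_y, gt_y, x_spacing=1.0, y_spacing=1.0):
    """Earth Mover's Distance between two boundaries sampled column by column.

    pred_y, gt_y: 1-D sequences of boundary depth, one value per column.
    x_spacing, y_spacing: assumed physical size of one column / one depth pixel.
    Returns the mean Euclidean distance moved per point under the optimal
    one-to-one matching (equal masses, equal number of points).
    """
    pred_y = np.asarray(pred_y, dtype=float)
    gt_y = np.asarray(gt_y, dtype=float)
    if pred_y.ndim != 1 or pred_y.shape != gt_y.shape:
        raise ValueError("boundaries must be 1-D arrays of equal length")

    # Turn each boundary into a set of 2-D points (lateral position, depth).
    x = np.arange(pred_y.size) * x_spacing
    pred_pts = np.column_stack([x, pred_y * y_spacing])
    gt_pts = np.column_stack([x, gt_y * y_spacing])

    # Pairwise Euclidean transport costs between predicted and reference points.
    cost = cdist(pred_pts, gt_pts)

    # With uniform weights and equal counts, optimal transport is an assignment.
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].mean())


# Usage: a flat reference boundary and a prediction offset by 3 pixels in depth.
gt = np.zeros(50)
pred = gt + 3.0
print(boundary_emd(pred, gt))
```

Because the cost is a Euclidean distance in the image plane, a point on the predicted boundary may be matched to a reference point in a different column, which is what distinguishes this measure from column-wise averages such as MAE. The assignment step scales cubically with the number of columns, so wide B-scans may need subsampling or a dedicated optimal transport solver.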

Source: Evaluating OCT Device-Reported Image Quality Score: Towards a Task-Specific Quality Gate for Deep Learning-based Outer-Retina and Choroid Boundary Segmentation