Machine-Learning-Enhanced Non-Invasive Testing for MASLD Fibrosis: Shallow-Deep Neural Networks Versus FIB-4, Tabular Foundation Models, and Large Language Models

AI Insight

This study evaluated whether machine-learning-enhanced non-invasive tests (MLE-NITs) can detect advanced liver fibrosis in metabolic dysfunction-associated steatotic liver disease (MASLD) better than the standard FIB-4 index. Using three biopsy-confirmed cohorts from China, Malaysia, and India (n=784), the researchers compared FIB-4 with a shallow-deep neural network (s-DNN), a tabular foundation model (TabPFN), and a fine-tuned large language model (GPT-4o), all restricted to the same five inputs: age, aspartate aminotransferase (AST), alanine aminotransferase (ALT), platelet count, and FIB-4 itself. The s-DNN had the highest ROC-AUC in both external validation cohorts (0.77 in Malaysia and 0.67 in India, versus 0.75 and 0.60 for FIB-4), with the larger gain in India, while using only 354 trainable parameters compared with more than 7 million for TabPFN.


A compact neural network that modestly improves on FIB-4 (with reported Brier scores of 0.18 and 0.22) and needs no additional clinical data could be integrated into existing clinical workflows, where better triage may reduce unnecessary liver biopsies. This is most relevant in resource-limited settings where advanced diagnostics are unavailable but routine blood tests are accessible.


arXiv:2605.20523v1
Abstract: Advanced fibrosis is a major determinant of liver-related morbidity in metabolic dysfunction-associated steatotic liver disease (MASLD). FIB-4 is widely used as a first-line non-invasive test, but its fixed formula may underuse diagnostic information contained in age, aspartate aminotransferase, alanine aminotransferase, and platelet count. We evaluated whether machine-learning-enhanced non-invasive testing (MLE-NIT) can improve advanced fibrosis detection while preserving this FIB-4 variable space.
We used three biopsy-confirmed MASLD cohorts from China, Malaysia, and India (n=784). The Chinese cohort was split into 486 training and 54 internal validation/tuning patients; final performance was reported only on the Malaysian and Indian external cohorts. Models used five variables: age, FIB-4, aspartate aminotransferase, platelet count, and alanine aminotransferase. We compared FIB-4 with a shallow-deep neural network (s-DNN), TabPFN, and gpt-4o-2024-08-06.
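For readers unfamiliar with the index, the fixed formula in question is the standard FIB-4 definition: age (years) times AST (U/L), divided by platelet count (10^9/L) times the square root of ALT (U/L). The sketch below computes it and assembles the five inputs in the order listed above. The function names and the example patient are hypothetical, and any scaling or preprocessing the authors applied is not described in the abstract, so none is attempted here.

```python
import math


def fib4(age_years, ast_u_l, alt_u_l, platelets_1e9_l):
    """Standard FIB-4 index: (age * AST) / (platelets * sqrt(ALT))."""
    if alt_u_l <= 0 or platelets_1e9_l <= 0:
        raise ValueError("ALT and platelet count must be positive")
    return (age_years * ast_u_l) / (platelets_1e9_l * math.sqrt(alt_u_l))


def mle_nit_features(age_years, ast_u_l, alt_u_l, platelets_1e9_l):
    """Hypothetical helper: the five model inputs, in the abstract's order.

    Order: age, FIB-4, AST, platelet count, ALT. FIB-4 is derived from
    the other four, so no extra clinical measurement is needed.
    """
    score = fib4(age_years, ast_u_l, alt_u_l, platelets_1e9_l)
    return [age_years, score, ast_u_l, platelets_1e9_l, alt_u_l]


# Hypothetical patient; values chosen only to make the arithmetic easy to check:
# (50 * 40) / (200 * sqrt(25)) = 2000 / 1000 = 2.0
example = mle_nit_features(age_years=50, ast_u_l=40, alt_u_l=25, platelets_1e9_l=200)
print(example)
```

Because FIB-4 is a deterministic function of the other four inputs, a model given all five sees no new clinical information; it can only reweight or recombine what FIB-4 already uses.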
FIB-4 achieved external ROC-AUCs of 0.75 and 0.60 in Malaysia and India, respectively. TabPFN achieved 0.69 and 0.66, fine-tuned GPT-4o achieved 0.75 and 0.63, and the s-DNN achieved 0.77 and 0.67, respectively. The s-DNN contained only 354 trainable parameters, compared with 7,244,554 for TabPFN, yet provided a more balanced external operating profile. Calibration showed s-DNN Brier scores of 0.18 and 0.22, and permutation importance identified AST and FIB-4 as dominant variables. Compact non-linear MLE-NITs may enhance FIB-4-based fibrosis assessment without increasing clinical data requirements.
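The two headline metrics above can be illustrated with a minimal, dependency-free sketch. ROC-AUC is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties count as half), and the Brier score as the mean squared difference between predicted probability and outcome. The labels and probabilities are toy values for illustration only, not study data, and this is not the authors' evaluation code.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy example (hypothetical values): 1 = advanced fibrosis on biopsy
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, probs))      # 3 of 4 positive/negative pairs are ranked correctly
print(brier_score(labels, probs))  # lower is better; 0 is a perfect forecast
```

ROC-AUC depends only on how cases are ranked, so it can be computed directly from raw FIB-4 values, whereas the Brier score requires probability outputs. That is consistent with the abstract reporting Brier scores for the s-DNN rather than for FIB-4.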