Biology

Learning Normal Representations for Blood Biomarkers

AI Insight

Blood test interpretation currently relies on fixed population reference intervals that ignore each individual's stable baseline, which can delay disease detection; purely personalized approaches, in turn, tend to overfit sparse testing histories and inflate false-positive rates. Drawing on nearly 2 billion longitudinal laboratory measurements from over 1.6 million individuals across North America, the Middle East, and East Asia, this study shows that purely personalized intervals can classify up to 68% of measurements as abnormal without a corresponding association with adverse clinical outcomes. It then introduces NORMA, a conditional transformer-based framework that generates personalized reference intervals by conditioning on both a patient's own history and population-level data about normal variation. NORMA-derived intervals achieve higher precision than either purely population-based or purely personalized intervals for predicting outcomes including mortality, acute kidney injury, and chronic disease.


NORMA offers a practical path toward more accurate, individualized laboratory interpretation that could reduce unnecessary follow-up testing while improving early disease detection. The authors publicly release the model, code, and an interactive user interface.


arXiv:2605.18701v1 Announce Type: cross
Abstract: Blood-based biomarkers underpin clinical diagnosis and management, yet their interpretation relies largely on fixed population reference intervals that ignore stable, intra-patient variability. As such, population-based interpretation can mask meaningful deviation from an individual’s baseline, risking delayed disease detection. To remedy this, there have been increasing efforts to personalize blood biomarker interpretation using individual testing histories. However, these methods may overfit to sparse data, inflating false-positive rates and unnecessary follow-up, and can also unwittingly include unrecognized or subclinical disease. Here, we leverage nearly 2 billion longitudinal laboratory measurements from over 1.6 million individuals across North America, the Middle East, and East Asia, to show that while laboratory values are highly individual, purely personalized intervals routinely overfit, classifying up to 68% of measurements as abnormal, without corresponding associations with adverse clinical outcomes. We then introduce NORMA, a conditional transformer-based framework that generates reference intervals by conditioning on both a patient’s history and population-level data about “normal” variation. NORMA-derived intervals achieve higher precision for predicting outcomes, including mortality, acute kidney injury, and chronic disease. These findings caution against over-personalization in laboratory medicine and demonstrate that anchoring individual trajectories to population-level priors outperforms either approach alone. To promote transparency, we publicly release the model, code, and an interactive user interface for accessible, individualized laboratory interpretation.
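The contrast the abstract draws between population, purely personalized, and population-anchored intervals can be illustrated with a toy calculation. The sketch below is not NORMA and does not use a transformer; it swaps in a simple normal-normal shrinkage model as a stand-in for "anchoring to a population prior". All function names and all numeric values (a creatinine-like analyte in mg/dL) are hypothetical and chosen only for illustration.

```python
import math
import statistics

Z = 1.96  # two-sided 95% normal quantile


def population_interval(pop_mean, within_sd, between_sd):
    """Fixed interval from population spread (between- plus within-person)."""
    total_sd = math.sqrt(within_sd**2 + between_sd**2)
    return (pop_mean - Z * total_sd, pop_mean + Z * total_sd)


def personalized_interval(history):
    """Interval from the patient's own values only (needs >= 2 values).

    With few measurements the sample SD is unstable and often too small,
    which is the overfitting risk the abstract describes.
    """
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    return (mean - Z * sd, mean + Z * sd)


def anchored_interval(history, pop_mean, within_sd, between_sd):
    """Toy shrinkage interval (an assumption, not the paper's method).

    Model: patient set point ~ N(pop_mean, between_sd^2),
           each measurement  ~ N(set point, within_sd^2).
    Returns the 95% predictive interval for the next measurement.
    """
    n = len(history)
    prior_precision = 1.0 / between_sd**2
    data_precision = n / within_sd**2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (
        prior_precision * pop_mean + data_precision * statistics.mean(history)
    )
    predictive_sd = math.sqrt(post_var + within_sd**2)
    return (post_mean - Z * predictive_sd, post_mean + Z * predictive_sd)


def is_flagged(value, interval):
    """True if the value falls outside the reference interval."""
    low, high = interval
    return not (low <= value <= high)


if __name__ == "__main__":
    # Hypothetical values, not taken from the paper.
    history = [0.70, 0.72, 0.71]
    pop_mean, within_sd, between_sd = 0.90, 0.05, 0.19

    intervals = {
        "population": population_interval(pop_mean, within_sd, between_sd),
        "personalized": personalized_interval(history),
        "anchored": anchored_interval(history, pop_mean, within_sd, between_sd),
    }
    for new_value in (0.75, 0.85):
        for name, interval in intervals.items():
            print(new_value, name, "flagged:", is_flagged(new_value, interval))
```

With these made-up inputs, three nearly identical past results yield a very narrow personalized interval that flags a new value of 0.75, while the anchored interval does not. A larger shift to 0.85 is flagged by the anchored interval yet still sits inside the fixed population interval, which is the masking effect the abstract attributes to population-based interpretation.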
