AI identifies white blood cell types with minimal human labeling

bioRxiv (Preprint), 26 May 2026

AI Insight

Researchers developed an AI system that can identify white blood cell types from blood smear images with minimal human labeling effort. The system uses a two-stage approach: it first learns cell features without labels, then is fine-tuned on a small set of labeled examples selected through active learning. Trained and evaluated on nearly 6,000 images from cancer patients and about 17,000 public images, the model achieved a macro-F1 score of 0.96 for classifying nine cell types on its internal test split and 0.83 on a held-out patient cohort, while requiring only 13.3% of available training labels to reach peak performance.
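To make the active learning step concrete, here is a minimal sketch of one common selection strategy, entropy-based uncertainty sampling. The summary does not say which strategies the authors assessed, so this is an illustrative assumption, and the function names and toy probabilities are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(pool_probs, budget):
    """Return indices of the `budget` most uncertain unlabeled images.

    Assumption: uncertainty is measured by predictive entropy. The
    preprint may use a different acquisition rule.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]

# Hypothetical model outputs for four unlabeled cells over three classes.
pool = [
    [0.98, 0.01, 0.01],  # confident prediction, low labeling value
    [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]
picked = select_for_labeling(pool, budget=2)
print(picked)
```

The idea is that an annotator labels only the images the model is least sure about, which is how a framework of this kind can approach peak performance with a fraction of the available labels.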

Why it matters

Current blood smear analysis is time-consuming, subjective, and particularly challenging for rare cell types in blood cancers. This annotation-efficient AI framework could make automated blood cell analysis more practical for clinical laboratories, reducing the burden on pathologists while maintaining high accuracy and providing interpretable results through visualization tools.

Confidence

6/10 | Preprint, not yet peer-reviewed | Biology

⚠️ Preprint: not yet peer-reviewed

This article has not yet been reviewed by independent experts. The results are preliminary and should be interpreted with caution.

Background

Peripheral blood smear (PBS) review is labor-intensive, subjective, and challenging for rare or morphologically heterogeneous cell types in hematologic malignancies. Artificial intelligence (AI) offers a scalable alternative, but broader clinical translation is constrained by annotation burden and limited interpretability.

Methods

We developed an interpretable, annotation-efficient AI framework that learns leukocyte morphology through a two-stage process: label-free representation learning to construct a morphological embedding space, followed by supervised fine-tuning for cell type and morphological attribute classification. The model was trained and evaluated on 5,952 PBS images from cancer patients at MD Anderson Cancer Center, including blast cells, and 17,092 images from public sources. Active learning strategies were assessed to improve label efficiency, and interpretability was examined using saliency and embedding visualization. An interactive web application, HemoSight, was developed to support clinical review.

Findings

The framework achieved a macro-F1 score of 0.96 for 9-way leukocyte classification on the internal test split and 0.83 on the held-out patient cohort. Active learning substantially reduced annotation requirements, reaching peak performance with only 13.3% of available labels and significantly improving learning efficiency across 8 of 9 cell types. The model generalized to classifying 11 leukocyte morphological attributes with a mean F1 score of 85.8% and revealed structured morphological landscapes. Saliency maps, embedding visualizations, and the HemoSight application enabled transparent morphological inspection of model predictions, supporting confidence in model behavior and feasibility for clinical integration.

Interpretation

Our framework enables scalable, annotation-efficient, and interpretable modeling of leukocyte morphology, supporting the integration of AI-assisted PBS review for hematopathology workflows.
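The headline results are reported as macro-F1 rather than accuracy, and the difference matters when some cell types are rare. The sketch below uses the standard definition of macro-F1 (the unweighted mean of per-class F1 scores) on invented toy labels; the data and function names are illustrative and do not come from the preprint.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy example: nine common cells and one rare cell, with a model that
# always predicts the common class.
y_true = ["neutrophil"] * 9 + ["blast"]
y_pred = ["neutrophil"] * 10

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, macro-F1={mf1:.2f}")
```

In this toy case accuracy is 0.90 while macro-F1 is about 0.47, because the missed rare class counts as much as the common one. A high macro-F1 therefore implies reasonable performance on every class, including rare ones.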

Source: Interpretable morphology mapping of peripheral blood leukocytes using annotation-efficient artificial intelligence
