Biology

Atlas-Level Single-Cell and Spatial Transcriptomics Data Integration via PRIME

AI Insight

PRIME (Projection-based Robust Integration via Manifold Embedding) is a new computational framework designed to integrate large-scale single-cell RNA sequencing and spatial transcriptomics datasets across heterogeneous sources. It combines random-projection-based consensus anchoring, graph-Laplacian correction, and optional spatial-neighborhood regularization to simultaneously correct batch effects while preserving genuine biological variation. In benchmarking tests, PRIME outperformed existing methods across multiple scenarios, including preserving developmental trajectories in human hematopoiesis, maintaining cortical laminar organization in brain spatial data, and recovering drug-target relationships in a perturbation dataset exceeding one million cells.
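The random-projection consensus anchoring summarized above can be illustrated with a minimal sketch. The idea is to search for cross-batch matches in several random low-dimensional projections and keep only pairs that recur, so that a match produced by one projection's distortion is voted out. Everything below is an assumption for illustration, not PRIME's published implementation: anchors are taken to be mutual nearest neighbors between two batches, each projection is a dense Gaussian random matrix, and a pair is kept when it is matched in at least 60% of projections. The function names and parameters (`consensus_anchors`, `mutual_nn_pairs`, `n_projections`, `dim`, `min_votes`) are hypothetical.

```python
import numpy as np
from collections import Counter

def mutual_nn_pairs(A, B):
    """Return (i, j) pairs where A[i] and B[j] are each other's nearest neighbor (Euclidean)."""
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)  # squared distances, shape (len(A), len(B))
    a_to_b = d.argmin(axis=1)  # closest batch-2 cell for each batch-1 cell
    b_to_a = d.argmin(axis=0)  # closest batch-1 cell for each batch-2 cell
    return {(i, int(j)) for i, j in enumerate(a_to_b) if b_to_a[j] == i}

def consensus_anchors(X1, X2, n_projections=20, dim=10, min_votes=0.6, seed=0):
    """Hypothetical consensus anchoring: keep pairs matched in >= min_votes of the projections."""
    rng = np.random.default_rng(seed)
    votes = Counter()
    for _ in range(n_projections):
        # Assumed projection type: dense Gaussian matrix scaled by 1/sqrt(dim)
        R = rng.normal(size=(X1.shape[1], dim)) / np.sqrt(dim)
        votes.update(mutual_nn_pairs(X1 @ R, X2 @ R))  # one vote per pair per projection
    threshold = min_votes * n_projections
    return sorted(pair for pair, v in votes.items() if v >= threshold)

# Toy data: both batches contain the same 30 cell states (50 genes);
# batch 2 carries a constant offset that mimics a batch effect.
rng = np.random.default_rng(1)
cells = rng.normal(size=(30, 50))
batch1 = cells + 0.01 * rng.normal(size=cells.shape)
batch2 = cells + 0.2 + 0.01 * rng.normal(size=cells.shape)
anchors = consensus_anchors(batch1, batch2)
```

Because the two toy batches share the same underlying cell states, the consensus set should contain only matching pairs; a pair that became mutual nearest neighbors in just one or two distorted projections would fall below the vote threshold and be discarded.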


As biomedical consortia increasingly assemble multi-million-cell atlases to map human tissues in health and disease, robust integration tools like PRIME could improve the reliability of downstream analyses such as cell-type annotation, disease mechanism discovery, and drug response prediction. Its scalability and compatibility with spatial data position it as a potentially practical tool for large consortium efforts like the Human Cell Atlas.


⚠️ Preprint: not yet peer-reviewed

This article has not yet been reviewed by independent experts. The results are preliminary and should be interpreted with caution.

Single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST) have enabled atlas-scale cellular cartography, with consortium efforts now assembling millions of cells across diverse tissues, donors, and technologies to build comprehensive references for cell identity and disease mechanisms. Yet the scientific value of these atlases hinges on robust computational integration across heterogeneous data sources. Unlike pairwise batch correction, atlas-level integration must jointly reconcile heterogeneous and often hierarchically nested batch effects across many datasets with highly imbalanced cell-type compositions, while preserving subtle biological variation and remaining computationally tractable at the scale of millions of cells. Existing approaches often prioritize either batch mixing or preservation of local biological structure, and most cannot natively accommodate spatial coordinates. Here we introduce PRIME (Projection-based Robust Integration via Manifold Embedding), an ensemble integration framework that combines random-projection-based consensus anchoring, graph-Laplacian correction, and optional spatial-neighborhood regularization. Across multiple random projections of the expression manifold, PRIME uses consensus voting to retain only cell pairs that are matched repeatedly, reducing false anchors caused by projection-specific distortions. For ST, PRIME couples this expression-based anchor graph with a coordinate-derived spatial neighborhood graph in a unified graph-Laplacian objective with a closed-form solution, enabling simultaneous cross-batch alignment and local spatial coherence. Through extensive benchmarking spanning diverse datasets, we show that PRIME consistently outperforms state-of-the-art methods in both batch correction and biological conservation across scRNA-seq and ST integration scenarios, as well as in downstream tasks including trajectory inference, spatial-domain preservation, and perturbation-response analysis.
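The abstract states that the anchor graph and the spatial neighborhood graph are combined in one graph-Laplacian objective with a closed-form solution, but it does not give the objective. A common objective with that property, assumed here purely for illustration, seeks corrected embeddings Z that stay close to the input X while varying smoothly over both graphs: minimize ||Z - X||^2 + λ tr(Zᵀ L_a Z) + μ tr(Zᵀ L_s Z), where L_a and L_s are the Laplacians of the anchor graph and of a spatial k-nearest-neighbor graph. Setting the gradient to zero gives the linear system (I + λ L_a + μ L_s) Z = X. The helper names, the unnormalized Laplacian, the 0/1 edge weights, and the values of λ (`lam`), μ (`mu`), and `k` are all hypothetical choices.

```python
import numpy as np

# Hypothetical sketch: the objective ||Z - X||^2 + lam*tr(Z^T L_a Z) + mu*tr(Z^T L_s Z)
# is an assumed form, not PRIME's published objective.

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W of a symmetric adjacency matrix."""
    return np.diag(W.sum(axis=1)) - W

def anchor_adjacency(n_cells, anchors):
    """0/1 adjacency linking each anchored cross-batch pair (row indices into the stacked matrix)."""
    W = np.zeros((n_cells, n_cells))
    for i, j in anchors:
        W[i, j] = W[j, i] = 1.0
    return W

def spatial_knn_adjacency(coords, k=4):
    """Symmetric k-nearest-neighbor adjacency built from spot coordinates."""
    d = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)        # a spot is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]  # indices of the k closest spots
    W = np.zeros_like(d)
    W[np.repeat(np.arange(len(coords)), k), nn.ravel()] = 1.0
    return np.maximum(W, W.T)          # keep an edge if either spot selected it

def laplacian_correct(X, W_anchor, W_spatial=None, lam=1.0, mu=0.0):
    """Solve (I + lam*L_a + mu*L_s) Z = X, the stationary point of the assumed objective."""
    A = np.eye(X.shape[0]) + lam * laplacian(W_anchor)
    if W_spatial is not None:
        A = A + mu * laplacian(W_spatial)
    # A is symmetric positive definite, so the system has a unique solution
    return np.linalg.solve(A, X)

# Toy example: rows 0-1 come from batch 1, rows 2-3 are the same cells in batch 2,
# shifted by +10 along the first feature (a simulated batch effect).
X = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 0.0], [11.0, 1.0]])
coords = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
W_a = anchor_adjacency(4, [(0, 2), (1, 3)])
W_s = spatial_knn_adjacency(coords, k=1)
Z = laplacian_correct(X, W_a)                           # expression anchors only
Z_st = laplacian_correct(X, W_a, W_s, lam=1.0, mu=0.5)  # anchors plus spatial smoothing
```

Because I + λ L_a + μ L_s is symmetric positive definite, one linear solve yields the corrected embedding; in the toy case the cross-batch gap on the first feature shrinks from 10 to 10/3 while each anchored pair's mean is unchanged. At atlas scale a dense solve is impractical, and a sparse matrix with an iterative solver such as conjugate gradients would be a natural substitute, although the abstract does not say how PRIME solves the system.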
In particular, when integrating a human hematopoiesis benchmark spanning eight donors and approximately 33,000 cells, PRIME preserves biologically coherent developmental trajectories. It also maintains cortical laminar architecture across dorsolateral prefrontal cortex sections in an ST dataset, and it recovers known drug-target relationships in a perturbation atlas of more than 1 million cells while suppressing batch-associated confounders. Together, these results establish PRIME as a versatile and scalable framework for atlas-level integration of scRNA-seq and ST data across diverse biological applications.

Source: Atlas-Level Single-Cell and Spatial Transcriptomics Data Integration via PRIME