HDTree: Generative Modeling of Cellular Hierarchies for Robust Lineage Inference

AI Insight

HDTree is a new generative modeling framework designed to infer cellular differentiation trajectories from single-cell data. It uses a hierarchical latent space with a unified codebook and a quantized diffusion process to model how cells transition between states, aligning its generative logic with the Waddington epigenetic landscape concept. In benchmark comparisons against existing methods on both general-purpose and single-cell datasets, HDTree demonstrated superior performance in lineage inference accuracy, reconstruction quality, and hierarchical consistency, while addressing common limitations such as posterior collapse and poor scalability found in previous VAE-based approaches.


Understanding cellular differentiation trajectories is fundamental to developmental biology, disease modeling, and regenerative medicine, and a more robust computational tool could accelerate discoveries in areas such as cancer biology, stem cell research, and the identification of therapeutic targets.


arXiv:2506.23287v3
Abstract: In single-cell research, tracing and analyzing high-throughput single-cell differentiation trajectories is crucial for understanding biological processes. Key to this is the robust modeling of hierarchical structures that govern cellular development. Traditional methods face limitations in computational cost, performance, and stability. VAE-based approaches have made strides but still require branch-specific network modules, limiting their scalability and stability, while often suffering from posterior collapse. To overcome these challenges, we introduce HDTree, a generative modeling framework designed for robust lineage inference. HDTree captures tree relationships within a hierarchical latent space using a unified hierarchical codebook and employs a quantized diffusion process to model continuous cell state transitions. By aligning the generative process with the Waddington landscape, this method not only improves stability and scalability but also enhances the biological plausibility of inferred lineages. HDTree’s effectiveness is demonstrated through comparisons on both general-purpose and single-cell datasets, where it outperforms existing methods in lineage inference accuracy, reconstruction quality, and hierarchical consistency. These contributions enable accurate and efficient modeling of cellular differentiation paths, offering reliable insights for biological discovery. Code is available at https://github.com/zangzelin/code_HDTree_icml.
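
To make the idea of a shared hierarchical codebook more concrete, the sketch below shows one generic way a latent vector can be assigned to a path in a tree of code vectors: at each level, pick the nearest child of the node chosen at the level above. This is an illustrative toy and not HDTree's implementation. The names `quantize_path`, `nearest`, and `toy_tree`, the nested-dict layout, and the 2-D code vectors are all assumptions made for the example, and the diffusion process and training are not covered.

```python
import math


def nearest(vec, candidates):
    """Return the index of the candidate closest to vec (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda i: math.dist(vec, candidates[i]))


def quantize_path(vec, tree):
    """Descend a tree-structured codebook, choosing the nearest child at each level.

    `tree` is a hypothetical nested dict: {"code": [...], "children": [subtrees]}.
    Returns (path, leaf_code), where path lists the child index taken at each level.
    """
    path = []
    node = tree
    while node["children"]:
        codes = [child["code"] for child in node["children"]]
        idx = nearest(vec, codes)
        path.append(idx)
        node = node["children"][idx]
    return path, node["code"]


# Toy two-level codebook in 2-D (made-up values, for illustration only).
toy_tree = {
    "code": [0.0, 0.0],
    "children": [
        {"code": [-1.0, 0.0], "children": [
            {"code": [-1.0, 1.0], "children": []},
            {"code": [-1.0, -1.0], "children": []},
        ]},
        {"code": [1.0, 0.0], "children": [
            {"code": [1.0, 1.0], "children": []},
            {"code": [1.0, -1.0], "children": []},
        ]},
    ],
}

cell = [0.8, -0.7]  # a made-up latent vector standing in for one cell
path, leaf = quantize_path(cell, toy_tree)
print(path, leaf)
```

Because every level's choice is restricted to the children of the previous choice, the assignments are consistent across levels by construction, and a single tree of codes serves all branches rather than one module per branch.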

Source: HDTree: Generative Modeling of Cellular Hierarchies for Robust Lineage Inference