Biology

Selecting genomes that matter: haplotype-based prioritization for iterative pangenome expansion

AI Insight

The authors developed SelHap, a bioinformatics pipeline that uses whole-genome sequencing data to prioritize which plant accessions should be added to an existing pangenome, based on their contribution of novel haplotypes relative to the current pangenome content. Applied to barley, SelHap was used to select 19 new accessions from a large sequencing panel, which were then assembled into chromosome-scale genomes alongside 17 elite breeding lines chosen by conventional criteria. Benchmarking demonstrated that SelHap-selected accessions consistently contributed more non-redundant pangenome sequence than conventionally selected ones, validating the haplotype-novelty approach.


As reference pangenomes for crops like barley approach completeness, efficient and targeted expansion becomes critical for capturing agriculturally relevant genetic diversity; SelHap provides a scalable, data-driven framework that could accelerate pangenome-based breeding and genomic research across many species.


⚠️ Preprint: not yet peer-reviewed

This article has not yet been reviewed by independent experts. The results are preliminary and should be interpreted with caution.

Background

As pangenomes approach saturation, identifying additional genomes that contribute novel sequence information becomes increasingly difficult. Current sample-selection strategies often rely on global diversity metrics or variant counts and do not explicitly account for the composition of an existing pangenome, a limitation that becomes increasingly relevant as pangenomes mature. Here, we present SelHap, a haplotype-based pipeline that uses whole-genome sequencing (WGS) data to prioritize accessions based on their contribution of novel haplotypes relative to a defined background, enabling targeted and iterative pangenome expansion.

Results

We applied SelHap to the barley pangenome, using 76 assembled genomes as a background to select new accessions from a large WGS panel. Using this approach, we generated chromosome-scale genome assemblies from 19 accessions selected with SelHap and from 17 elite lines selected based on their relevance in historical barley breeding. Across multiple benchmarking scenarios, SelHap-based selection consistently resulted in a greater increase in non-redundant (single-copy) pangenome sequence, demonstrating that prioritizing haplotype novelty relative to an existing background maximizes unrepresented sequence content.

Conclusions

By transforming complex haplotype-clustering outputs into interpretable summaries and ranked candidate lists, SelHap provides a practical framework for targeted pangenome expansion. Beyond sample selection, SelHap can facilitate ancestry and germplasm comparisons across diverse panels. As WGS data become more accessible, SelHap offers a scalable and interpretable solution for extending mature pangenomes by explicitly targeting previously unrepresented sequence space.
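The idea of ranking accessions by the haplotypes they add relative to a background can be illustrated with a small greedy sketch. This is not SelHap's actual implementation, which the abstract does not specify. It assumes, purely for illustration, that each genome has already been reduced to a set of (window, haplotype cluster) labels, and every name below (`rank_by_novel_haplotypes`, `acc_A`, `w1`, and so on) is hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a haplotype is a (genomic window, cluster id) pair.
Haplotype = Tuple[str, int]


def rank_by_novel_haplotypes(
    background: Set[Haplotype],
    candidates: Dict[str, Set[Haplotype]],
    n_select: int,
) -> List[Tuple[str, int]]:
    """Greedily rank candidates by haplotypes absent from the background.

    After each pick, the chosen accession's haplotypes are added to the
    covered set, so later picks are scored against the expanded background
    (the iterative aspect). Assumption: simple counts, no weighting.
    """
    covered = set(background)
    remaining = dict(candidates)
    ranked: List[Tuple[str, int]] = []
    while remaining and len(ranked) < n_select:
        # Iterating over sorted names makes ties resolve alphabetically.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        gain = len(remaining[best] - covered)
        if gain == 0:
            break  # no remaining candidate adds anything new
        ranked.append((best, gain))
        covered |= remaining.pop(best)
    return ranked


# Toy data: three windows, background carries cluster 0 in each.
background = {("w1", 0), ("w2", 0), ("w3", 0)}
candidates = {
    "acc_A": {("w1", 0), ("w2", 1), ("w3", 1)},
    "acc_B": {("w1", 1), ("w2", 1), ("w3", 0)},
    "acc_C": {("w1", 0), ("w2", 0), ("w3", 0)},
}
ranking = rank_by_novel_haplotypes(background, candidates, n_select=3)
print(ranking)  # [('acc_A', 2), ('acc_B', 1)]
```

In this toy case `acc_B` initially carries two novel haplotypes, but one of them is already contributed by `acc_A`, so its gain drops to one in the second round, and `acc_C` is never selected because it adds nothing beyond the background. Scoring against a growing background, rather than by global diversity alone, is the property the abstract emphasizes.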

Source: Selecting genomes that matter: haplotype-based prioritization for iterative pangenome expansion