KaryoScope: rapid, alignment-free sequence annotation for the pangenome era

AI Insight

KaryoScope is a new computational tool that performs alignment-free genome sequence annotation across multiple feature classes in a single pass, completing analysis in minutes on standard hardware. Applied to the Human Pangenome Reference Consortium Release 2 assemblies, it identifies the SST1 macrosatellite as the recurrent sequence at Robertsonian translocation fusion points, characterizes D4Z4 macrosatellite structural diversity relevant to facioscapulohumeral muscular dystrophy, and reveals previously undescribed centromere structural polymorphisms, some of them validated by fluorescence in situ hybridization. Existing tools each address a single feature class and struggle in highly variable genomic regions such as centromeres and subtelomeres; KaryoScope instead provides base-resolution annotation across all desired feature classes simultaneously.


The ability to rapidly annotate the most variable and clinically relevant regions of the genome at pangenome scale could accelerate research into structural variants associated with genetic disease, including muscular dystrophies and chromosomal rearrangements. A pre-built human genome database is distributed with the tool, lowering the barrier to adoption in research and, potentially, in clinical genomics workflows.


⚠️ Preprint: not yet peer-reviewed

This article has not yet been reviewed by independent experts. The results are preliminary and should be interpreted with caution.

The pangenome era is producing long-read sequencing data and complete genome assemblies at a pace that current annotation methods cannot match. Existing tools were each built for a single feature class (repeats, centromeric satellites, or genes) and falter precisely where the genome is most variable and harbours clinically important variation: the centromeres, subtelomeres, and acrocentric short arms. Here we present KaryoScope, an alignment-free method to annotate an assembly at base resolution across any desired feature classes in a single pass, completing in minutes on a standard workstation. Applied to the Human Pangenome Reference Consortium Release 2 assemblies, KaryoScope identifies the SST1 macrosatellite as the recurrent sequence at Robertsonian translocation fusion points, delivers the first pangenome-wide census of D4Z4 macrosatellite structural diversity at the 4q and 10q subtelomeres relevant to facioscapulohumeral muscular dystrophy, and reveals previously uncharacterized centromere structural polymorphism, including chromosome-specific satellite loss and megabase-scale rearrangement validated by fluorescence in situ hybridization. A pre-built KaryoScope database for the human genome is distributed alongside the tool, and additional databases can be built for any reference genome or annotation source. Together, these capabilities bring the most variable regions of the genome within reach for comparative, clinical, and pangenome-scale analysis. KaryoScope is available at https://github.com/barthel-lab/KaryoScope.
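The abstract does not describe how KaryoScope works internally. To illustrate what alignment-free, base-resolution annotation across several feature classes can look like in general, the following is a minimal toy sketch based on exact k-mer lookup, one common family of alignment-free techniques. It is an assumption chosen for illustration, not KaryoScope's algorithm or API; the function names `build_kmer_index` and `annotate`, the labels `satA` and `satB`, and the example sequences are all hypothetical.

```python
from collections import defaultdict


def build_kmer_index(references, k):
    """Map each k-mer to the set of feature labels whose reference contains it.

    `references` is a dict of {feature_label: reference_sequence}.
    Hypothetical helper for illustration only.
    """
    index = defaultdict(set)
    for label, seq in references.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(label)
    return index


def annotate(sequence, index, k):
    """Return one set of feature labels per base of `sequence`.

    The sequence is scanned once; every base covered by an indexed k-mer
    receives that k-mer's labels, so several feature classes are
    annotated in the same pass without any alignment step.
    """
    sequence = sequence.upper()
    per_base = [set() for _ in sequence]
    for i in range(len(sequence) - k + 1):
        labels = index.get(sequence[i:i + k])
        if labels:
            for j in range(i, i + k):
                per_base[j] |= labels
    return per_base


# Toy references standing in for two feature classes (made-up sequences).
references = {"satA": "ACGTACGT", "satB": "TTTTGGGG"}
index = build_kmer_index(references, k=4)

# Each base of the query gets the labels of the indexed k-mers covering it.
for base, labels in zip("ACGTTTTT", annotate("ACGTTTTT", index, k=4)):
    print(base, sorted(labels))
```

In this toy version a base can carry more than one label when k-mers from different feature classes overlap it. A real tool would also need to handle reverse complements, sequence divergence, and memory-efficient indexing, none of which is attempted here.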

Source: KaryoScope: rapid, alignment-free sequence annotation for the pangenome era