Biology

rsx: A high-performance streaming toolkit for RAD-seq sex determination

AI Insight

Researchers developed rsx, a Rust toolkit that reimplements the complete RADSex command set for identifying sex-linked genetic markers from RAD-seq data in non-model organisms. It preserves command-line compatibility with existing workflows and uses memory-mapped marker tables, external sorting, and parallel ingestion to keep memory bounded by the number of individuals or by explicit buffers. On four real datasets comprising 41.9 billion bases and 29 million markers, rsx achieved an 8.38-fold geometric-mean speedup over RADSex v1.2.0 across 56 paired timings, reproduced the published RADSex calls, and added Bayesian evidence grading.


This tool lets researchers studying sex determination in understudied species analyze larger RAD-seq datasets within bounded memory, making sex-marker discovery more accessible and efficient. The added Bayesian layer grades evidence for candidate sex-linked markers as strict, posterior-supported, or Bayes-factor-only, which can surface candidates the frequentist test misses (30 W-linked marker hypotheses in Danio albolineatus) and withhold weak ones (400 Bayes-factor-only rows in Notothenia rossii).


arXiv:2606.06434v1 Announce Type: new
Abstract: Restriction site-associated DNA sequencing (RAD-seq) is widely used to discover sex-linked markers in non-model organisms, but large studies produce marker tables with millions of RAD tags. RADSex provides the reference workflow for building marker-by-individual depth tables and testing sex-biased marker distributions, but its depth, merge, and related table-building commands grow memory-hungry, and its standard output reports frequentist calls with no posterior evidence and no direct Python or C integration. We present rsx, a Rust implementation of the complete RADSex command set that preserves marker-table semantics and command-line compatibility. rsx combines 2-bit DNA keys, parallel ingestion, memory-mapped marker tables, external sorting, bitset group counts, and streamed Gram-matrix PCA so that memory stays bounded by the number of individuals or by explicit buffers. It adds conjugate Beta-Binomial Bayes factors and posterior probabilities under XY and ZW hypotheses, returning strict, posterior-supported, and Bayes-factor-only evidence grades. A portable, libm-independent minimax approximation of the error function keeps the chi-squared tail reproducible across platforms without changing the underlying Yates test. On four real RAD-seq datasets comprising 41.9 billion bases and 29 million markers, rsx reproduced published RADSex v1.2.0 calls, achieved an 8.38-fold geometric-mean speedup across 56 paired timings (2.77-fold for FASTQ processing), and recovered every Bonferroni-significant positive-control marker. In Danio albolineatus, treated as null in the source publication, the posterior layer surfaced 30 W-linked marker hypotheses; in Notothenia rossii it withheld 400 Bayes-factor-only rows compatible with a low-prevalence null. Python bindings, a C API, and a reproducibility archive provide the workflows used for all reported numbers. rsx is released under GPL-3.0-or-later.
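The two statistical layers named in the abstract can be illustrated with a minimal Python sketch. This is not rsx's code or API: the function names are hypothetical, the Bayes factor uses a generic conjugate comparison (separate versus shared marker-presence rates for males and females, with a uniform Beta(1, 1) prior) rather than the XY and ZW hypotheses the paper defines, and the chi-squared tail uses the standard library's `math.erfc` instead of a custom minimax approximation.

```python
import math


def yates_chi2(m_with, m_total, f_with, f_total):
    """Yates-corrected chi-squared test on the 2x2 sex-by-presence table.

    Returns (chi2, p_value) for one degree of freedom.
    """
    a, b = m_with, m_total - m_with      # males with / without the marker
    c, d = f_with, f_total - f_with      # females with / without the marker
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        # A zero margin makes the test undefined; report no evidence.
        return 0.0, 1.0
    corrected = max(0.0, abs(a * d - b * c) - n / 2.0)
    chi2 = n * corrected ** 2 / margins
    # For 1 d.o.f. the chi-squared survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def log_beta(x, y):
    """Natural log of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def log_bayes_factor(m_with, m_total, f_with, f_total, alpha=1.0, beta=1.0):
    """Log Bayes factor for sex-specific vs. shared marker-presence rates.

    Assumed model (illustrative only): Binomial counts per sex with
    Beta(alpha, beta) priors. Binomial coefficients cancel in the ratio.
    """
    prior = log_beta(alpha, beta)
    m_without = m_total - m_with
    f_without = f_total - f_with
    # H1: males and females each have their own presence rate.
    log_m1 = (log_beta(alpha + m_with, beta + m_without) - prior
              + log_beta(alpha + f_with, beta + f_without) - prior)
    # H0: one presence rate shared by both sexes.
    log_m0 = log_beta(alpha + m_with + f_with,
                      beta + m_without + f_without) - prior
    return log_m1 - log_m0


if __name__ == "__main__":
    # Hypothetical marker present in all 20 males and in none of 20 females.
    chi2, p = yates_chi2(20, 20, 0, 20)
    print(f"chi2={chi2:.2f} p={p:.3g} "
          f"logBF={log_bayes_factor(20, 20, 0, 20):.2f}")
```

A marker seen equally often in both sexes gives a chi-squared statistic of zero and a negative log Bayes factor under this sketch, so the two layers agree on clear cases. They can diverge on borderline markers, which is where graded evidence categories become useful.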

Source: rsx: A high-performance streaming toolkit for RAD-seq sex determination