
Genome-Factory: A Library for Tuning, Deploying, and Interpreting Genomic Foundation Models

AI Insight

Genome-Factory is an open-source Python library that streamlines the full workflow for genomic foundation models: data collection, fine-tuning, inference, benchmarking, and biological interpretability, all in a single framework. It supports both full and parameter-efficient fine-tuning across multiple genomic models, and introduces a biological interpreter based on a sparse auto-encoder to help researchers understand what these models have learned from genomic sequences. The authors validate it in three ways: compatibility testing with diverse models and fine-tuning methods, performance benchmarking on two existing genomic benchmarks, and interpretation of learned representations using DNABERT-2.


By lowering the technical barrier to developing and analyzing genomic AI models, Genome-Factory could accelerate research in functional genomics, gene regulation, and disease-related sequence analysis, making advanced deep learning tools more accessible to biologists without extensive machine learning expertise.


arXiv:2509.12266v2
Abstract: We introduce Genome-Factory, the first integrated Python library for tuning, deploying, and interpreting genomic foundation models. Our core contribution is to simplify and unify the workflow for genomic model development: data collection, model tuning, inference, benchmarking, and interpretability. For data collection, Genome-Factory offers an automated pipeline to download genomic sequences and preprocess them. For model tuning, Genome-Factory supports both full and parameter-efficient fine-tuning across diverse genomic models. For inference, Genome-Factory enables both embedding extraction and DNA sequence generation. For benchmarking, we include two existing benchmarks and provide a flexible interface to incorporate additional benchmarks. For interpretability, Genome-Factory introduces an open-source biological interpreter based on a sparse auto-encoder. We validate the utility of Genome-Factory across three dimensions: (i) Compatibility with diverse models and fine-tuning methods; (ii) Benchmarking downstream performance using two open-source benchmarks; (iii) Biological interpretation of learned representations with DNABERT-2. These results highlight its practical value for real-world genomic analysis. GitHub: https://github.com/WeiminWu2000/Genome_Factory.
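
To make the interpretability idea concrete, here is a minimal sketch of what a sparse auto-encoder computes on a single embedding vector. It is not Genome-Factory's implementation or API: the class name `SparseAutoencoder`, its methods, the dimensions, and the `l1_coeff` value are all assumptions chosen for illustration, and a small random vector stands in for an embedding from a model such as DNABERT-2. The sketch shows only the forward pass and the usual training objective (reconstruction error plus an L1 penalty that pushes most hidden features to zero); training itself is omitted.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SparseAutoencoder:
    """Hypothetical toy sparse auto-encoder (illustration only)."""

    def __init__(self, d_model, d_hidden, seed=0):
        rng = random.Random(seed)
        # Overcomplete dictionary: d_hidden is typically larger than d_model.
        self.w_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)]
                      for _ in range(d_hidden)]
        self.b_enc = [0.0] * d_hidden
        self.w_dec = [[rng.gauss(0, 0.1) for _ in range(d_hidden)]
                      for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        # ReLU keeps feature activations non-negative.
        pre = matvec(self.w_enc, x)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, features):
        rec = matvec(self.w_dec, features)
        return [r + b for r, b in zip(rec, self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        """Return (reconstruction MSE + L1 sparsity penalty, features)."""
        features = self.encode(x)
        x_hat = self.decode(features)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        l1 = sum(abs(f) for f in features)
        return mse + l1_coeff * l1, features


# Toy 8-dimensional vector standing in for a model embedding (assumption).
rng = random.Random(42)
embedding = [rng.gauss(0, 1) for _ in range(8)]

sae = SparseAutoencoder(d_model=8, d_hidden=32)
total, features = sae.loss(embedding)
active = [i for i, f in enumerate(features) if f > 0]
print(f"loss={total:.4f}, active features: {len(active)} of {len(features)}")
```

After training on many embeddings, the hidden features that fire on a given sequence are the candidates one would inspect for biological meaning, for example by checking which sequence patterns activate each feature most strongly.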
