Biology

AI designs drug-like molecules that are easier to synthesize in labs

AI Insight

This paper introduces S3-GFN, a machine learning approach for generating drug-like molecules that can be synthesized in the laboratory. Previous Generative Flow Network (GFlowNet) methods enforce synthesizability as a hard constraint, building molecules only from predefined reaction templates and building blocks. S3-GFN instead applies a soft constraint: a sequence-based GFlowNet starts from chemical priors learned on large SMILES corpora and is regularized during training toward synthesizable molecules. The authors report that at least 95% of generated molecules are synthesizable, with higher rewards across diverse tasks.


This advance addresses a critical bottleneck in AI-driven drug discovery: computationally designed molecules often cannot be made in a real laboratory. By making molecular generation both more flexible and more likely to yield synthesizable candidates, the approach could accelerate the development of new therapeutic compounds and reduce the resources wasted on molecules that cannot be synthesized.


arXiv:2602.04119v2
Abstract: The application of generative models for experimental drug discovery campaigns is severely limited by the difficulty of designing molecules de novo that can be synthesized in practice. Previous works have leveraged Generative Flow Networks (GFlowNets) to impose hard synthesizability constraints through the design of state and action spaces based on predefined reaction templates and building blocks. Despite the promising prospects of this approach, it currently lacks flexibility and scalability. As an alternative, we propose S3-GFN, which generates synthesizable SMILES molecules via simple soft regularization of a sequence-based GFlowNet. Our approach leverages rich molecular priors learned from large-scale SMILES corpora to steer molecular generation towards high-reward, synthesizable chemical spaces. The model induces constraints through off-policy replay training with a contrastive learning signal based on separate buffers of synthesizable and unsynthesizable samples. Our experiments show that S3-GFN learns to generate synthesizable molecules ($\geq 95\%$) with higher rewards in diverse tasks.
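
The abstract names two separate replay buffers and a contrastive signal but does not give the loss itself. The sketch below is one plausible reading, not the authors' implementation: it pairs a standard trajectory-balance (TB) loss on replayed synthesizable samples with a hinge penalty that asks the policy to rank each synthesizable sample above an unsynthesizable one. The hinge form, `weight`, `margin`, `toy_log_prob`, and the example strings and rewards are all assumptions made for illustration. In practice the log-probabilities would come from the sequence model and the buffer assignment from a synthesizability oracle.

```python
import math
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of (smiles, reward) pairs."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)

    def add(self, smiles, reward):
        self.items.append((smiles, reward))

    def sample(self, k, rng):
        # Sample without replacement, capped at the current buffer size.
        return rng.sample(list(self.items), min(k, len(self.items)))


def tb_loss(log_z, log_pf, reward):
    """Squared trajectory-balance residual for one sequence.

    In left-to-right string generation every state has a single parent,
    so the backward-policy log-probability is zero and drops out.
    """
    return (log_z + log_pf - math.log(reward)) ** 2


def hinge_penalty(log_pf_pos, log_pf_neg, margin=1.0):
    """ASSUMED contrastive term: zero once the synthesizable sample is
    at least `margin` nats more likely than the unsynthesizable one."""
    return max(0.0, margin - (log_pf_pos - log_pf_neg))


def soft_constrained_loss(log_z, log_prob, pos, neg, weight=0.5, margin=1.0):
    """TB loss on synthesizable replays plus a weighted hinge penalty
    contrasting them with unsynthesizable replays (hypothetical form)."""
    tb = sum(tb_loss(log_z, log_prob(s), r) for s, r in pos) / max(len(pos), 1)
    pairs = list(zip(pos, neg))
    pen = sum(
        hinge_penalty(log_prob(p), log_prob(n), margin) for (p, _), (n, _) in pairs
    ) / max(len(pairs), 1)
    return tb + weight * pen


def toy_log_prob(smiles, vocab_size=32):
    # Stand-in policy: uniform over a hypothetical 32-token vocabulary.
    return -len(smiles) * math.log(vocab_size)


rng = random.Random(0)
synth, unsynth = ReplayBuffer(1000), ReplayBuffer(1000)
# Buffer assignments and rewards below are arbitrary placeholders,
# not the output of any real synthesizability check.
synth.add("CCO", 0.8)
synth.add("CC(=O)O", 0.6)
unsynth.add("C1CC1C1CC1", 0.9)
loss = soft_constrained_loss(
    0.0, toy_log_prob, synth.sample(2, rng), unsynth.sample(2, rng)
)
```

Because the penalty is added to the loss rather than built into the action space, unsynthesizable strings remain reachable but become less likely as training proceeds. This is the sense in which the constraint is soft rather than hard.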

Source: Synthesizable Molecular Generation via Soft-constrained GFlowNets with Rich Chemical Priors