Research
Nov 15, 2025

SynthonGPT: Diversity-Oriented Retrieval in Ultra-Large Enumerated Chemical Spaces

A compact synthon-conditioned transformer for navigating makeable chemical space, grounded in vendor enumerations rather than hallucinated SMILES.

Rather than generating arbitrary SMILES with no synthetic grounding, SynthonGPT conditions generation on synthesis-aware building blocks (synthons) and grounds its outputs in vendor enumerations, which keeps the model aligned with practical discovery workflows where proposed molecules must actually be makeable.

Highlights

  • On count-matched benchmarks, SynthonGPT recovers up to 3.1x more unique scaffolds than F-Trees and 1.76x more than SpaceLight, while maintaining lower mean similarity among retrieved compounds.
  • The model has roughly 90M parameters, trains in about 10 hours on a single RTX 4090, and supports sub-second inference on CPU and GPU.