We explore how changing molecular notation from SMILES to IUPAC can dramatically improve the quality of analogs generated by LLMs, offering insight into how these models are trained and how we should evaluate what they are capable of.
A long-standing debate in cheminformatics is whether global property-prediction models perform better or worse than local QSAR models. We describe a result from our publication with Nested Therapeutics showing that the best of both worlds comes from training a model on global data and then fine-tuning it on local data.
AlphaFold 3 (AF3) is an exciting leap forward in our ability to predict the structure and properties of biomolecular systems. We explore how AF3 small-molecule docking compares to existing techniques and find that the story is more nuanced than headlines suggest. We conclude with thoughts on where AF3 may ultimately be most useful.
Research advances often don't translate to practical impact. In ML for small-molecule drug discovery, this is at least partly because benchmark datasets don't capture important components of drug programs. We show how this can happen in ADME prediction and provide a path forward for building more realistic benchmarks from existing public data.