Inductive Bio Blog

Can ChatGPT Speak Chemistry?
Why the choice of molecular notation affects what an LLM understands
We explore how changing molecular notation from SMILES to IUPAC names can dramatically improve the quality of analogs generated by LLMs, offering insight into how these models are trained and how we should evaluate what they are capable of.

Are local or global models better? Why not both?
A long-standing debate in cheminformatics is whether global property-prediction models perform better or worse than local QSAR models. We describe a result from our publication with Nested Therapeutics showing that you can get the best of both worlds by training a model on global data and then fine-tuning it on local data.

Approaching AlphaFold 3 docking accuracy in 100 lines of code
AlphaFold 3 (AF3) is an exciting leap forward in our ability to predict the structure and properties of biomolecular systems. We explore how AF3 small-molecule docking compares to existing techniques and find that the story is more nuanced than headlines suggest. We conclude with thoughts on where AF3 may ultimately be most useful.

Get with the program
Building ADME benchmark datasets that drive impact
Research advances often don't translate to practical impact. In ML for small-molecule drug discovery, this is at least partly because benchmark datasets don't capture important components of drug programs. We show how this can happen in ADME prediction and provide a path forward for building more realistic benchmarks from existing public data.