August 1, 2024

Are local or global models better? Why not both?

By:  
Alex Rich, Ben Birnbaum, Josh Haimson, and Will Hickman

A long-standing debate in cheminformatics is whether global property-prediction models perform better or worse than local QSAR models (see, e.g., Di Lascio et al., Mol. Pharmaceutics, 2023). Global models, which are trained on more data, hold the promise of learning universal patterns that underlie chemical properties. Yet the common wisdom is that local QSAR models, which are trained only on the series or program of interest, often perform better.

It’s of course hard to answer the question definitively in general, since the answer depends on the details of the global and local models. Our experience at Inductive is that the quality of global models varies widely based on both the underlying training data and the way they are trained. Nevertheless, we’ve found that global models fine-tuned on local data usually perform better than either global or local models alone.

We recently had the opportunity to study this question systematically with one of our partners, Nested Therapeutics (Rich et al., ACS Med. Chem. Lett., 2024). We built models for four ADME assays: MDCK Papp (A→B), MDCK Efflux Ratio, HLM, and RLM. For each assay, we built three models: a local model trained on Nested’s program data using a commercially available third-party tool; a global-only model trained on our proprietary data set; and a global+local model that was first trained on our proprietary data set and then fine-tuned on Nested’s program data.
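The production models behind the paper are proprietary, but the global+local recipe itself is generic transfer learning: pretrain on the large global data set, then continue training on the much smaller program data at a lower learning rate. As a rough illustration only (not the paper’s actual architecture or training code), here is a minimal PyTorch-style sketch assuming fingerprint-like features, a simple feed-forward regressor, and synthetic stand-in data:

```python
import torch
from torch import nn

# Hypothetical sketch of the global + local (fine-tuning) pattern.
# Features stand in for molecular fingerprints; labels stand in for
# log-scale assay values. None of this is the paper's actual code.

def make_model(n_features: int = 2048) -> nn.Module:
    return nn.Sequential(
        nn.Linear(n_features, 512), nn.ReLU(),
        nn.Linear(512, 128), nn.ReLU(),
        nn.Linear(128, 1),
    )

def train(model: nn.Module, X, y, epochs: int, lr: float) -> nn.Module:
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X).squeeze(-1), y)
        loss.backward()
        opt.step()
    return model

# Synthetic stand-in data: a large global set, a small program set.
X_global, y_global = torch.randn(5_000, 2048), torch.randn(5_000)
X_local, y_local = torch.randn(300, 2048), torch.randn(300)

model = make_model()
train(model, X_global, y_global, epochs=50, lr=1e-3)  # global pretraining
train(model, X_local, y_local, epochs=20, lr=1e-4)    # local fine-tuning, lower lr
```

The key design choice is that fine-tuning reuses all of the pretrained weights rather than training a fresh model, so the local data only has to nudge patterns the global model has already learned.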

As shown below, the global+local approach generally performed best across the assays. It achieved the lowest MAE (mean absolute error) on all four properties and the highest Spearman rank correlation on all assays except MDCK Papp (A→B), where the global-only model’s correlation was slightly higher. The benefit of the global+local approach was especially pronounced for MDCK Papp (A→B), where the local-only model was not predictive at all and the patterns learned from the global data set were needed for accurate predictions.

Figure 1 from Rich et al.: Performance of models trained on program data only (local only), on non-program data only (global only), and on combined data (fine-tuned global), evaluated on temporally split test sets for HLM, RLM, MDCK AB, and MDCK ER. Error bars represent 68% bootstrapped confidence intervals. MAE (Mean Absolute Error) units are log10(mL/min/kg) for HLM and RLM, log10(10⁻⁶ cm/s) for MDCK AB, and log10(ratio) for MDCK ER.
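For readers who want to run this style of evaluation on their own data, here is a minimal sketch of the two metrics with 68% bootstrapped confidence intervals, using NumPy and SciPy on synthetic stand-in predictions (the resampling details of the paper’s analysis may differ):

```python
import numpy as np
from scipy.stats import spearmanr

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, ci=0.68, seed=0):
    """Return (point estimate, lower, upper) for `metric` via bootstrapping."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample pairs with replacement
        stats.append(metric(y_true[idx], y_pred[idx]))
    lo, hi = np.quantile(stats, [(1 - ci) / 2, 1 - (1 - ci) / 2])
    return metric(y_true, y_pred), lo, hi

mae = lambda t, p: np.mean(np.abs(t - p))
rho = lambda t, p: spearmanr(t, p)[0]

# Synthetic stand-in predictions on a held-out test set.
y_true = np.random.default_rng(1).normal(size=200)
y_pred = y_true + np.random.default_rng(2).normal(scale=0.5, size=200)

print("MAE      (68% CI):", bootstrap_ci(y_true, y_pred, mae))
print("Spearman (68% CI):", bootstrap_ci(y_true, y_pred, rho))
```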

You can read more in the full paper, where we also share other best practices from our work with Nested and other partners optimizing the ADME properties of drug molecules: regular time-based and series-level evaluation (something we’ve touched on before on this blog), frequent model retraining, and ensuring models are interactive, interpretable, and integrated.
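As a quick illustration of what time-based evaluation means in practice, here is a minimal pandas sketch, assuming a hypothetical table with an assay-date column (the column names are illustrative, not from the paper): train on everything measured before a cutoff date and test on everything after, which mimics how a model is actually used prospectively on a program.

```python
import pandas as pd

# Hypothetical sketch of a temporal split. Rows measured before the
# cutoff form the training set; rows measured after form the test set.
df = pd.DataFrame({
    "smiles": ["CCO", "CCN", "c1ccccc1", "CCC(=O)O"],
    "assay_date": pd.to_datetime(
        ["2023-01-05", "2023-03-12", "2023-06-01", "2023-08-20"]
    ),
    "value": [1.2, 0.8, 2.1, 1.5],  # stand-in log-scale assay values
})

cutoff = pd.Timestamp("2023-06-01")
train_df = df[df["assay_date"] < cutoff]
test_df = df[df["assay_date"] >= cutoff]
```

Unlike a random split, a temporal split never lets the model peek at compounds made after the ones it is asked to predict, which is why it gives a more honest picture of prospective performance.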

Read the full paper here.