Researchers from Princeton University and Merck Research Laboratories have used machine learning technology to predict the outcomes of chemical reactions.

The team, led by Abigail Doyle, A. Barton Hepburn professor of chemistry at Princeton, and Dr Spencer Dreher of Merck, has developed software that can accurately predict reaction yields of experiments involving up to four components. Their findings are published in ‘Predicting reaction performance in C-N cross-coupling using machine learning’, in the journal Science.

Historically, predicting such outcomes has been a challenge due to the difficulties of collecting enough data to establish a ‘training set’ of information that can be used for future predictions. In addition, calculating the effects of changes in quantity and nature of components has been a time-consuming process, only allowing small changes to be made in prediction calculations.

“The software that we developed can work for any reaction, any substrate,” said Doyle. “The idea was to let someone apply this tool and hopefully build on it with other reactions.”

The project was originally the PhD work of Derek Ahneman, who studied under Doyle in 2017 and now works for IBM. As Doyle puts it: “As chemists, we’ve traditionally veered away from multi-dimensional analysis.”

This particular form of multi-dimensional analysis involves random forest prediction models and the Spartan molecular modelling program. The team used Spartan to calculate descriptors for each chemical in a reaction, which then served as inputs to the model. They then used a machine learning method called ‘random forest’ to predict the outcome of specific experiments. In a random forest, randomly selected small samples of the data are each used to build a decision tree that predicts the yield for a particular reaction, and the results of several decision trees are averaged to generate an overall yield prediction.
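The averaging idea described above can be illustrated with a minimal sketch in Python. This is not the team’s software: the data are synthetic, the three numeric “descriptors” are made-up stand-ins for the values a program such as Spartan would calculate, and names such as `build_tree` and `build_forest` are hypothetical. Each tree is grown on a random resample of the reactions, and the forest’s prediction is the mean of the individual tree predictions.

```python
import random
from statistics import mean


def sse(rows):
    """Sum of squared errors of the yields in rows around their mean."""
    ys = [y for _, y in rows]
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(rows, rng, depth=0, max_depth=4, min_size=6, n_try=2):
    """Grow a regression tree: leaves are floats, inner nodes are dicts."""
    if depth >= max_depth or len(rows) < min_size:
        return mean([y for _, y in rows])
    best = None
    # Each split considers only a random subset of the descriptors.
    for f in rng.sample(range(len(rows[0][0])), n_try):
        for t in sorted({x[f] for x, _ in rows}):
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            if left and right:
                err = sse(left) + sse(right)
                if best is None or err < best[0]:
                    best = (err, f, t, left, right)
    if best is None:  # no split separates the rows
        return mean([y for _, y in rows])
    _, f, t, left, right = best
    return {
        "f": f,
        "t": t,
        "left": build_tree(left, rng, depth + 1, max_depth, min_size, n_try),
        "right": build_tree(right, rng, depth + 1, max_depth, min_size, n_try),
    }


def predict_tree(node, x):
    """Walk from the root to a leaf and return that leaf's mean yield."""
    while isinstance(node, dict):
        node = node["left"] if x[node["f"]] <= node["t"] else node["right"]
    return node


def build_forest(rows, rng, n_trees=20):
    """Grow each tree on a random resample (with replacement) of the rows."""
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]
        forest.append(build_tree(sample, rng))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all trees."""
    return mean([predict_tree(tree, x) for tree in forest])


# Synthetic data (an assumption for illustration only): the yield depends on
# the first two made-up descriptors, while the third is irrelevant noise.
rng = random.Random(0)
rows = []
for _ in range(120):
    x = [rng.random() for _ in range(3)]
    y = 100 * x[0] * (1 - x[1]) + rng.gauss(0, 3)
    rows.append((x, min(100.0, max(0.0, y))))

forest = build_forest(rows, rng)
pred_a = predict_forest(forest, [0.9, 0.1, 0.5])  # favourable conditions
pred_b = predict_forest(forest, [0.1, 0.9, 0.5])  # unfavourable conditions
print(f"predicted yields: {pred_a:.1f}% vs {pred_b:.1f}%")
```

Neither of the two query points appears in the training rows, so the sketch also shows in miniature how such a model can estimate yields for conditions it has not seen. In practice, an established library implementation would normally be used instead of hand-written trees.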

The researchers also discovered that the random forest model could work using only hundreds of reactions, as opposed to the usual thousands. They also found that the model can predict yields for chemical compounds not included in the training set, suggesting that it can generalise beyond the data it was trained on.

“These results are exciting, because they suggest that this method can be used to predict the yield for reactions where the starting material has never been made, which would help minimise the consumption of chemicals that are time-consuming to make,” Ahneman said.

“Overall, this methodology holds promise for, one, predicting the yield for reactions using as-yet-unmade starting materials and, two, predicting the optimal conditions for a reaction with a known starting material and product.”

Since Ahneman left the laboratory, his work has been continued by graduate student Jesús Estrada. The team’s goal, however, is for the software to be used by chemists who are not computing experts like Ahneman and Estrada.

“The idea is to help people navigate the multi-dimensional space where you can’t intuit the outcomes,” said Doyle.