Machine learning to simplify development of new biorefining processes
Scientists use technique to automatically predict the amount of biofuel produced by microbes.
Scientists from the Department of Energy’s Lawrence Berkeley National Laboratory say they have developed a way to use machine learning to dramatically accelerate the design of microbes that produce biofuel.
Their computer algorithm starts with data about the proteins and metabolites in a biofuel-producing microbial pathway, but no information about how the pathway actually works. It then uses data from previous experiments to learn how the pathway will behave. The scientists used the technique to automatically predict the amount of biofuel produced by pathways that have been added to E. coli bacterial cells.
The researchers say that the new approach is much faster than the current way to predict the behaviour of pathways, and promises to speed up the development of biomolecules.
The research was published May 29 in the journal npj Systems Biology and Applications.
In biology, a pathway is a series of chemical reactions in a cell that produce a specific compound. Researchers are exploring ways to re-engineer pathways, and import them from one microbe to another. This is described as harnessing ‘nature’s toolkit’. New synthetic biology capabilities like the gene-editing tool CRISPR-Cas9 are being used to improve the precision of this process.
“But there’s a significant bottleneck in the development process,” said Hector Garcia Martin. “It’s very difficult to predict how a pathway will behave when it’s re-engineered. Trouble-shooting takes up 99% of our time. Our approach could significantly shorten this step and become a new way to guide bioengineering efforts.”
The current way to predict a pathway’s dynamics requires a maze of differential equations that describe how the components in the system change over time. Experts develop these ‘kinetic models’ over several months, and the resulting predictions don’t always match experimental results.
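To give a sense of what a kinetic model involves, here is a minimal sketch of a one-step pathway in which an enzyme converts a substrate into a product. Everything in it is an illustrative assumption rather than the researchers' model: the Michaelis-Menten rate law, the function name simulate_pathway, and the values of vmax, km and dt are all invented for the example. Real kinetic models chain many such equations together, each with parameters that have to be measured or fitted.

```python
def simulate_pathway(enzyme_level, vmax=1.0, km=0.5, substrate0=1.0,
                     dt=0.05, steps=200):
    """Integrate a toy one-reaction kinetic model with forward Euler steps.

    Hypothetical rate law (Michaelis-Menten, chosen for illustration):
        d[product]/dt = enzyme_level * vmax * S / (km + S)
    All parameter values are made up.
    """
    substrate, product = substrate0, 0.0
    trajectory = [(0.0, substrate, product)]
    for step in range(1, steps + 1):
        # Reaction rate at the current substrate concentration
        rate = enzyme_level * vmax * substrate / (km + substrate)
        # Whatever leaves the substrate pool enters the product pool
        substrate -= rate * dt
        product += rate * dt
        trajectory.append((step * dt, substrate, product))
    return trajectory


# Each entry is (time, substrate concentration, product concentration)
trajectory = simulate_pathway(enzyme_level=1.0)
```

Even in this toy, the modeller has to choose the form of the rate law and supply vmax and km by hand, which hints at why a full pathway model can take months to build.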
Machine learning, however, uses data to train a computer algorithm to make predictions. The algorithm learns a system’s behaviour by analysing data from related systems. This allows scientists to quickly predict the function of a pathway even if its mechanisms are poorly understood, as long as there are enough data to work with.
The scientists tested their technique on pathways added to E. coli cells. One pathway is designed to produce a bio-based jet fuel called limonene; the other produces a gasoline replacement called isopentenol. Previous experiments at the Joint BioEnergy Institute (JBEI) yielded data on how different versions of the pathways function in various E. coli strains. Some of the strains have a pathway that produces small amounts of either limonene or isopentenol, while other strains have a version that produces large amounts of the biofuels.
The researchers fed these data into their algorithm, and machine learning took over. The algorithm taught itself how the concentrations of metabolites in these pathways change over time, and how much biofuel the pathways produce. It learned these dynamics by analysing data from the two experimentally known pathways that produce small and large amounts of biofuels.
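The learning step can be sketched on toy data. One way to frame it, which is an assumption here and not a description of the researchers' algorithm, is as regression: estimate how fast a metabolite changes at each time point, then fit a model that predicts that rate from the current concentration and an enzyme level. In the sketch below, ordinary least squares stands in for the machine learning model, measure_strain is a made-up stand-in for experimental time series, and the enzyme levels 0.2, 0.6 and 1.0 are invented.

```python
import numpy as np

DT = 0.1      # time between samples (made-up value)
STEPS = 50    # number of time steps per series (made-up value)


def measure_strain(enzyme, substrate0=1.0):
    """Stand-in for experimental data: substrate concentration over time.
    The rule generating it is hidden from the learner below."""
    s = np.empty(STEPS + 1)
    s[0] = substrate0
    for t in range(STEPS):
        s[t + 1] = s[t] - enzyme * s[t] * DT
    return s


def features(enzyme, substrate):
    """Inputs to the regression: bias, substrate, enzyme, and their product."""
    substrate = np.asarray(substrate, dtype=float)
    enzyme = np.full_like(substrate, enzyme)
    return np.column_stack(
        [np.ones_like(substrate), substrate, enzyme, enzyme * substrate])


def fit_rate_model(training):
    """Fit d(substrate)/dt as a linear function of the features."""
    X, y = [], []
    for enzyme, series in training.items():
        X.append(features(enzyme, series[:-1]))
        y.append(np.diff(series) / DT)   # finite-difference rate of change
    coef, *_ = np.linalg.lstsq(np.vstack(X), np.concatenate(y), rcond=None)
    return coef


def predict_series(coef, enzyme, substrate0=1.0):
    """Roll the learned rate model forward in time for an unseen strain."""
    s = np.empty(STEPS + 1)
    s[0] = substrate0
    for t in range(STEPS):
        rate = features(enzyme, [s[t]])[0] @ coef
        s[t + 1] = s[t] + rate * DT
    return s


# Train on a low-producing and a high-producing strain only
training = {0.2: measure_strain(0.2), 1.0: measure_strain(1.0)}
coef = fit_rate_model(training)

# Predict a strain the model has never seen (enzyme level 0.6)
predicted = predict_series(coef, 0.6)
# Assumes all consumed substrate ends up as product
product_predicted = 1.0 - predicted[-1]
```

The toy data follow a rule that this regression can represent exactly, so the prediction for the unseen strain is far cleaner than anything to expect from real, noisy measurements. The point is only the shape of the workflow: the model is never told the rate equation, only the time series.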
The algorithm then used this knowledge to predict the behaviour of a third set of ‘mystery’ pathways that it had never seen before. It accurately predicted the biofuel-production profiles for the mystery pathways, including that the pathways produce a medium amount of fuel. In addition, the machine learning-derived predictions outperformed those from kinetic models.
“And the more data we added, the more accurate the predictions became,” said Garcia Martin. “This approach could expedite the time it takes to design new biomolecules. A project that today takes ten years and a team of experts could someday be handled by a summer student.”