Researchers at the University of Chicago have used machine learning to mine large volumes of protein data from earlier studies and have applied what they learned to build artificial proteins in the lab.
Creating Artificial Proteins in Labs
A team led by researchers in the Pritzker School of Molecular Engineering (PME) at the University of Chicago reports that it has developed an artificial intelligence-led process that uses big data to design new proteins, with potential implications across the healthcare, agriculture, and energy sectors.
By developing machine-learning models that can review protein information drawn from genome databases, the scientists say they found relatively simple design rules for building artificial proteins.
When the team used this new information to build proteins in the lab, the outcome was encouraging. These lab-made proteins have properties that are similar to those found in their natural counterparts.
“We have all wondered how a simple process like evolution can lead to such a high-performance material as a protein,” said Rama Ranganathan, Ph.D., Joseph Regenstein Professor in the Department of Biochemistry and Molecular Biology, Pritzker Molecular Engineering, and the College. “We found that genome data contains enormous amounts of information about the basic rules of protein structure and function, and now we’ve been able to understand and replicate nature’s rules to create proteins ourselves.”
His group developed mathematical models based on this data and then began using machine-learning methods to reveal new information about proteins’ basic design rules.
For this research, they studied the chorismate mutase family of metabolic enzymes, a type of protein that is important for life in many bacteria, fungi, and plants. Using machine-learning models, the researchers were able to reveal the simple design rules behind these proteins.
The model shows that conservation at individual amino acid positions, together with correlations in the evolution of pairs of amino acids, is sufficient to predict new artificial sequences that would have the properties of the protein family.
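To make the two kinds of statistics concrete, here is a minimal sketch of how they could be measured on a set of aligned sequences. It is an illustration only, not the researchers' model: the toy alignment, the function names, and the choice of Shannon entropy (for conservation) and mutual information (for pairwise correlation) are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical toy alignment: four aligned sequences of equal length.
# Real protein families involve far more sequences and positions.
TOY_MSA = ["MKAL", "MKAL", "MRGL", "MRGL"]


def conservation(msa):
    """Shannon entropy (bits) at each position; lower means more conserved."""
    n = len(msa)
    entropies = []
    for i in range(len(msa[0])):
        counts = Counter(seq[i] for seq in msa)
        entropies.append(-sum((c / n) * math.log2(c / n) for c in counts.values()))
    return entropies


def mutual_information(msa, i, j):
    """Mutual information (bits) between positions i and j.

    Used here as a simple stand-in for 'correlation in the evolution
    of pairs of amino acids'; zero means the positions vary independently.
    """
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


if __name__ == "__main__":
    print("entropy per position:", conservation(TOY_MSA))
    print("MI between positions 1 and 2:", mutual_information(TOY_MSA, 1, 2))
```

In this toy alignment the first and last positions never change, so they are fully conserved, while the two middle positions always change together, so they are strongly correlated. A generative model of the kind described would need to reproduce both patterns in the sequences it proposes.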
You can recreate it before you fully understand it
“We generally assume that to build something, you have to first deeply understand how it works,” Ranganathan said. “But if you have enough data examples, you can use deep learning methods to learn the rules of design, even if you don’t understand how it works or why it’s built that way.”
He and his collaborators then created synthetic genes to encode the proteins, cloned them into bacteria, and watched as the bacteria made the synthetic proteins using their normal cellular machinery. They found that the artificial proteins had the same catalytic function as the natural chorismate mutase proteins.
Proteins are one of the fundamental building blocks of life. Because these newly understood design rules are relatively simple, the number of artificial proteins that researchers could potentially create with them is extremely large.
Taking this beyond the laboratory
The researchers also hope to use this platform to develop proteins that can address pressing societal problems, such as climate change. Ranganathan and Andrew Ferguson, Ph.D., an associate professor, have founded a company called Evozyne that will commercialize this technology with applications in energy, environment, catalysis, and agriculture.
What does their study involve?
Their study “An evolution-based model for designing chorismate mutase enzymes” appears in Science.
“The rational design of enzymes is an important goal for both fundamental and practical reasons. Here, we describe a process to learn the constraints for specifying proteins purely from evolutionary sequence data, design and build libraries of synthetic genes, and test them for activity in vivo using a quantitative complementation assay,” explain the researchers in the abstract of their paper.
“For chorismate mutase, a key enzyme in the biosynthesis of aromatic amino acids, we demonstrate the design of natural-like catalytic function with substantial sequence diversity. Further optimization focuses the generative model toward function in a specific genomic context. The data show that sequence-based statistical models suffice to specify proteins and provide access to an enormous space of functional sequences.”
Though artificial intelligence revealed the design rules, Ranganathan and his collaborators still don’t fully understand why the models work. Next, they will work to understand just how the models came to this conclusion. “There is much more work to be done,” he said.