
Computer simulation of a living cell / Markus W. Covert

Biologists have developed the first computer model of a single-celled organism in its entirety, a model that will serve as a powerful new tool for understanding the mechanisms of life

Live cells under the microscope. Illustration: Shutterstock

The crucial insight dawned on me as I lazily rode my bike home from work on Valentine's Day, 2008. As I rode, I mulled over a problem that had occupied me and others in the field for more than ten years: Is there any way to build a computer simulation of life, including all the wonderful, mysterious and infuriatingly complex biochemistry that underlies it?

A computer model of a living cell, even a rough one that was not entirely accurate, would be an incredibly effective tool. Biologists could use it to test ideas for experiments before committing time and money to carrying them out in the laboratory. Drug developers, for example, could speed the search for new antibiotics by focusing on molecules whose inhibition would most disrupt the virtual bacterium. Bioengineers like me could transplant and rewire the genes of virtual organisms to design new strains with special properties, such as the ability to fluoresce in response to infection by a certain virus, or to produce hydrogen gas from oil, without the risks involved in altering real bacteria. Ultimately, if we can figure out how to make models sophisticated enough to simulate human cells, we will have tools that could revolutionize medical research by letting researchers conduct studies that are impossible today because many types of human cells cannot be grown in culture.

But all this seemed like a distant dream in the absence of a practical way to unravel the intricate web of chemical reactions and physical connections that allow a living cell to function. Many previous attempts, in my laboratory at Stanford and elsewhere, had run into insurmountable difficulties, and some had failed at the very outset.

But as I pedaled slowly across campus that winter evening, I thought about recent work in which I had been recording images and videos of living cells. Then it hit me: the way to produce a functional and realistic simulation was to choose one of the simplest bacteria, a species called Mycoplasma genitalium, and build a model of a single cell. Limiting the simulation to just one cell would simplify the problem enough that we could, in principle, include every piece of biology ever recorded for this organism: the replication of every base pair in its coiled DNA ladder, the transcription of every gene from DNA into RNA, the production of every enzyme and other protein according to the RNA's instructions, and the interactions between each of these players and many others, all of which ultimately cause the cell to grow and divide into two "daughter" cells. The simulation would reproduce, almost from first principles, the drama of single-celled life in its entirety.

Previous attempts had always tried to simulate whole colonies of cells, because almost all the information we have about cell behavior comes from populations rather than single cells. But advances in biology and computing had since made single-cell studies feasible. Now, I realized, the tools were available to try a different approach.

Ideas raced through my head. As soon as I got home, I started sketching plans for the simulator. The next morning I began writing software for some of the many processes that occur in a living microorganism. Within a week I had completed several prototype modules, each a piece of software representing a given cellular process. The modules produced output that looked quite realistic.

I showed the work to a handful of biologists. Most of them thought I was crazy. But I felt I was onto something, and two extraordinary and daring doctoral students, Jonathan R. Karr and Jayodita C. Sanghvi, thought my approach could work and agreed to join the project. Completing the computer model would involve creating dozens of such modules, scanning nearly 1,000 scientific articles for biochemical data, and using those values to constrain and fine-tune thousands of parameters, such as how tightly enzymes bind to their target molecules and how often proteins that read DNA dislodge one another from the double helix. I feared that even with the help of collaborators and doctoral students the project would take years, but I also believed it would ultimately succeed. There was no way to know for sure except to try.

A big challenge

As we prepared to scale this mountain, we drew inspiration from the first explorers who had dreamed of modeling life. In 1984, Harold Morowitz, then at Yale University, charted the way. He observed that the simplest bacteria biologists could grow in culture, the mycoplasmas, were the logical place to start. Besides being extremely small and relatively simple, two strains of mycoplasma cause disease in humans: the sexually transmitted parasite M. genitalium, which thrives in the urinary tract and vagina, and M. pneumoniae, which can cause a mild pneumonia. A model of either strain would be medically useful, as well as a source of insights into basic biology.

Morowitz said the first step should be to sequence the genome of the chosen bacterium. J. Craig Venter and colleagues at The Institute for Genomic Research (TIGR) completed this task for M. genitalium in 1995. The bacterium has only 525 genes. (For comparison, human cells have more than 20,000.)

I was a doctoral student in San Diego when, four years later, the TIGR group concluded that only about 400 of these genes are essential to sustain life (as long as the bacteria are grown in a rich culture medium). Venter and his colleagues moved on, founding Celera and racing the U.S. government to sequence the human genome. Later they synthesized the essential genes of one mycoplasma strain and showed that they function inside a cell.

To me and other young biologists in the late 1990s, this group was the Led Zeppelin of biology: larger-than-life convention breakers playing music we had never heard before. Clyde Hutchison, one of the biologists in Venter's group, said that the real test of how well we understand a simple cell would come only when someone developed a computer model of one. In the laboratory you can build a working cell by joining separate pieces together without understanding every detail of how they work in concert. Not so with software.

Morowitz, too, called for building a simulation of a living cell based on the mycoplasma genome. He argued that "any experiment that can be performed in a laboratory can also be performed on a computer. The degree of agreement between the [experimental and simulated] results is a measure of the completeness of the paradigm of molecular biology," that is, of the theory explaining how DNA and the cell's other molecules interact to create the life we know. In other words, as we put the pieces together, we would better understand which parts and which interconnections were missing from our theory.

Although high-throughput DNA-sequencing machines and robotic laboratory equipment have greatly accelerated the search for the missing pieces, the resulting deluge of DNA sequences and gene-activity patterns did not come with instructions for putting the pieces together. Genetics pioneer Sydney Brenner called this kind of research "low-input, high-throughput, no-output" biology, because too often the experiments are not grounded in hypotheses and yield disappointingly little insight into the larger systems that make life work properly, or break down.

This situation explains why, despite headlines regularly announcing the discovery of new genes linked to cancer, obesity or diabetes, cures for these diseases remain infuriatingly elusive. Cures will probably come only when we can untangle the web of tens or even hundreds of factors that interact with one another, sometimes in counterintuitive ways, to cause these diseases.

The pioneers of cell simulation realized that whole-cell models containing all the cellular components and the network of reactions among them would be powerful tools for imposing order on fragmented and disorganized biological knowledge. By its very nature, a simulation of a living cell distills a broad set of hypotheses about what happens inside the cell into rigorous mathematical algorithms.

The diagrams commonly found in scientific articles, showing that factor X controls gene Y ... in some way ... fall far short of the precision needed to write a computer program. Programmers must express such processes as equations, for example the simple form Y = aX + b, even if they have to make an educated guess at the values of a and b. The need for precision ultimately makes clear which experiments must be performed in the laboratory to fill the gaps in existing knowledge about reaction rates and other quantities.
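
To make this concrete, here is a minimal sketch, my own illustration rather than code from the project, of how such a relationship becomes executable. The values of a and b below are placeholders standing in for exactly the kind of educated guess described above.

```python
# A minimal sketch (not code from the project): turning "factor X
# controls gene Y" into the linear form Y = a*X + b described above.
# The parameter values are illustrative guesses, not measured constants.

def gene_y_activity(x, a=0.8, b=0.1):
    """Activity of gene Y as a linear function of regulator X."""
    return a * x + b

for x in (0.0, 0.5, 1.0):
    print(f"X = {x:.1f} -> Y = {gene_y_activity(x):.2f}")
```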

At the same time, it was clear that once a model was validated and shown to be accurate, it would replace certain experiments, reserving expensive laboratory work for questions that simulations alone cannot answer. Simulated experiments that produce surprising results would also help researchers set priorities and accelerate the pace of scientific discovery. Indeed, models are such alluring tools for distinguishing cause from effect that in 2001 Masaru Tomita of Keio University declared whole-cell simulation "a grand challenge of the 21st century."

As a doctoral student, I was captivated by the early results of the researchers then leading the development of cellular models, and I became obsessed with this grand challenge. Even as I set up my own laboratory and focused on developing methods for imaging single cells, the challenge stayed on my mind. Then, on that bike ride home in February 2008, I saw a way to meet it.

Two crucial insights

It was clear that before we could develop a simulation of a bacterium's life cycle accurate enough to mimic its complex behavior and yield new biological discoveries, we would have to solve three problems. First, we had to encode into mathematical equations and algorithms all the cell's important activities, from the flow of energy, nutrients and reaction products through the cell (its metabolism), through the synthesis and breakdown of DNA, RNA and proteins, to the activity of its many enzymes. Second, we had to merge all these activities into one general framework. The last problem was by far the hardest: to establish upper and lower bounds for the roughly 1,700 parameters in the model, so that they would take biologically correct values, or at least values of the right order of magnitude.

I realized that no matter how carefully we combed the literature on M. genitalium and its relatives for these parameters (Karr, Sanghvi and I ultimately spent two years collecting data from about 900 articles), in some cases we would have to settle for educated guesses, or borrow results from experiments on entirely different bacteria, such as Escherichia coli, to obtain certain numbers, for example how long, on average, RNA transcripts survive in the cell before enzymes break them down to recycle their components. Without a way to constrain and test these guesses, we had no chance of success.

In that moment of insight in 2008, I realized that modeling a single cell rather than a population, unlike almost every study conducted up to that point, would let us set the bounds we needed. Take growth and reproduction. A large population of cells grows gradually; the birth or death of any one cell changes little. For a single cell, however, division is a highly dramatic event. Before a cell splits in two, the organism must double its contents, and not merely its total mass: the amount of DNA, of cell membrane and of every protein needed for survival must each double. If the model is limited to a single cell, the computer can literally count every molecule and track it through the entire cell cycle. It can also check whether all the numbers balance when one cell becomes two.
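
As a hedged illustration of this bookkeeping (the species names and counts below are invented, not taken from the model), a single-cell simulation can verify that every tracked component roughly doubles between birth and division:

```python
# Hedged sketch: species names and counts are invented for illustration.
# A single-cell model can count each molecule type at birth and at
# division and check that everything needed for survival has doubled.

counts_at_birth    = {"chromosomes": 1, "ribosomes": 300, "membrane_lipids": 1_000_000}
counts_at_division = {"chromosomes": 2, "ribosomes": 598, "membrane_lipids": 2_050_000}

def roughly_doubled(before, after, tolerance=0.1):
    """True if `after` is within `tolerance` of exactly twice `before`."""
    return abs(after - 2 * before) <= tolerance * 2 * before

for species, before in counts_at_birth.items():
    status = "balanced" if roughly_doubled(before, counts_at_division[species]) else "IMBALANCED"
    print(f"{species}: {status}")
```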

Moreover, a single cell reproduces at a remarkably steady rate. M. genitalium, for example, divides every 9 to 10 hours under normal laboratory conditions. Cells rarely divide in less than 6 hours or take more than 15. The requirement that a cell replicate all its contents on this fairly rigid schedule let us choose reasonable ranges for many parameters that could not be determined any other way, such as those governing when DNA replication begins.
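
One way to picture how such a constraint narrows down an unmeasurable parameter is a sketch like the following, in which the "simulation" is an invented stand-in for a full whole-cell run: any candidate value whose predicted doubling time falls outside the observed 6-to-15-hour window is rejected.

```python
# Hedged sketch: screening an unmeasurable parameter against the
# observed doubling-time window (roughly 6 to 15 hours, per the text).
# `simulated_doubling_time` is an invented stand-in for a full
# whole-cell run, which is where the real number would come from.

ACCEPTABLE_HOURS = (6.0, 15.0)

def simulated_doubling_time(replication_start_threshold):
    """Stand-in: pretend higher thresholds delay division (illustrative)."""
    return 4.0 + 8.0 * replication_start_threshold

candidates = [0.2, 0.5, 0.8, 1.2, 1.6]
plausible = [p for p in candidates
             if ACCEPTABLE_HOURS[0] <= simulated_doubling_time(p) <= ACCEPTABLE_HOURS[1]]
print("values consistent with the doubling-time window:", plausible)
```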

I gathered a team of physicists, biologists, computer scientists and even a software engineer who had worked at Google, and we debated which mathematical approaches to take. Michael Shuler, a biomedical engineer at Cornell University who pioneered computer simulations of cells, had built impressive models using ordinary differential equations. Bernhard Palsson, with whom I had studied in San Diego, had developed a powerful method called flux balance analysis (FBA) for building computational models of metabolism. But other researchers had shown that randomness is an important component of gene transcription, and cell division necessarily involves changes in the geometry of the cell membrane; neither of the methods just mentioned addresses those aspects. Already as a doctoral student I had understood that no single method could model all of a cell's activities. Indeed, my Ph.D. thesis demonstrated a way to combine two separate mathematical approaches into one simulation.
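
For readers curious what flux balance analysis looks like in practice, here is a minimal sketch using a toy three-reaction network of my own invention (not the M. genitalium network). FBA poses metabolism as a linear program: maximize a biomass flux subject to steady-state mass balance and capacity bounds on each flux.

```python
# Minimal flux-balance-analysis sketch on an invented toy network
# (uptake -> A -> B -> biomass), not the real M. genitalium model.
# FBA maximizes a biomass flux subject to steady state (S @ v = 0)
# and capacity bounds on each flux.
import numpy as np
from scipy.optimize import linprog

# Stoichiometric matrix S: rows are metabolites, columns are reactions
# v0 (uptake of A), v1 (A -> B), v2 (B -> biomass).
S = np.array([
    [1, -1,  0],   # metabolite A: produced by v0, consumed by v1
    [0,  1, -1],   # metabolite B: produced by v1, consumed by v2
])
bounds = [(0, 10), (0, 10), (0, None)]  # flux capacity limits
c = [0, 0, -1]                          # maximize v2 by minimizing -v2

result = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds)
print("optimal biomass flux:", result.x[2])   # limited by uptake: 10.0
```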

That is why we decided to build the whole-cell model as a collection of 28 distinct modules, each using the algorithm best suited to the biological process it describes and to our level of knowledge about it. This strategy, however, left us with a patchwork of mathematical procedures. We had to stitch them together somehow into one coherent structure.

I remembered a course I had taken as an undergraduate on designing a chemical plant. For the final project we used a powerful simulation package called HYSYS to design a large refinery. HYSYS let us model each major chemical reaction as if it occurred in a separate vessel, with pipes connecting the output of one vessel to the inputs of others. This structure linked many kinds of chemical operations into one orderly, comprehensible system.

It occurred to me that this approach, suitably modified, might work for simulating our cell if we accepted one important simplifying assumption: that biological processes occurring simultaneously in the living cell do not depend on one another over intervals shorter than one second. If that assumption held, we could divide the cell's lifetime into one-second slices and run each of the 28 modules, in turn, for one second before updating the cell's shared pool of variables. The model would still express all the interdependencies of biochemistry, such as the reliance of gene transcription and DNA synthesis on the energy and nucleotides produced by metabolism, but only on time scales longer than one second.
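
A hedged sketch of this scheme, with invented module names and toy state variables, might look like the following: every simulated second, each module reads the same frozen snapshot of the cell state, and the shared pool is updated only after all modules have reported their changes.

```python
# Hedged sketch of the one-second time-stepping scheme; module names
# and state variables are invented. Each second, every module reads
# the same frozen snapshot, and the shared pool is updated only after
# all modules have run.

def metabolism(state):
    return {"atp": +50}                        # produces energy (illustrative)

def transcription(state):
    return {"atp": -5, "rna_transcripts": +1}  # spends energy, makes RNA

MODULES = [metabolism, transcription]          # the real model had 28 modules

state = {"atp": 1_000, "rna_transcripts": 0}
for second in range(3):                        # a real run spans ~9 hours
    snapshot = dict(state)                     # all modules see the same state
    deltas = {}
    for module in MODULES:
        for variable, change in module(snapshot).items():
            deltas[variable] = deltas.get(variable, 0) + change
    for variable, change in deltas.items():
        state[variable] += change              # pool updated once per second
print(state)                                   # {'atp': 1135, 'rna_transcripts': 3}
```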

We had no theoretical proof that it would work. It was a matter of faith.

When we built our virtual cell, we instrumented it with sensor software to measure what was happening inside. Each run of the simulation, covering the complete cell cycle of a single cell, produced some 500 megabytes of data. The numerical output flowed into a kind of instrument panel: a collection of dozens of tables and charts that would fill an entire binder.
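
The "sensor" idea can be pictured with a short sketch (variable names and values invented): snapshot selected state variables each simulated second and write them out as a table for later inspection.

```python
# Sketch of the "sensor" idea (variable names and values invented):
# snapshot selected state variables every simulated second and write
# them to a CSV table for later inspection as charts and tables.
import csv

TRACKED = ["atp", "rna_transcripts"]
history = [                              # toy per-second snapshots
    {"atp": 1_000, "rna_transcripts": 0},
    {"atp": 1_045, "rna_transcripts": 1},
    {"atp": 1_090, "rna_transcripts": 2},
]

with open("run_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["second"] + TRACKED)
    for second, snapshot in enumerate(history):
        writer.writerow([second] + [snapshot[v] for v in TRACKED])
```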

At first the results were disappointing. For months we fixed bugs in the software, improved the mathematics and added more and better experimentally derived constraints on the variables. But the cell refused to divide, or behaved irrationally. For a while it produced huge amounts of the amino acid alanine and very little of anything else.

Then, one day, our cybernetic bacterium reached the end of its cell cycle and successfully divided. Even more exciting, its doubling time was about nine hours, just like living M. genitalium. Many other figures were still wrong, but we sensed that success was within reach.

A few months later I was attending a two-day conference in Bethesda, Maryland, when, between sessions, I was called to the front desk.

"Dr. Covert? You received a package."

In my room, I opened the box and took out a binder. I spent the next few hours leafing through hundreds of pages of intricate charts and tables, and my heart began to pound. Most of the data looked exactly like data from a real cell. The rest were intriguing: unexpected, but biologically plausible. And then I knew we had reached the top of the mountain that had towered so high above us years before. The first computer model of an entire living organism was up and running. What would it teach us?

A peek into the life of a cell

After about a year of using the new tool, we still discover fascinating things every time we peer into the inner workings of the virtual microorganism, with its millions of details of life and reproduction. We found, to our astonishment, that proteins knock one another off the DNA surprisingly often, about 30,000 times in each 9-hour life cycle. We also discovered that the bacterium's steady doubling time is in fact an emergent phenomenon arising from complex interplay between two separate stages of replication, each of which can vary greatly in duration. And second-by-second monitoring of the cell's behavior allowed us to explain why the cell stops dividing immediately when certain genes are silenced yet divides 10 more times before dying when other essential genes are disabled. The extra division cycles occur when the cell holds more copies of the protein encoded by the affected gene than a single life cycle requires; the surplus is passed on to the offspring, which die only when the reserves finally run out. These early results are exciting, but it may take years to understand everything the simulations are telling us about how these bacteria, and cells in general, work.

Our research on M. genitalium is only the first step toward a computer model of human cells or tissues at the level of genes and molecules. The model we have today is far from perfect, and mycoplasmas are just about the simplest free-living organisms in existence. All of the simulations, software, knowledge base and experimental data are freely available on the Internet, and we and other researchers are already working to improve the simulator and extend it to other organisms, such as E. coli and the yeast Saccharomyces cerevisiae, the two organisms most common in academic and industrial research laboratories.

In these species, gene regulation is far more complex, and the location within the cell where each event takes place also matters. Once we manage to address these issues, I predict the next target will be a mouse or human cell, most likely one, such as a macrophage (an immune-system attack cell), that is easily grown in culture and can supply the measurements needed to fine-tune and test the model.

I cannot guess how far we are from such technology today. Compared with bacteria, human cells have many more compartments and far more extensive genetic regulation, large parts of which remain entirely unknown. Moreover, because human cells live in multicellular tissues, they interact with one another far more intimately than bacteria do.

On February 13, 2008, I would have said it would be at least a decade before we could model even the simplest cell, and I would not have dared contemplate a model of anything more complex. Now one can at least imagine developing a model of a human cell, if only to see how the software fails and to learn from those failures what we still need to discover about our own cells. Even that would be a fairly big step.

__________________________________________________________________________________________________________________________________________________________________

About the author

Markus W. Covert is an associate professor of bioengineering at Stanford University, where he heads a systems biology laboratory.

In brief

Computer models that take into account the activity of every gene and every molecule in a cell could revolutionize the way we study, understand and design biological systems.

A comprehensive computer simulation of a common infectious bacterium was completed last year; although it still has flaws, it is already producing new discoveries.

Scientists are now developing models of more complex organisms. In the long term, the goal is to simulate human cells and organs at a similar level of detail.

More on the subject

The Dawn of Virtual Cell Biology. Peter L. Freddolino and Saeed Tavazoie in Cell, Vol. 150, no. 2, pages 248-250; July 20, 2012.

A Whole-Cell Computational Model Predicts Phenotype from Genotype. Jonathan R. Karr et al. in Cell, Vol. 150, no. 2, pages 389-401; July 20, 2012.

Bridging the Layers: Toward Integration of Signal Transduction, Regulation and Metabolism into Mathematical Models. Emanuel Gonçalves et al. in Molecular Biosystems, Vol. 9, no. 7, pages 1576-1583; July 2013.

The article was published with the permission of Scientific American Israel
