How database searches solved a 50-year-old medical mystery

Calcium is everywhere in your body, not just in your bones and teeth. In fact, this mineral is crucial to countless biological processes, from regulating your hormones to aiding muscle function. And when you don't have enough of it, the results can be disastrous.

And yet, until now, nobody was sure how cells regulate calcium inside the body. Finding out could help doctors treat many disorders, including heart attacks and strokes. When a research team finally untangled the mystery, the breakthrough didn't come just from experimenting with beakers in a wet lab. It came from researchers running computer searches of biological databases.

Calcium is stored in just about every cell in your body, and tiny structures inside each cell called mitochondria help regulate the levels of this crucial mineral. Some cells contain just one mitochondrion, while others can contain thousands. Either way, scientists have recognized the calcium-regulating role of mitochondria, and the host of complications associated with their malfunction, for decades.

The problem was that nobody could identify the proteins that make up mitochondria's calcium machinery. Now, results published in the most recent issue of Nature show how one of the key proteins has finally been unmasked, not with pipettes and beakers but with a mouse and a keyboard.

Dr. Vamsi Mootha, an associate professor of systems biology at Harvard Medical School, worked with a team that discovered the key protein behind mitochondria's calcium machinery. They named the protein, simply enough, the mitochondrial calcium uniporter, or MCU for short.

What's really interesting is how the Mootha Lab did it. Sure, there was plenty of traditional wet-lab work involving pipettes, beakers, and chemical reagents, but techniques like these had failed to identify MCU for close to fifty years. The difference was the Mootha Lab's strategic and creative searching of publicly accessible biological databases.

The database of life

So what are biological databases? They are like libraries for the building blocks of life, housing vast quantities of information on everything from genes to proteins to evolutionary relationships. Here's what Dr. Mootha had to say when we asked him about these large-scale datasets:

The human genome was sequenced in draft form and released to the public some 10 years ago. Its 20,000 protein-encoding genes encode RNA molecules that in turn encode proteins. Today, we have this sequence, as well as the genome sequence of over 1000 additional organisms, ranging from worms and flies to fungi and bacteria. We can use computer programs to compare these genome sequences to identify proteins that distinguish a fly from a human. On top of that, new technologies, such as "microarrays" and "proteomics" have made it possible to take snapshots of the genome in action. These technologies tell us which of the 20,000 genes are turned on or off in a tumor; which genes are turned on during aging; which genes are expressed in muscle but not in the heart; or which genes are stimulated in response to exercise. There are literally thousands of such molecular snapshots of the genome's activity in different contexts.

According to Mootha, many of his lab's research approaches are fueled by well-established biological datasets, but the story of the search for MCU actually begins with the creation of the lab's own database: a comprehensive inventory of the proteins in human and mouse mitochondria, dubbed "MitoCarta." He said:

The motivation for creating MitoCarta was simple: to discover human disease genes and to advance fundamental cell biology. We spent many years developing the experimental and computational methods that led up to MitoCarta. It serves as a "parts-list" for this organelle — it's the starting point for many of our investigations today.

Mootha's lab published MitoCarta in 2008. In 2010, the lab reported that it had cross-referenced MitoCarta with other databases to narrow the roughly 1,100 proteins in the inventory down to 50 candidates that might be involved in mitochondrial calcium uptake. By examining these 50 candidates, the lab singled out the very first protein shown to be specifically required for mitochondrial calcium uptake, which they called "MICU1." The discovery of MICU1 was an enormous step toward the discovery of MCU.
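The article doesn't say exactly how the cross-referencing was done, but the general idea of narrowing an inventory with other datasets can be sketched as a series of set operations. In this minimal Python sketch, the protein names, the datasets, and the filtering criteria are all hypothetical stand-ins, not the Mootha Lab's actual data or method: a candidate survives only if every "required" dataset also lists it and no "excluded" dataset does.

```python
# Hypothetical sketch: every protein name and dataset below is invented
# for illustration and does not come from MitoCarta or any real database.

def shortlist(inventory, required, excluded):
    """Return inventory members found in every `required` dataset and
    in none of the `excluded` datasets, sorted by name."""
    candidates = set(inventory)
    for dataset in required:
        candidates &= set(dataset)  # keep only proteins this dataset also lists
    for dataset in excluded:
        candidates -= set(dataset)  # drop proteins this dataset argues against
    return sorted(candidates)


# Toy stand-in for a mitochondrial "parts list".
inventory = {"P1", "P2", "P3", "P4", "P5", "P6"}
# Toy stand-ins for external datasets used as filters.
conserved_where_trait_present = {"P1", "P2", "P4", "P6", "X9"}
broadly_expressed = {"P1", "P2", "P4", "X9"}
found_where_trait_absent = {"P2", "P7"}

print(shortlist(inventory,
                required=[conserved_where_trait_present, broadly_expressed],
                excluded=[found_where_trait_absent]))
# prints ['P1', 'P4']
```

With real data, each set would be drawn from a separate database, and the choice of which datasets to require or exclude would encode the researchers' biological assumptions about the trait being studied.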

"We showed that MICU1 was required for calcium uptake, but because it did not span the membrane, we doubted it was the central component of the [calcium] channel. But what it provided us with was live bait to then go and find the bigger fish," said Mootha.

Protein bait

In science, the implications of an experimental finding are not always immediately apparent. Sometimes it takes someone asking the right questions, or looking at the results from a different angle, to tease groundbreaking information out of a set of findings. The "live bait" of MICU1 gave graduate student Joshua Baughman and postdoctoral researcher Fabiana Perocchi the new perspective they needed to probe several datasets for other proteins that, like MICU1, play a role in mitochondrial calcium uptake. This time, however, they were hoping to find the protein that actually forms the channel, giving calcium passage across the membrane into the mitochondria.

Using the known characteristics of MICU1 as "bait," Baughman and Perocchi performed three analyses based on MICU1's evolutionary, genetic, and proteomic characteristics. Each time, they searched enormous biological databases for proteins whose patterns resembled MICU1's, on the reasoning that proteins with matching patterns are likely to share its role in calcium uptake.

Each individual analysis pointed to a particular unstudied protein of unknown function. When the results of the three analyses were combined, it was clear that no other protein came close in terms of its potential functional relationship to MICU1. These combined results all but demanded that the protein be examined in greater detail.

The researchers hypothesized that the protein's function was somehow related to mitochondrial calcium uptake. To test this, they silenced the gene for the protein in cell cultures and in live animals, preventing the protein from doing its job. The experiments confirmed that their powers of deduction had led them to the right protein: when the protein was inactivated, the mitochondria, both in culture and in live animals, lost their capacity to absorb calcium. That unstudied protein of unknown function was first known as CCDC109, but you, along with the rest of the world, now know it as MCU.
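One way to picture how three independent analyses can be combined is to score every candidate against the bait in each analysis, rank the candidates, and average the ranks. The Python sketch below does that on invented toy data. Everything in it is an assumption for illustration: the names "BAIT", "A", "B", and "C", all the numbers, treating the "genetic" analysis as co-expression across tissues, using profile agreement and Pearson correlation as scores, and averaging ranks. It is not the published method.

```python
import math
from statistics import mean

# Hypothetical toy data: names and numbers are invented for illustration.

# Evolutionary evidence: presence (1) or absence (0) in five hypothetical organisms.
PROFILES = {
    "BAIT": [1, 1, 0, 1, 0],
    "A": [1, 1, 0, 1, 0],
    "B": [1, 0, 1, 1, 0],
    "C": [0, 0, 1, 0, 1],
}

# "Genetic" evidence, assumed here to be expression in four hypothetical tissues.
EXPRESSION = {
    "BAIT": [5.0, 3.0, 8.0, 1.0],
    "A": [5.5, 2.5, 7.5, 1.5],
    "B": [1.0, 4.0, 2.0, 6.0],
    "C": [4.0, 5.0, 6.0, 2.0],
}

# Proteomic evidence: an assumed precomputed score in [0, 1] per candidate.
PROTEOMIC = {"A": 0.8, "B": 0.9, "C": 0.1}


def profile_agreement(x, y):
    """Fraction of organisms where two presence/absence profiles agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(scores):
    """Map each candidate to its rank (1 = highest score); ties are not handled."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def integrate(bait="BAIT"):
    """Rank candidates in each analysis, then order them by average rank."""
    candidates = [name for name in PROFILES if name != bait]
    analyses = [
        {c: profile_agreement(PROFILES[bait], PROFILES[c]) for c in candidates},
        {c: pearson(EXPRESSION[bait], EXPRESSION[c]) for c in candidates},
        {c: PROTEOMIC[c] for c in candidates},
    ]
    per_analysis = [ranks(scores) for scores in analyses]
    # Average rank across analyses: lower means more consistent support.
    combined = {c: mean(r[c] for r in per_analysis) for c in candidates}
    return sorted(candidates, key=combined.get), combined


if __name__ == "__main__":
    order, combined = integrate()
    for name in order:
        print(name, round(combined[name], 2))
```

In this toy example, candidate A comes out on top overall even though it ranks only second in the proteomic analysis. That is the point of integrating several lines of evidence: a protein that scores well consistently across independent analyses stands out more clearly than one that excels in just one.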

What comes next?

Speaking to the implications of these findings, Dr. Mootha had this to say:

Mitochondrial calcium uptake has been studied for decades and is believed to be important for a variety of diseases, ranging from heart attack and stroke to cancer. But nearly all research to date has been correlative in nature, since we haven't had the means to perturb this pathway in a clean way. Knowing the genes will allow us to genetically manipulate them to rigorously evaluate their contribution to physiology. Knowing the genes will also allow us to identify diseases in which they are mutated.

Identification of these proteins represents a starting point of a new direction in our lab. We're very interested in understanding, at a molecular level, how these two proteins [MICU1 and MCU] cooperate to gate calcium. We hope that such insights may allow us to one day develop drugs that target this pathway.

It's not always easy to recognize how incomplete scientific information might later be put to use. Just look at CCDC109, the "unstudied protein of unknown function." Equally difficult to appreciate is the latent power of accumulated scientific data. These are two of the biggest reasons the research process is so meticulously catalogued, be it in research proposals, lab notebooks, scientific journals, or, most recently, open-access biological databases. The sheer size of public biological databases sometimes obscures the significance of the information they contain, but I think Mootha's description of the impact of MitoCarta since its introduction in 2008 helps bring the scientific importance of these databases into focus:

About half of the ~1100 proteins [of the MitoCarta] have no known function. We are now developing computational methods to define their function and to understand how they assemble together to produce a functioning organelle.

We continue to mine MitoCarta. As useful as it's been for us, its real impact has come from the fact that we made it freely available to the public — it's been cited in more than 225 publications in the last three years.

Research via Nature