Is our data outrunning our science?

You've probably heard the term "big data" — it refers to the enormous databases full of information generated by everything from social media to sensors measuring weather patterns. Computers have ushered in a golden age of data-gathering. The problem is we're not sure how to analyze it yet.

Image by Yuta Onoda

Over at Quanta magazine, there's a fantastic series of articles about the scientists, researchers and engineers who are trying to make sense of big data from the heavens, from medicine, and even from what you're about to type in the comments on this post. They're having to figure out new ways to classify all that information, and new software for combing through it to detect key patterns.

If you're interested in the future of science, this package of stories is a must-read.

In one essay, Natalie Wolchover writes about the data from outer space that could eventually reveal what dark energy really is. Here's the fascinating opening from her article:

Even as he installed the landmark camera that would capture the first convincing evidence of dark energy in the 1990s, Tony Tyson, an experimental cosmologist now at the University of California, Davis, knew it could be better. The camera’s power lay in its ability to collect more data than any other. But digital image sensors and computer processors were progressing so rapidly that the amount of data they could collect and store would soon be limited only by the size of the telescopes delivering light to them, and those were growing too. Confident that engineering trends would hold, Tyson envisioned a telescope project on a truly grand scale, one that could survey hundreds of attributes of billions of cosmological objects as they changed over time.

It would record, Tyson said, “a digital, color movie of the universe.”

Tyson’s vision has come to life as the Large Synoptic Survey Telescope (LSST) project, a joint endeavor of more than 40 research institutions and national laboratories that has been ranked by the National Academy of Sciences as its top priority for the next ground-based astronomical facility. Set on a Chilean mountaintop, and slated for completion by the early 2020s, the 8.4-meter LSST will be equipped with a 3.2-billion-pixel digital camera that will scan 20 billion cosmological objects 800 times apiece over the course of a decade. That will generate well over 100 petabytes of data that anyone in the United States or Chile will be able to peruse at will. Displaying just one of the LSST’s full-sky images would require 1,500 high-definition TV screens.

The LSST epitomizes the new era of big data in physics and astronomy. Less than 20 years ago, Tyson’s cutting-edge digital camera filled 5 gigabytes of disk space per night with revelatory information about the cosmos. When the LSST begins its work, it will collect that amount every few seconds — literally more data than scientists know what to do with.
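To get a feel for those numbers, here's a quick back-of-envelope sketch in Python. The 3.2-gigapixel camera and the decade-long survey come straight from the excerpt; the bytes per pixel, exposure cadence, and observing time are my own rough assumptions rather than official LSST specifications, so treat the output as an order-of-magnitude illustration.

```python
# Back-of-envelope estimate of LSST-scale raw data volumes.
# The pixel count and survey length are quoted in the article above;
# everything marked "assumed" is an illustrative guess, not an LSST spec.

PIXELS_PER_IMAGE = 3.2e9      # 3.2-gigapixel camera (from the article)
BYTES_PER_PIXEL = 2           # assumed 16-bit raw pixels
SECONDS_PER_EXPOSURE = 20     # assumed cadence: roughly "every few seconds"
HOURS_PER_NIGHT = 8           # assumed usable observing time per night
NIGHTS_PER_YEAR = 300         # assumed clear nights per year
SURVEY_YEARS = 10             # decade-long survey (from the article)

bytes_per_exposure = PIXELS_PER_IMAGE * BYTES_PER_PIXEL
exposures_per_night = HOURS_PER_NIGHT * 3600 / SECONDS_PER_EXPOSURE
bytes_per_night = bytes_per_exposure * exposures_per_night
bytes_total = bytes_per_night * NIGHTS_PER_YEAR * SURVEY_YEARS

print(f"per exposure: {bytes_per_exposure / 1e9:.1f} GB")          # ~6.4 GB
print(f"per night:    {bytes_per_night / 1e12:.1f} TB")            # ~9 TB
print(f"per decade:   {bytes_total / 1e15:.0f} PB of raw pixels")  # ~28 PB
```

Even this deliberately lowball raw-pixel estimate lands in the tens of petabytes; the article's "well over 100 petabytes" presumably also counts processed images and catalogs built on top of the raw frames.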

Image: Tony Tyson with a small test camera for the Large Synoptic Survey Telescope project.

“The data volumes we [will get] out of LSST are so large that the limitation on our ability to do science isn’t the ability to collect the data, it’s the ability to understand the systematic uncertainties in the data,” said Andrew Connolly, an astronomer at the University of Washington.

Typical of today’s costly scientific endeavors, hundreds of scientists from different fields are involved in designing and developing the LSST, with Tyson as chief scientist. “It’s sort of like a federation,” said Kirk Borne, an astrophysicist and data scientist at George Mason University. The group comprises nearly 700 astronomers, cosmologists, physicists, engineers and data scientists.

Much of the scientists’ time and about one-half of the $1 billion cost of the project are being spent on developing software rather than hardware, reflecting the exponential growth of data since the astronomy projects of the 1990s. For the telescope to be useful, the scientists must answer a single question. As Borne put it: “How do you turn petabytes of data into scientific knowledge?”

Physics has been grappling with huge databases longer than any other field of science because of its reliance on high-energy machines and enormous telescopes to probe beyond the known laws of nature. This has given researchers a steady succession of models upon which to structure and organize each next big project, in addition to providing a starter kit of computational tools that must be modified for use with ever larger and more complex data sets.

Even backed by this tradition, the LSST tests the limits of scientists’ data-handling abilities. It will be capable of tracking the effects of dark energy, which is thought to make up a whopping 68 percent of the total contents of the universe, and mapping the distribution of dark matter, an invisible substance that accounts for an additional 27 percent. And the telescope will cast such a wide and deep net that scientists say it is bound to snag unforeseen objects and phenomena too. But many of the tools for disentangling them from the rest of the data don’t yet exist.
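As for combing through all of that for the unexpected, here's a toy sketch of the general idea behind variable-object detection: flag anything whose brightness jumps around between visits by more than measurement noise can explain. The catalog, the noise level, and the simple variance cut are all invented for illustration; real survey pipelines are far more sophisticated than this.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy catalog: brightness of each object measured on many repeat visits.
# Most objects are constant apart from measurement noise; a few genuinely vary.
n_objects, n_visits, noise = 50_000, 200, 0.05
flux = rng.normal(1.0, noise, size=(n_objects, n_visits))

# Inject 50 variable objects with a slow sinusoidal brightness change.
variable_ids = rng.choice(n_objects, size=50, replace=False)
flux[variable_ids] += 0.3 * np.sin(np.linspace(0.0, 20.0, n_visits))

# Flag objects whose visit-to-visit scatter far exceeds the noise level.
scatter = flux.std(axis=1)
flagged = np.flatnonzero(scatter > 3 * noise)

print(f"flagged {flagged.size} candidate variables out of {n_objects:,} objects")
print(f"recovered {np.isin(variable_ids, flagged).sum()} of the 50 injected variables")
```

Scale that idea up to 20 billion objects observed 800 times apiece, and you start to see why roughly half of the project's budget is going into software rather than hardware.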

Read the rest at Quanta, and be sure to check out the whole set of articles.