A masterful introduction to ENCODE, one of the most impressive genome projects of our time

What is ENCODE? "ENCODE is vast," writes science writer Ed Yong towards the end of his comprehensive yet characteristically lucid introduction to this ambitious international genome project. And vast it is.

Known formally as the "Encyclopedia of DNA Elements," ENCODE today published 30 papers across three different scientific journals, all with the aim of, as Yong puts it, moving us from "here's the genome" (the maxim, if you will, of the Human Genome Project) toward "here's what the genome does."

Yong continues:

Over the last 10 years, an international team of 442 scientists have assailed 147 different types of cells with 24 types of experiments. Their goal: catalogue every letter (nucleotide) within the genome that does something.

For years, we've known that only 1.5 percent of the genome actually contains instructions for making proteins, the molecular workhorses of our cells. But ENCODE has shown that the rest of the genome – the non-coding majority – is still rife with "functional elements". That is, it's doing something.

It contains docking sites where proteins can stick and switch genes on or off. Or it is read and 'transcribed' into molecules of RNA. Or it controls whether nearby genes are transcribed (promoters; more than 70,000 of these). Or it influences the activity of other genes, sometimes across great distances (enhancers; more than 400,000 of these). Or it affects how DNA is folded and packaged. Something.

According to ENCODE's analysis, 80 percent of the genome has a "biochemical function". More on exactly what this means later, but the key point is: It's not "junk". Scientists have long recognised that some non-coding DNA probably has a function, and many solid examples have recently come to light. But, many maintained that much of these sequences were, indeed, junk. ENCODE says otherwise. "Almost every nucleotide is associated with a function of some sort or another, and we now know where they are, what binds to them, what their associations are, and more," says Tom Gingeras, one of the study's many senior scientists.

You'll find the rest of Yong's overview of the ENCODE project and its newly published batch of data (a surprisingly navigable 3,300-word behemoth of a blog entry) over at Discover Magazine.