Tuesday, January 13, 2009

Extrapolating from data

Over at Cosmic Variance (CV), a post of unsolicited advice for post-docs suggests that it is a good idea to make your papers good, which the author defines as “interesting, even to people outside your immediate circle of friends.” However, in analytical science, broadening the appeal of a result often increases the chance that it is wrong.

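An example. Suppose I discover a 4.3-billion-year-old rock. The oldest currently known rock is 4.2 billion years old, and the Earth is 4.5 billion years old. So my new rock is 33% closer to the beginning than the previous oldest rock was: the gap between the Earth's formation and the oldest known rock shrinks from 300 million years to 200 million years, a reduction of one third.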

I will use end-member examples to describe how to interpret my new discovery. The conservative approach is to say that everything I learn from my rock applies only to that particular rock, and says nothing about the rock 10 meters away, the rest of the tectonic block, or the Earth as a whole. The speculative approach is to say that my rock is representative of all 200-million-year-old Earth-sized planets around all mid-sized stars in the universe.

Now, the latter interpretation will obviously interest a larger number of people: astrobiologists, astronomers, meteoriticists, and all sorts of geochemists would like to know about early planetary conditions. The former approach would interest only the three guys who found that particular rock. So by the CV definition quoted above, the speculative approach makes the better paper. The problem is that extrapolating to the entire universe is also more likely to be wrong.

How far to extrapolate one’s results is one of those touchy subjects in science. Different people draw the line in different places. Compared to any one person’s preferred amount of extrapolation, anyone who extrapolates more is an irresponsible yahoo who is putting disinformation into the literature and leading generations astray. Anyone who extrapolates less is a curmudgeonly cherry-picking data hoarder who refuses to publish all but his least interesting work.

The key to navigating the treacherous waters of data interpretation is to recognize that different researchers, and different departments, judge the appropriate amount of extrapolation by different standards.

One high-profile example of this is the field of Hadean zircons, which were discussed in the New York Times article I blogged about last month. The oxygen isotope work on these samples was originally done by two groups, one at UCLA and one at UW-Madison. In general, the UW-Madison group was conservative about their results, while the UCLA group was more speculative. So at conferences and seminars, proponents of the conservative approach would imply that the UCLA results included bad data, while the more optimistic scientists suggested that the UW-Madison group was committing selection bias by throwing out good data.

If y’all want to argue about who is right in comments, feel free. The important point for scientists in training is that when you write your own papers, the amount of extrapolation you do will be criticized as either too much or too little, depending on who reads it. So if you’re aiming to work in a conservative department, writing speculative papers probably won’t endear you to your colleagues. And vice versa.


CJR said...

Surely it's more an issue of just being clear about when you're drawing firm conclusions, and when you're speculating? It's the difference between saying, "Our data are consistent with x" and "Our data definitively establish x".

In your example, geologists wanting to know about the Hadean Earth are currently basically limited to studying one or two outcrops; in order to push the discussion forward, I think a degree of speculation is justified - possibly more than would be appropriate in areas where we have more information. But it does need to be clearly labelled as such.

Chuck said...

Well, yes and no. One of the means by which one extrapolates from data to worlds is via models of global processes. And different folks disagree on which processes may or may not be valid for various periods of Earth history. For example, if you claim that your Archean rock is typical of Archean subduction zones worldwide, then people will judge your conclusions based on their predilection or distaste for Archean subduction...