The director of the Center for Open Science speaks with Templeton.org about the “reproducibility crisis” and his organization’s plans to help scientific research stay true to its values.
In late August, subscribers to the email list of the Association for Psychological Science got the latest roundup of studies published in the field’s preeminent empirical journal. Every one of the studies had a badge next to it indicating “Open Data,” meaning that the researchers were making all of the underlying data available for others to examine and probe. The Open Data badge is just one initiative from the Center for Open Science, which received a major grant from the John Templeton Foundation in 2014. We spoke with Brian Nosek, co-founder and director of the center as well as a professor of psychology at the University of Virginia, about the prospects for a more reliable scientific method.
JTF: How did contemporary scientific research arrive at the present-day “crisis of reproducibility”?
Nosek: When you survey researchers on scientific values like transparency and reproducibility of research, over 90 percent of people say they embrace those values. And when you ask them how they behave, people mostly say that they behave according to the norms of science. But when you ask, “How do other people in your field behave?” they say, “Oh, most people don’t behave like that.”
How do I succeed in present-day science? By publishing as frequently as I can in the most prestigious places that I can. And so I have to do the things that help me get published. That’s the currency of advancement: a positive result is more publishable than a negative result; a novel result is more publishable than repeating what someone else did; and a clean, tidy story is more publishable than stuff with exceptions.
But to get a positive, novel, clean story is very hard.
Why are so many studies turning out to not be reproducible?
In some cases, it’s that the initial result was just wrong. The researcher observed [something] by chance, or they “p-hacked” it, using flexibility in their analysis to get a result that looked like it was there but really wasn’t. Other times what happens is that the replicators got it wrong. So when we’re doing replications, we work very hard to minimize that possibility.
There are two more areas that pose big challenges for reproducibility. One is simply a communication problem: one of the big barriers is just figuring out what the original authors did. The methodology is supposedly described in the paper, but what we find is that in many studies it’s almost impossible to figure out what they actually did.
The last category of non-reproducible studies is the most theoretically interesting, but the hardest to unpack: It’s possible that there are subtle differences between the two [versions of the study] that really are important for understanding that phenomenon, when it will occur and when it won’t, and the replication just sort of stumbled into a condition where it’s not observed.
But what happens too easily in the field is that people see a difference between an original study and a replication, generate a hypothesis for why it might have happened, and then assume that’s actually the explanation.
What’s the role that cognitive bias seems to play in the crisis?
Science, like every other human activity, is full of interpretation. We’re constantly looking at data, reasoning about it, trying to come up with what we think will be true. And even when that’s done genuinely with the best of intentions, I have my previous expectations about how I think the world works, theories I’ve had, beliefs I have, my ideology, and that may color how I evaluate that evidence. I may have confirmation bias: I may look for things that are consistent with my prior beliefs and be more likely to omit things that are less consistent with my prior beliefs. Another big challenge is hindsight bias: when I analyze the data and it doesn’t look like what I expected, but as I think about it, something sort of starts to look like, “Oh, yeah, of course it would have come out like that.”
What we need to do if we want to try to eliminate those biases is put ourselves under constraint. This is really the main reason for “preregistration” as a practice. If I want to test a particular hypothesis, I should make my commitments before seeing the data.
How did you go from identifying the problem to coming up with the Open Science Framework?
We felt like what we needed to do was provide the what, the how, and the why of behaving according to the norms of science: openness, transparency, reproducibility.
The what is training: giving researchers knowledge about how to be more reproducible in their practices, how to make their data more open, and how to make it more usable by others.
The how is the Open Science Framework: Now I know what to do, where do I do it? Here’s a tool that will help me to preregister my designs, that will make it easy for me to share my data, and so forth.
The why is our work with communities to nudge those incentives, to try to give people rewards for the values they have, so that what’s good for me and what’s good for science are the same thing.
How is the why playing out? What are you finding is most effective for that?
We have four initiatives to try to nudge incentives, and each is having some degree of success. The most successful so far are the eight Transparency and Openness Promotion (TOP) guidelines. This came from a community-based effort to define open practices: What does it mean to share data? What are the criteria for having an open data set or preregistering a study? By defining the guidelines, we made it very easy for journals to adopt them as part of their policies, and for funders to ask for them from their grantees. So far the TOP Guidelines have more than 5,000 journals and 80 other organizations as signatories, committed to going through the process of deciding what to integrate into their policies.
The second intervention is a set of Open Science badges to acknowledge open practices. This is a very simple intervention to signal when studies incorporate preregistration, open data, and open materials. They appear right on the paper itself with links to the data. So far 18 journals have adopted those, and the initial evidence is promising.
The third is called Registered Reports: integrating the concept of preregistration into the academic publication rewards system. A registered report undergoes peer review before you know the outcomes of your research. You submit the research design and questions to the journal, and if it passes that first stage of review, you get an in-principle acceptance. Regardless of what you find in the research, as long as you show in the second-stage review that you did the study well, the outcome gets published. So far 70 journals have adopted Registered Reports, and we’re getting a lot of experience with studies to evaluate whether it’s effective.
The last element is a $1 Million Preregistration Challenge. We are giving $1,000 each to 1,000 researchers who preregister their projects through our guided workflow and then publish the outcomes of that research. The whole goal of that is to raise awareness about preregistration and ideally inculcate it as a habit.
This sounds very much like a positive-reinforcement approach to changing the way research is done.
I confess my default is to be positive about everything, so I much prefer the carrot approach to the stick. Moreover, we aren’t really in a position to use the stick. Big funders like the NIH and universities can apply a stick because they have control over the outcomes of researchers. The risks of stick approaches are obvious: people comply, but with resentment rather than thinking that it’s a good thing to do, and so the quality doesn’t follow. With the carrot approach, we are much more likely to get buy-in, because people see that they can use these approaches to achieve their own values. They can live their values by doing these practices.
What have you learned from applying your own methods to the roll-out of the Open Science Framework?
It’s our responsibility to model the kinds of behaviors that we are suggesting will improve reproducibility, so my lab has moved to preregistering everything in the lab and here at the center. For all of our research projects, we make the data publicly available, including the entire workflow, to the extent that we can.
For example, one possibility is that registered reports might improve the reproducibility of the work that’s done, but at the cost of making researchers much more conservative and less creative in the kinds of questions that they want to test. So we need to evaluate that, because if we end up making research reproducible, but for totally uninteresting questions, then it may produce a net loss rather than a gain in the progress of science. So we are trying to make sure that we build in and design evaluations for all our interventions. We can only advance our mission by being clear-eyed and open to the errors that we will inevitably make, so that we can make sure that our research is as robust as possible and the interventions that follow from it are as evidence-based as possible.