Massive Collaboration Testing Reproducibility of Psychology Studies Publishes Findings

Brian Nosek, Professor of Psychology and Executive Director, Center for Open Science

A study that sought to replicate 100 findings published in three prominent psychology journals has found that, across multiple criteria, independent researchers could replicate fewer than half of the original findings. In some cases this may call into question the validity of scientific findings, but it may also point to the difficulty of conducting effective replications and achieving reproducible results.

The results of this review study, conducted by more than 270 researchers on five continents, are published in the Aug. 28 issue of the journal Science. Twenty-two students and faculty from the University of Virginia were among the co-authors.

“For years there has been concern about the reproducibility of scientific findings, but little direct, systematic evidence. This project is the first of its kind and adds substantial evidence that the concerns are real and addressable,” said Brian Nosek, a U.Va. psychology professor and coordinator of the study.

Nosek is the co-founder and executive director of the Center for Open Science, which coordinated the Reproducibility Project: Psychology. The project has produced the most comprehensive open investigation to date of the rate and predictors of reproducibility in a field of science.

Reproducibility means that the results recur when the same data are analyzed again, or when new data are collected using the same methods.

“With this project we established an initial estimate of the rate of reproducibility in psychology, and identified some evidence of possible influences on reproducibility,” said Anup Gampa, a Reproducibility Project team member and Ph.D. candidate at U.Va. “This sets the stage for new research to examine how to improve reproducibility.”

Science differs from other ways of gaining knowledge, Gampa said, because it relies on reproducibility to gain confidence in ideas and evidence.

“Scientific evidence does not rely on trusting the authority of the person who made the discovery,” said Reproducibility Project team member Angela Attwood, a psychology professor at the University of Bristol. “Rather, credibility accumulates through independent replication and elaboration of the ideas and evidence.”

However, Elizabeth Gilbert, a Reproducibility Project team member and Ph.D. candidate at U.Va., noted that a failure to reproduce does not necessarily mean the original report was incorrect.

“A replication team must have a complete understanding of the methodology used for the original research, and shifts in the context or conditions of the research could be unrecognized but important for observing the result,” she said.

Nosek pointed out that a problem in psychology, as in other disciplines, is that incentives for scientists are not consistently aligned with reproducibility.

“Scientists aim to contribute reliable knowledge, but also need to produce results that help them keep their job as a researcher,” he said. “To thrive in science, researchers need to earn publications, and some kinds of results are easier to publish than others, particularly ones that are novel and show unexpected or exciting new directions.”

As a consequence, according to Nosek and his co-authors, many scientists pursue innovative research in the interest of their careers, even at the cost of reproducibility of the findings. The authors say that research with new, surprising findings is more likely to be published than research examining when, why or how existing findings can be reproduced.

Overall, the Reproducibility Project team successfully replicated fewer than half of the original findings. Investigators suggested this could be due to three basic reasons:

Though most replication teams worked with the original authors to use the same materials and methods, small differences in when, where or how the replication was carried out might have influenced the results.
The replication might have failed, by chance, to detect the original result.
The original result might have been a false positive.
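
The second reason, a replication missing a real effect by chance, can be illustrated with a rough power calculation. The sketch below is not taken from the study: the function name `replication_power`, the effect size of 0.4 and the sample sizes are hypothetical, and it assumes a simple two-group comparison approximated by a two-sided z-test at the 0.05 level.

```python
import math


def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def replication_power(effect_size: float, n_per_group: int,
                      z_crit: float = 1.959964) -> float:
    """Approximate chance that a two-group replication detects a real effect.

    Assumptions (illustrative only): equal group sizes, a standardized
    effect size, and a two-sided z-test at alpha = 0.05.
    """
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * math.sqrt(n_per_group / 2.0)
    # Probability the statistic lands beyond either critical value.
    return (1.0 - normal_cdf(z_crit - shift)) + normal_cdf(-z_crit - shift)


if __name__ == "__main__":
    # Hypothetical effect size and sample sizes, chosen only for illustration.
    for n in (20, 50, 100):
        power = replication_power(0.4, n)
        print(f"n per group = {n:3d}: "
              f"chance of missing a real effect = {1 - power:.2f}")
```

Under these assumptions, smaller replications miss a genuine effect more often, which is why a single failed replication does not by itself show that the original result was a false positive.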

“The findings demonstrate that reproducing original results may be more difficult than is presently assumed, and interventions may be needed to improve reproducibility,” said Johanna Cohoon, a project coordinator with the Charlottesville-based Center for Open Science.

In keeping with the goals of openness and reproducibility, each replication project team posted its methods and results on a public website.

Many organizations, funders, journals and publishers are working to improve reproducibility. The journal Psychological Science, one of the publications included in this study, last year implemented practices to make study materials and data readily and openly available to other researchers.

“Efforts include increasing transparency of original research materials, code and data so that other teams can more accurately assess, replicate and extend the original research, and pre-registration of research designs to increase the robustness of the inferences drawn from the statistical analyses applied to research results,” said Denny Borsboom, a project team member from the University of Amsterdam who was involved in the creation of the Transparency and Openness Promotion Guidelines, recently published in Science.