Reproducibility is a major principle of the scientific method. It means that a result obtained by an experiment or observational study should be achieved again with a high degree of agreement when the study is replicated with the same methodology by different researchers. Only after one or several such successful replications should a result be recognized as scientific knowledge.
With a narrower scope, reproducibility has been introduced in computational sciences: Any results should be documented by making all data and code available in such a way that the computations can be executed again with idenitical results.
The terms replicability and repeatability are used in the context of reproducibility, see below.
In recent decades, there has been a rising concern that many published scientific results fail the test of reproducibility, evoking a reproducibility or replicability crisis.
The first to stress the importance of reproducibility in science was the Irish chemist Robert Boyle, in England in the 17th century. Boyle's air pump was designed to generate and study vacuum, which at the time was a very controversial concept. Indeed, distinguished philosophers such as René Descartes and Thomas Hobbes denied the very possibility of vacuum existence. Historians of science Steven Shapin and Simon Schaffer, in their 1985 book Leviathan and the Air-Pump, describe the debate between Boyle and Hobbes, ostensibly over the nature of vacuum, as fundamentally an argument about how useful knowledge should be gained. Boyle, a pioneer of the experimental method, maintained that the foundations of knowledge should be constituted by experimentally produced facts, which can be made believable to a scientific community by their reproducibility. By repeating the same experiment over and over again, Boyle argued, the certainty of fact will emerge.
The air pump, which in the 17th century was a complicated and expensive apparatus to build, also led to one of the first documented disputes over the reproducibility of a particular scientific phenomenon. In the 1660s, the Dutch scientist Christiaan Huygens built his own air pump in Amsterdam, the first one outside the direct management of Boyle and his assistant at the time Robert Hooke. Huygens reported an effect he termed "anomalous suspension", in which water appeared to levitate in a glass jar inside his air pump (in fact suspended over an air bubble), but Boyle and Hooke could not replicate this phenomenon in their own pumps. As Shapin and Schaffer describe, "it became clear that unless the phenomenon could be produced in England with one of the two pumps available, then no one in England would accept the claims Huygens had made, or his competence in working the pump". Huygens was finally invited to England in 1663, and under his personal guidance Hooke was able to replicate anomalous suspension of water. Following this Huygens was elected a Foreign Member of the Royal Society. However, Shapin and Schaffer also note that "the accomplishment of replication was dependent on contingent acts of judgment. One cannot write down a formula saying when replication was or was not achieved".
The philosopher of science Karl Popper noted briefly in his famous 1934 book The Logic of Scientific Discovery that "non-reproducible single occurrences are of no significance to science". The statistician Ronald Fisher wrote in his 1935 book The Design of Experiments, which set the foundations for the modern scientific practice of hypothesis testing and statistical significance, that "we may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment which will rarely fail to give us statistically significant results". Such assertions express a common dogma in modern science that reproducibility is a necessary condition (although not necessarily sufficient) for establishing a scientific fact, and in practice for establishing scientific authority in any field of knowledge. However, as noted above by Shapin and Schaffer, this dogma is not well-formulated quantitatively, such as statistical significance for instance, and therefore it is not explicitly established how many times must a fact be replicated to be considered reproducible.
Two major steps are naturally distinguished in connection with reproducibility of experimental or observational studies: When new data is obtained in the attempt to achieve it, the term replicability is often used, and the new study is a replication or replicate of the original one. Obtaining the same results when analyzing the data set of the original study again with the same procedures, many authors use the term reproducibility in a narrow, technical sense coming from its use in computational research. Repeatability is related to the repetition of the experiment within the same study by the same researchers. Reproducibility in the original, wide sense is only acknowledged if a replication performed by an independent researcher team is successful.
In chemistry, the terms reproducibility and repeatability are used with a specific quantitative meaning: In inter-laboratory experiments, a concentration or other quantity of a chemical substance is measured repeatedly in different laboratories to assess the variability of the measurements. Then, the standard deviation of the difference between two values obtained within the same laboratory is called repeatability. The standard deviation for the difference between two measurement from different laboratories is called reproducibility. These measures are related to the more general concept of variance components in metrology.
The term reproducible research refers to the idea that scientific results should be documented in such a way that their deduction is fully tranparent. This requires a detailed description of the methods used to obtain the data and making the full dataset and the code to calculate the results easily accessible. This is the essential part of open science.
There are systems that facilitate such documentation, like the R Markdown language or the Jupyter notebook. The Open Science Framework provides a platform and useful tools to support reproducible research.
Psychology has seen a renewal of internal concerns about irreproducible results (see the entry on replicability crisis for empirical results on success rates of replications). Researchers showed in a 2006 study that, of 141 authors of a publication from the American Psychology Association (APA) empirical articles, 103 (73%) did not respond with their data over a six-month period. In a follow up study published in 2015, it was found that 246 out of 394 contacted authors of papers in APA journals did not share their data upon request (62%). In a 2012 paper, it was suggested that researchers should publish data along with their works, and a dataset was released alongside as a demonstration. In 2017, an article published in Scientific Data suggested that this may not be sufficient and that the whole analysis context should be disclosed.
There have been initiatives to improve reporting and hence reproducibility in the medical literature for many years, beginning with the CONSORT initiative, which is now part of a wider initiative, the EQUATOR Network. This group has recently turned its attention to how better reporting might reduce waste in research, especially biomedical research.
Reproducible research is key to new discoveries in pharmacology. A Phase I discovery will be followed by Phase II reproductions as a drug develops towards commercial production. In recent decades Phase II success has fallen from 28% to 18%. A 2011 study found that 65% of medical studies were inconsistent when re-tested, and only 6% were completely reproducible.
Hideyo Noguchi became famous for correctly identifying the bacterial agent of syphilis, but also claimed that he could culture this agent in his laboratory. Nobody else has been able to produce this latter result.
In March 1989, University of Utah chemists Stanley Pons and Martin Fleischmann reported the production of excess heat that could only be explained by a nuclear process ("cold fusion"). The report was astounding given the simplicity of the equipment: it was essentially an electrolysis cell containing heavy water and a palladium cathode which rapidly absorbed the deuterium produced during electrolysis. The news media reported on the experiments widely, and it was a front-page item on many newspapers around the world (see science by press conference). Over the next several months others tried to replicate the experiment, but were unsuccessful.
Nikola Tesla claimed as early as 1899 to have used a high frequency current to light gas-filled lamps from over 25 miles (40 km) away without using wires. In 1904 he built Wardenclyffe Tower on Long Island to demonstrate means to send and receive power without connecting wires. The facility was never fully operational and was not completed due to economic problems, so no attempt to reproduce his first result was ever carried out.
Other examples which contrary evidence has refuted the original claim: