False-Positive Psychology

Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant

Joseph P. Simmons, Leif D. Nelson, Uri Simonsohn

Publication date: 2011

Summaries

In this article, we accomplish two things. First, we show that despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process.

By Joseph P. Simmons, Leif D. Nelson, Uri Simonsohn in the text False-Positive Psychology (2011)
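
The abstract's claim that analytic flexibility inflates the false-positive rate can be illustrated with a small simulation. The sketch below is not the authors' simulation. It assumes a hypothetical researcher who measures several independent outcomes when no true effect exists and reports a finding if any one of them reaches p < .05. It uses a z-test with known unit variance instead of a t-test so that only the Python standard library is needed; all function names and parameters are illustrative.

```python
import math
import random


def z_test_p(sample_a, sample_b):
    """Two-sided p-value of a two-sample z-test (assumes unit variance)."""
    mean_diff = sum(sample_a) / len(sample_a) - sum(sample_b) / len(sample_b)
    std_err = math.sqrt(1 / len(sample_a) + 1 / len(sample_b))
    z = mean_diff / std_err
    return math.erfc(abs(z) / math.sqrt(2))


def any_significant(rng, n_outcomes, n_per_group=20, alpha=0.05):
    """Simulate one null study; True if at least one outcome has p < alpha."""
    for _ in range(n_outcomes):
        # Both groups come from the same distribution: there is no effect.
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if z_test_p(group_a, group_b) < alpha:
            return True
    return False


def simulated_rate(n_outcomes, n_studies=4000, seed=7):
    """Share of simulated null studies that report a 'significant' result."""
    rng = random.Random(seed)
    hits = sum(any_significant(rng, n_outcomes) for _ in range(n_studies))
    return hits / n_studies


def analytic_rate(n_outcomes, alpha=0.05):
    """Chance that at least one of n independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_outcomes


if __name__ == "__main__":
    for k in (1, 2, 3, 5):
        print(k, round(analytic_rate(k), 3), round(simulated_rate(k), 3))
```

Under the independence assumption, the nominal 5% rate applies only when a single outcome is tested; each additional outcome the researcher is free to choose from pushes the rate higher. Correlated outcomes would inflate the rate less than this sketch suggests.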

Remarks

The paper makes a compelling case. If p-hacking can reverse the flow of time, what can it not do?

By Carl T. Bergstrom, Jevin D. West in the book Calling Bullshit (2020) in the text The Susceptibility of Science

The article does not use the term "p-hacking", but it is about exactly this phenomenon.

By Beat Döbeli Honegger, recorded in the Biblionetz on 10.07.2022

To illustrate how powerful p-hacking techniques can be, Joseph Simmons and colleagues Leif Nelson and Uri Simonsohn tested a pair of hypotheses they were pretty sure were untrue. One was an unlikely hypothesis; the other was impossible.
The unlikely hypothesis was that listening to children’s music makes people feel older than they really are. Volunteers listened to either a children’s song or a control song, and later were asked how old they felt. With a bit of p-hacking, the researchers concluded that listening to a children’s song makes people feel older, with statistical significance at the p < 0.05 level.
While suggestive, the initial study was not the most persuasive demonstration of how p-hacking can mislead. Maybe listening to a children’s song really does make you feel old. So the authors raised the bar and tested a hypothesis that couldn’t possibly be true. They hypothesized that listening to the classic Beatles song “When I’m Sixty-Four” doesn’t just make people feel younger, it literally makes them younger. Obviously this is ridiculous, but they conducted a scientific experiment testing it anyway.
They ran a randomized controlled trial in which they had each subject listen either to the Beatles song or to a control song. Remarkably, they found that while people who listened to each song should have been the same age, people who heard “When I’m Sixty-Four” were, on average, a year and a half younger than people who heard the control. Moreover, this difference was significant at the p < 0.05 level! Because the study was a randomized controlled trial, the usual inference would be that the treatment, listening to the song, had a causal effect on age. Thus the researchers could claim (albeit tongue in cheek) to have evidence that listening to “When I’m Sixty-Four” actually makes people younger.
To reach these impossible conclusions, the researchers deliberately p-hacked their study in multiple ways. They collected information about a number of characteristics of their study subjects, and then controlled for the one that happened to give them the result they were looking at. (It was the age of the subject’s father, for what that’s worth.) They also continued the experiment until they got a significant result, rather than predetermining the sample size. But such decisions would be hidden in a scientific report if the authors chose to do so. They could simply list the final sample size without acknowledging that it was not set in advance, and they could report controlling for the father’s age without acknowledging that they had also collected several additional pieces of personal information, which they ended up discarding because they did not give the desired result.

By Carl T. Bergstrom, Jevin D. West in the book Calling Bullshit (2020) in the text The Susceptibility of Science
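
One of the tactics described in the passage above, continuing data collection until the result is significant, can be sketched as a simulation. This is a minimal illustration and not the procedure used in the study itself: it assumes two groups drawn from the same normal distribution (so any "effect" is a false positive), a z-test with known unit variance in place of a t-test, and hypothetical settings of 20 subjects per group to start, 10 more per group after each non-significant look, and a cap of 50 per group.

```python
import math
import random


def z_test_p(sample_a, sample_b):
    """Two-sided p-value of a two-sample z-test (assumes unit variance)."""
    mean_diff = sum(sample_a) / len(sample_a) - sum(sample_b) / len(sample_b)
    std_err = math.sqrt(1 / len(sample_a) + 1 / len(sample_b))
    z = mean_diff / std_err
    return math.erfc(abs(z) / math.sqrt(2))


def study_is_significant(rng, peek, start_n=20, step=10, max_n=50, alpha=0.05):
    """Run one null study; with peek=True, keep adding subjects and retesting."""
    group_a = [rng.gauss(0.0, 1.0) for _ in range(start_n)]
    group_b = [rng.gauss(0.0, 1.0) for _ in range(start_n)]
    while True:
        if z_test_p(group_a, group_b) < alpha:
            return True  # stop as soon as the test comes out significant
        if not peek or len(group_a) >= max_n:
            return False  # fixed design, or the sample-size cap was reached
        group_a.extend(rng.gauss(0.0, 1.0) for _ in range(step))
        group_b.extend(rng.gauss(0.0, 1.0) for _ in range(step))


def false_positive_rate(peek, n_studies=5000, seed=1):
    """Share of simulated null studies that end with p < .05."""
    rng = random.Random(seed)
    hits = sum(study_is_significant(rng, peek) for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    print("fixed sample size:", false_positive_rate(peek=False))
    print("optional stopping:", false_positive_rate(peek=True))
```

With a fixed sample size the simulated rate stays near the nominal 5%, while repeated testing gives the researcher several chances to cross the threshold. A report that lists only the final sample size would not reveal which of the two designs was used.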

This scientific journal article mentions ...

People

John P. A. Ioannidis

Terms

data, false positive rate, p-hacking, psychology, significance, simulation, statistics

Texts

Year		Cover	Title	Views	IB	OB	KB	LB
2005			Why Most Published Research Findings Are False (John P. A. Ioannidis)	1, 1, 3, 2, 3, 6, 3, 7, 4, 3, 10, 12	8	10	12	448

3 mentions

All Data Are Local - Thinking Critically In A Data-Driven Society (Yanni Alexander Loukissas) (2019)
Calling Bullshit - The Art of Skepticism in a Data-Driven World (Carl T. Bergstrom, Jevin D. West) (2020)
- 9. The Susceptibility of Science
Prompting Considered Harmful (Meredith Ringel Morris) (2024)

Full text of this document

False-Positive Psychology: article as full text (local, 420 kByte; WWW)

Beat and this scientific journal article

Beat added this scientific journal article to the Biblionetz during his time at the Institut für Medien und Schule (IMS). Beat owns a digital copy but no physical one. A digital version is available on the Internet (see above). So far only a few objects in the Biblionetz cite this work. Beat himself says he has skimmed this document.