Bode is an epidemiologist at the CDC’s Florida office. He is investigating a rash of unexplainable deaths in the area. Otherwise healthy individuals are keeling over and expiring from what can only be described as “all purpose mortality.” The only thing they have in common is that they watched the season finale of America’s Got Talent, in which Chuggo Johnson won the contest by showing off his ability to chug copious amounts of various liquids. This season’s sponsor, Sunny D, donated 14 gallons of orange juice for Chuggo to…chug. Their sales swelled immediately after the airing of Chuggo’s heroics.
Kinsey, a moderately committed Presbyterian attending university in a mid-sized Midwestern (USA, WEIRDos) city, is walking to her class one day. She stops to admire a poster advertising a new exhibit at the campus art museum. Evidently it will feature Rodin’s famous sculpture The Thinker. Kinsey spends several minutes noting the exhibition dates and analyzing the image of The Thinker on the poster. Little does she realize, the poster has prodded her to think more analytically and, as she does so, her religious belief begins to erode. She strolls on, noticeably less religious than she was before the poster (by ~.6 standard deviations, that most useless of units). No big deal, right? Well, it turns out that religious beliefs are linked to sense of meaning, ability to resist temptations, health and wellbeing, and all sorts of other goodies. And Kinsey is not the only victim of this spiritually-eradicating poster. Although the effect is relatively modest in size (on par with a gender difference in weight), it is noticeable in aggregate. The well-meaning art museum and their well-meaning poster have doomed a number of students to sputter in a spiritual malaise, eventually dropping out of school. Their peers view them with moral contempt. Their lives are ruined.
Tyler is a fellow student in the same town. After lacrosse practice, he and his friends go to the store to pick up groceries. They are weary from their workout and really just need to fuel up. Tyler enters the store. To his left is a shelf loaded with candy bars. Behind that, the smell of fresh cookies wafts from the bakery. Sorely tempted to indulge, Tyler and his buddies nonetheless push through to the vegetable aisle to procure the healthy vittles their coach recommended. Tofu and crap like that. But they have been temporarily scarred. Their already limited stores of “self-control energy” have been further depleted in the fight against quick sugary calories and their pursuit of vegetables (by an absolutely whopping 1.8 standard deviations!?!). As a result, they have little willpower left, so when they encounter a line at the till, they are unable to restrain their basest violent urges (~.7 SD there). A fight breaks out. This is the third time this week that near-riot conditions have prevailed between the veggies and the bakery.
WHAT THE SHIZZ
As far as I know, none of these stories are remotely true. Although a recent report states that orange juice is associated with a 24% increase in all-purpose mortality, I don’t know of any orange-juice related death waves. Although a fancy Science paper seemed to (sketchily) show that The Thinker killed belief in God, I don’t know of any museum-related deconversion cascades. Although hundreds of studies have been published on ego depletion—including the classic original cookies and radishes study, d = 1.8 and all—and separate research links poor self-control to aggression, I don’t know of any depletion-caused grocery brawls.
Each of these three vignettes presents an example of a domain in which scientific research estimates an effect of a given magnitude. In two of these examples, the effect sizes are given in the form of a standardized metric (Cohen’s d), which is a bit abstract. But we can compare to benchmarks to see how evident such effects would be in the real world. Are you pretty sure that men on average weigh more than women? Cool, you’ve got a sense for d ~ .67. You can damn well see those effects. So when you hear that statues breed atheists (d = .59), radishes kill willpower (d = 1.8), or other findings, you should be able to actually see these effects “in the wild.” And, given effect size aggregation across people and situations, effects of these magnitudes in these domains would be obviously evident and destructive in the world.
Civilizations would collapse.
Worlds would break.
We would damn-for-sure well notice.
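If standard deviations still feel slippery, one way to make a d concrete is to convert it to a "probability of superiority": the chance that a random person from the higher-scoring group beats a random person from the lower-scoring group. Here is a minimal sketch of that conversion. It assumes both groups are normally distributed with equal variance, which is a simplification and not something the studies above guarantee. The function name and labels are mine, purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def prob_superiority(d: float) -> float:
    """Chance that a random draw from the higher group exceeds a random
    draw from the lower group, given a standardized difference d.

    Assumes two normal distributions with equal variance (a simplification).
    The difference of two such draws has mean d and SD sqrt(2), in SD units.
    """
    return NormalDist().cdf(d / sqrt(2))


# Effect sizes quoted in the text above.
effects = {
    "men vs. women, weight (benchmark)": 0.67,
    "The Thinker vs. religious belief": 0.59,
    "cookies and radishes vs. willpower": 1.8,
}

for label, d in effects.items():
    print(f"{label}: d = {d:.2f}, "
          f"probability of superiority = {prob_superiority(d):.2f}")
```

A d of 0 comes out as a coin flip (0.5), and bigger values of d push the number toward 1. The closer to 1, the easier the effect should be to spot in the streets.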
THE WORLDBREAKER HEURISTIC
Enter The Worldbreaker Heuristic: when you encounter an effect size estimate in the literature, imagine it playing out in the streets. Does the world break? Does society implode? If so, dig deeper.
Case in point, Joe Hilgard spotted a Really Really Really Big Effect of a few minutes’ video game play on being nice. Like too-good-to-be-true big. So he followed up on it! Science in action! LINK, Y’ALL
Also relevant: Richard Morey suggested we abandon the point effect size estimate and instead report some sort of lower bound. Lots of times the seemingly gigantor point estimate is accompanied by plausible effect sizes nigh indistinguishable from homeopathy.
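To get a feel for how far a lower bound can sit from the point estimate, here is a rough sketch using the common large-sample approximation for the standard error of Cohen's d and a plain normal-theory confidence interval. This is one possible way to compute "some sort of lower bound," not necessarily the one Morey has in mind. The function name is mine, and the sample sizes in the example are hypothetical, not taken from any of the studies mentioned here.

```python
from math import sqrt
from statistics import NormalDist


def d_lower_bound(d: float, n1: int, n2: int, confidence: float = 0.95) -> float:
    """Lower end of an approximate two-sided confidence interval for Cohen's d.

    Uses the usual large-sample approximation for the standard error of d
    with two independent groups of sizes n1 and n2.
    """
    se = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return d - z * se


# Hypothetical sample sizes, chosen only for illustration.
for n_per_group in (20, 50, 200):
    lower = d_lower_bound(0.59, n_per_group, n_per_group)
    print(f"d = 0.59, n = {n_per_group} per group: lower bound = {lower:.2f}")
```

With small groups the lower bound lands at or below zero even though the point estimate looks hefty, which is exactly the homeopathy-adjacent situation described above. The bound only climbs toward the point estimate as the samples get big.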
Caveats and crap like that…
Okay, time to anticipate a potential objection…
We crafty experimentalists are able to surgically isolate effects in-lab. Naturally effects will be noisier outside these sanitary confines.
Okay, fine. So, what do you think the effect is in the real world? Would it be noticeable? Is it likely to be important? On some level, if our lab tricks don’t transcend their confines, then what are we doing…
Also, to me this suggests (as with lots of other stuff) a need to really focus on estimation. Knowing that p < .05 tells us virtually nothing (except for the long-run performance of a given test). It doesn’t tell us what to expect from the world. It doesn’t tell us how to arrange our grocery stores to avoid needless bloodshed. I don’t find it especially useful. But smarter people have argued otherwise, so comme ci, comme ça.
Finally, I think viewing our research in this way suggests that we perhaps should try to design studies so that we are estimating natural unstandardized effect sizes. If you’re anything like me, you’re better at thinking about lives, dollars, cupcakes, tears, or jobs than you are about standard deviations. So why not design studies that naturally lend themselves to real-world applications? (Cynic response: but if I do that, then it will be easy for everyone to see that my effect size is ludicrously large, given what we know about the world.)