In this post, I'll do my first ever Post Publication Peer Review of a study. PPPR recognizes that successfully navigating the obstacle course of peer review does not necessarily guarantee that a given study is True and Right and Useful and Good. Instead, the scientific community (beyond an Omniscient editor and 2-4 Omnibenevolent reviewers) gets to chime in on study quality after a paper's out.

To set the stage, there's an older (by my short timeline) study. It got some press (some okay, some crappy). Just recently an independent team published a replication attempt of this study. I'll reveal the results later. I'm not PPPRing the replication. Instead, I'd like to PPPR the initial study.

The initial study

The initial effort reports a small study (total N = 57 for a 2 group design). It uses a new and seemingly face-valid visual prime as a manipulation, and a single-item face-valid DV. It reports a statistically significant effect of manipulation on DV, t(55) = 2.24, p = .03. Pretty par for the course for a study that was probably conducted sometime around maybe 2009 or 2010.


  • The sample size is (by 2017 standards) embarrassingly small. 31 participants in one condition, 26 in the other. Between subjects design. Barely over the False Positive Psychology paper's recommendation of 25 [oops I meant 20] per cell (yeah, about that...). Oof.
  • Although the IV and DV both appear face-valid, they also both seem to be new for this study. Where did they come from?
  • A single item DV for a complex construct? Okay then.
  • The manipulation looks fairly overt. Maybe it works? Although if it does, there might be potential demand characteristics?!? Tough to tell, as the original paper doesn't really say much about where this manipulation came from.
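
For anyone who wants to put a rough number on "small study, barely significant," here is a back-of-envelope sketch in Python. It converts the reported t(55) = 2.24 into a standardized effect size using the usual two-group conversion, d = t * sqrt(1/n1 + 1/n2), and attaches an approximate 95% interval using a common large-sample standard error. The function names are made up for illustration, and the interval is a normal approximation, not something reported in the original paper.

```python
import math

def cohens_d_from_t(t, n1, n2):
    """Convert an independent-samples t statistic to Cohen's d."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)

def approx_ci_for_d(d, n1, n2, z=1.96):
    """Approximate 95% CI for d using a large-sample standard error.

    This is a normal approximation (an assumption of this sketch),
    not an exact noncentral-t interval.
    """
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2.0 * (n1 + n2)))
    return d - z * se, d + z * se

# Values taken from the initial study as described above.
d = cohens_d_from_t(2.24, 31, 26)
low, high = approx_ci_for_d(d, 31, 26)
print(f"d = {d:.2f}, approx 95% CI [{low:.2f}, {high:.2f}]")
```

Under those assumptions, the point estimate lands around d = 0.6, with an interval stretching from just above zero to beyond 1. That is roughly what "barely significant with N = 57" looks like.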

So, all in all, this looks like a cute effect emerging from a small-sample experiment using novel manipulations on a presumably noisy single-item DV. This is a recipe for trouble. Given what we (please say "all" please say "all") know now about recent replications and what spells "robust finding" and what spells "spurious finding," we should probably bet against this study replicating. It wouldn't be the first small, cute, barely significant experiment to not replicate. It certainly won't be the last. To reiterate: this looks like a perfect example of an experiment we shouldn't expect to replicate especially well.

The replication

The replication paper includes replications from 4 samples. Each sample is more than twice the size of the initial study. Pooled sample size is about 950. The authors were in close contact with the initial team when designing the study, so it looks high fidelity. The replication also included a positive control, just to make sure the participants weren't totally out to lunch.

And the replication shows no effect, d = .07 [-.12, .25]. No big surprise there.
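
To see why that's no big surprise, here is a rough power sketch in Python. It asks: if the true effect were as large as the upper end of the replication's interval (d = .25), how often would a 31-vs-26 design come up significant at alpha = .05? It uses a normal approximation to the two-sample t-test. The function names are hypothetical, treating d = .25 as the "true" effect is an assumption for illustration, and so is the even 475/475 split of the pooled replication sample (the actual cell sizes are in the replication paper).

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def approx_power(d, n1, n2, z_crit=1.96):
    """Approximate two-sided power for a two-group comparison.

    Normal approximation: the test statistic is treated as normal with
    mean d * sqrt(n1 * n2 / (n1 + n2)) and unit variance.
    """
    ncp = d * math.sqrt(n1 * n2 / (n1 + n2))
    return normal_cdf(ncp - z_crit) + normal_cdf(-ncp - z_crit)

# Original design (cell sizes from the initial study), assumed true d = 0.25.
original_power = approx_power(0.25, 31, 26)

# Pooled replication, ASSUMING an even split of about 950 participants.
pooled_power = approx_power(0.25, 475, 475)

print(f"Original design: approx power = {original_power:.2f}")
print(f"Pooled replication (assumed 475 per group): approx power = {pooled_power:.2f}")
```

Under these assumptions the original design comes out well under 20% power, while the pooled replication comes out above 90%. In other words, even with a generous guess at the true effect, the initial study was badly underpowered to detect it.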

I suppose a lazy (or hyper-motivated?) initial author could cry "hidden moderator" or something along those lines. But there appears to be zero case for such a claim here. That would be a weak argument. This looks like a classic case of an old, small, cute, barely significant study (predictably) not successfully replicating when subjected to a more stringent test. Par for the course these days.


In a way, this post is a whole lot of "business as usual." Another weak initial study failed to replicate.

Important point: In this case I blogged about it because I am the first author of the initial study.

Here is the valuable replication study, led by Bob Calin-Jageman (Full team: Clinton Sanchez, Brian Sundermeier, Kenneth Gray, & Robert J. Calin-Jageman): LINK

I congratulate them on their fine work!

Oh yeah, my grad students tell me it would be a good idea to mention what the study is about. It's about whether looking at The Thinker leads people to report less belief in God. It's part of a complex story about analytic thinking and religion, a topic on which I have many thoughts. Maybe another day...

PS...I usually include an f-bomb or two in my blogs. I saved them for the postscript this time.



There, that feels better. I wanted to use this postscript for some context setting, and to muse about this whole experience.


By my timeline, the initial study was conducted sometime 2009-2010ish. Science ended up being the third journal we tried (I know, right?). I think it was under review there for about a year? Could be wrong.

For the replication, Bob Calin-Jageman and his team first got in touch with me (according to my email) in May 2014. So 2.5 years from first contact to publication. Anyone who claims that replicators are in it for the short easy process or the fame is crazy. All credit to Bob and his team: they kept me in the loop throughout. I did my best to help them recreate the study I ran (even though I wasn't super duper optimistic). They were beyond cordial and professional throughout. On my end, and I hope on theirs, everything was positive.

I did get to review the manuscript, after it had already passed an initial round of review at PLoS. I recommended acceptance, and I think my feedback was pretty minimal (word choice here, different stats presentation there). Typical trivial Reviewer 3 BS.


Facts aside: Is it fun to find out that a study you published in a high profile outlet back in the day does not hold up well to more rigorous scrutiny? Oh hell no. I highly recommend you avoid the experience.

How do you avoid the experience? Make sure you're more rigorous up front. More power. Open science, etc. Follow Mickey's advice and CHECK YOURSELF BEFORE YOU WRECK YOURSELF while being open to REVISING YOUR BELIEFS. Embrace methodological reform. It isn't going away. We can look at old papers and say "sure, but we didn't know better then." Even the famous False Positive Psychology paper is already dated (25 [oops: 20] per cell, ha!). Maybe we will be able to pick a year and say "LOOK everyone, we get it, before YEAR XXXX we didn't always know better. But after YEAR XXXX nobody has an excuse for not changing yet. Nobody can claim ignorance on these issues after YEAR XXXX." I have no idea which year counts as that YEAR XXXX. But I am quite confident that it is already in the past rather than still in the future.

I think my Methodological Awakening (tm) started in ~2012 and continues to this day. I'm optimistic about the direction our field is headed (though occasionally dismayed to see studies at least as weak as mine published in 2016 in places like Psychological Science, JPSP, PNAS, and everywhere. Progress ain't linear). And I'm super proud of the steps I've taken to both 1) educate grad students about methods (my favorite class to teach), and 2) upgrade my own science. My best 2-3 papers HANDS DOWN are currently under review. I hope I can say the same thing again next year (assuming, of course, that I'm not talking about the same papers still being under review).

Science is dead. Long live science.

Author: Will Gervais