Podcasting legend Yoel Inbar (from Two Psychologists Four Beers) joins us to break down Tal Yarkoni's "The Generalizability Crisis," the paper that launched a thousand Twitter wars. Psychologists make verbal claims about the world, then conduct studies to test these claims - but are the studies actually providing evidence for those claims? Do psychological experiments generalize beyond the strict confines of the lab? Are psychologists even using the right statistical models to be able to claim that they do? Does this debate boil down to fundamental differences in the philosophy of science - induction, Popper, hypothetico-deductive models, and so forth? Will David and Tamler ever be able to talk about a psych study again without getting into a fight?
Plus ahead of tonight's New Hampshire primary, expert political analysis about what went down in Iowa.
Special Guest: Yoel Inbar.
Sponsored By:
- BetterHelp: You deserve to be happy. BetterHelp online counseling is there for you. Connect with your professional counselor in a safe and private online environment. Our listeners get 10% off the first month by visiting Betterhelp.com/vbw. Promo Code: VBW
- Prolific: Prolific is giving away $50 to VBW listeners who want to give online sampling a go! Whether you're a social scientist doing research, part of a marketing group, or even a high school student interested in doing a social science project, prolific can offer you fast, reliable, quality data to answer your research questions. Promo Code: verybadwizards
- GiveWell: GiveWell searches for the charities that save or improve lives the most per dollar. Consider a donation this holiday season--your dollar goes a lot further than you might think! Promo Code: verybadwizards
Links:
- Yoel Inbar
- Two Psychologists Four Beers (Podcast)
- The app that broke the Iowa Caucuses was sent out through beta testing platforms - The Verge
- Yarkoni, T. (2019). The generalizability crisis.
- The 20% Statistician: Review of "The Generalizability Crisis" by Tal Yarkoni [Daniël Lakens' Blog]
- Inbar, Y., Pizarro, D. A., Gilovich, T., & Ariely, D. (2013). Moral masochism: On the connection between guilt and self-punishment. Emotion, 13(1), 14.
- Mook, D. G. (1983). In defense of external invalidity. American psychologist, 38(4), 379.
[00:00:00] Very Bad Wizards is a podcast with a philosopher, my dad, and a psychologist, Dave Pizarro, having an informal discussion about issues in science and ethics. Please note that the discussion contains bad words that I'm not allowed to say and, knowing my dad, some very inappropriate jokes.
[00:00:17] Jesus Christ, who gives a shit, okay? Is this selfish? Is it altruistic? Okay, we'll fucking figure that out in the old folks' home later. Right now we have a chance to attach ourselves to a fucking moral cause, okay? We have to do this.
[00:00:28] The great Oz has spoken! Pay no attention to that man behind the curtain! I'm a very good man. Anybody can have a brain. I'm a very good man. Just a very bad wizard.
[00:01:18] Welcome to Very Bad Wizards, I'm Tamler Sommers from the University of Houston. Dave, despite the efforts of all you Democratic National Committee liberal elites, Bernie Sanders won the Iowa caucuses and is now the favorite to win the whole goddamn thing. Are you ready for the first Jewish president?
[00:01:36] Wait, Bernie Sanders is a Jew? It's surprising, yeah. I'm not even joking. Oh, god damn it, no wonder he's so good at fundraising. He's like, everybody, these are friend prices, a dollar each, I don't care.
[00:01:52] You're not going to be able to get away with that in a few months so enjoy it now. We're going to crack down. We are definitely going to be cracking down on this shit.
[00:02:02] Well as long as we don't elect the gay fellow, it might come down to a Jew or a gay man and in all cases I think that it's shameful that we haven't elected a woman first. Latin America has been electing women for years.
[00:02:21] The most stereotypically misogynistic people on the face of this planet are okay with women presidents, and somehow we're not as a nation. It's very disturbing as a man, as a father of daughters. When they put up a good candidate, we'll be happy to elect one. Elizabeth Warren.
[00:02:44] My name is Dave Pizarro from Cornell University, and Yoel Inbar from the University of Toronto is with us. Hey guys. Welcome back. Well, thanks for having me, it's great to be back. Are you excited for a Jewish president? Yeah, I filled in my California ballot.
[00:02:58] I just have to fax it in. I was flirting with Andrew Yang for a while, but I think it's going to be Bernie. It's got to be Bernie. So was Tamler, until he found out he was brown.
[00:03:09] Did you take that Washington Post quiz where it asks you about the issues and then it matches you with the candidate who best matches your positions? Yeah, and I got Yang, so I was thinking about it. I got Yang too, but... Oh, whoa.
[00:03:21] So you guys aren't driven by reason here? You're not driven by the actual... No, because that's a bullshit measure. But also, you can't quantify your position. So I'm a full-on Bernie bro now.
[00:03:35] I don't know how it happened exactly, but in the last few weeks I became like a Chapo Trap House-listening, spend-my-days-trolling-Warren-and-Buttigieg-supporters-on-Twitter Bernie bro. I love it. That's amazing. I'm on board for the revolution.
[00:03:51] It's amazing, you're as communist as... I mean, it's very weird given your elite rich background that you would... It's not... but you have an elite... So what are we going to talk about with Yoel?
[00:04:07] Yeah, so should we talk about the Iowa caucuses, where Pete Buttigieg tried to cheat his way to a victory? Yeah, but at least let's tell everybody what we're going to talk about in the second segment: a deep, deep issue in the philosophy of science and the psychology
[00:04:22] of science. That's what we're going to tackle in the second segment, right? Yeah, definitively. Definitively? I hope to get some definite answers by the end of the second segment. It's a paper by Tal Yarkoni that we have
[00:04:38] talked about, or that I've alluded to because I had read it, in a previous episode, Dave, that we did on the trash talking studies. And you guys did a whole episode on it, Yoel, which I really enjoyed. Oh, thank you. That's very nice of you.
[00:04:54] We'll put a link to the Two Psychologists Four Beers episode on this. And in honor of you... oh, you're drinking. I'm not drinking for once. Wow, this feels weird actually. I'm having a beer in honor of you guys. So yeah, normally I'm usually having something stronger at this point.
[00:05:12] So, to put up with Dave. But at least it's a double IPA. Nice. Yeah, to really be consistent with our theme, you have to go on about it for a while. Yeah. But you also can't be informed. You just, like, ramble uninformedly for like 10 minutes about the beer.
[00:05:26] Let's talk about the Iowa caucuses. So I feel like I'm informed. I want to know, from people who aren't as politically active as I am, what is your impression of what happened? Yeah, David, you first.
[00:05:40] Well, I mean, the reason I'm even willing to talk about this is because this to me largely isn't a political topic. This is a technological failure on an embarrassing scale that maybe says something about how bad Democrats are at getting
[00:06:00] shit done. And nobody in their right mind would deploy a piece of software on a large scale that had been so untested. And these are the people you want to run your health care, as Trump said. As Trump said. Yes.
[00:06:19] Yeah. So exactly. The app that they used was developed by a company called Shadow that is operated by a company called Acronym. If you're a conspiracy theorist, like, you don't even have to try with this thing.
[00:06:36] And not only that, but Buttigieg gave Acronym like a hundred and thirty-five thousand dollars. I mean, the whole thing is shit. I didn't know that. Terrible. That's terrible. Yeah. So what are your thoughts? You know,
[00:06:53] so what I know is that they fucked up the reporting such that on Tuesday night, nobody knew anything. And then Buttigieg sort of declared victory without everything being in, and now Bernie is sort of catching up and it seems like neck and neck.
[00:07:11] And that's basically all I know. Oh, and that they're going to do like a full recount or something. Well, now the head of the DNC, I think his name is Tom Perez, called for like a recount. But that's just because his candidate, Biden, just got crushed.
[00:07:28] And so I don't know if that'll actually happen. But the funny thing about the app is, while I could go conspiracy, I guess, incompetence seems like an equally, if not more plausible, theory. Or explanation.
[00:07:49] Yeah, it seems like incompetence is by far the most likely explanation. Yeah. What would the conspiracy be even? Like how would this work? Well, so the conspiracy, it starts when there was this big poll, the Des Moines Register poll that
[00:08:06] is usually the thing that the media focuses on right before the caucuses that will motivate people to vote. Like if the poll shows your candidate kind of out of it, you might be motivated to stay home. And also like it's a big deal going to the caucuses.
[00:08:23] It's not like you go in there, sneak in, vote, and you're there for like a couple of hours or like an hour because of the whole way the system is run. And so like it's a big deal what that poll is. It's very influential.
[00:08:37] And the Buttigieg campaign made a complaint so that they wouldn't release the poll. So that already got the online Bernie bros, like me (not really), wondering what was going to happen. And then you find out that they're using this app
[00:08:58] that Buttigieg gave a lot of money to, the company that is run by a bunch of former Hillary campaign operatives. You know, you could see where the conspiracy would come out of that. Hillary hates Bernie, the whole establishment.
[00:09:13] But wouldn't the conspiracy at least require that the app reliably report fake results? Like, in what world is it good to have people doubt the results due to the software failure? You can't with caucuses. It all takes place in public. Like, there's no secret ballot.
[00:09:30] You have to, like, stand with the people voting for the person that you're voting for, under some big sign. And so there's no way to really cheat using an app. So I guess the conspiracy, I agree, it doesn't totally make sense.
[00:09:47] The conspiracy would be that they just wanted to fuck up the whole process because they knew that Bernie was going to win and that Biden was going to lose. And Buttigieg, which was his only chance at making anything, probably wasn't going to win either.
[00:10:00] So just skunk the whole process, make it kind of unreportable so that Bernie doesn't get like a wave of momentum going into New Hampshire and the rest of the primaries. That's the idea. You know, this is actually I'm actually worried.
[00:10:16] I've heard a few people discuss this on tech podcasts. And I think the biggest fear that I have is that it when something like this happens, it undermines people's faith in any of the electoral process, like in any of the voting systems.
[00:10:31] People are just now afraid that their vote won't actually count, that the results are going to be skewed. And in a nation where we're already having fears about meddling in elections, having a populace that does not fundamentally trust the mechanism
[00:10:51] by which we're electing our leaders is like one of the worst things. And you know, it's not good for democracy. That's very earnest, Dave. As a deeply political person, I feel this at the core of my being.
[00:11:05] You know, I mean, I don't give a fuck to be honest, but. You weren't going to vote anyway. So no, it's actually not for Bernie. You all go ahead. Which of you is the Klobuchar fanboy? I know it's one of you, but I can't remember which.
[00:11:21] That's my brother. Oh, OK. He just thinks it's hot that she throws staplers at staff. He's really into abusing subordinates. Yeah, I get it. So one upside might be people start asking questions about why Iowa gets to go first. Yeah, I think that that's over.
[00:11:42] I think this is done. Really? Wow. OK. That's the sense that I'm getting from people is because everyone was already asking those questions and now they have the perfect excuse to have a new system
[00:11:55] where different states go first, or a different state goes first and it's not a caucus. But yeah, to have a state that's like 91 percent white... which is kind of amazing, to have a 91 percent white state in today's America.
[00:12:13] You mean you mean amazing as in great like is that what you're saying? Packing up the family moving there. No, I mean, I live in the exact opposite kind of city. But like, yeah, I think that might be over.
[00:12:27] Also the caucuses, do you guys know how caucuses work? Yeah, I do. I have little to no idea how any of this works. So you go there and you vote for a candidate at first, right?
[00:12:40] And then in your precinct, the candidate that you vote for has to have at least 15 percent support or else they are eliminated. So everyone does their first vote, and all of this, including the first vote, I believe, goes on in public.
[00:13:01] And then and then afterwards, every candidate who didn't get 15 percent of the vote is eliminated and the supporters now can choose a different candidate who they should support. So like your second choice actually matters in the way that it doesn't in primaries.
[00:13:21] And so once you're so let's say I was a Bernie supporter, but I'm in a precinct where Bernie didn't get 15 percent. Now I can join. I could be a Yang gang guy, except that he would have probably also been eliminated.
[00:13:35] So I would have had to choose between like Biden, Buttigieg and Warren or something like that or Klobuchar. So so that's how it works. It's kind of interesting in two ways. One is the way that your second choice actually matters.
[00:13:51] And the other is the way that it goes on in public. And you know, Iowa has a lot of small towns, you know, like, people will know who you voted for. So some people complain that it's undemocratic.
[00:14:05] Yeah. Isn't it also the case that you can show up undecided and just be like pitch me like whoever gets it right, whoever makes the best argument, whoever promises to buy me a cheeseburger tomorrow, gets my vote? Exactly. Yeah, totally.
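[Since what's being described here is effectively a small algorithm, here is a toy Python sketch of the 15-percent viability rule and the realignment step. The precinct numbers and second choices are invented for illustration, and real caucus math (delegate apportionment, ties, multiple realignment rounds) is more involved.]

```python
VIABILITY = 0.15  # candidates under 15% of the precinct are eliminated

def realign(first_choices, second_choices):
    """first_choices: candidate -> first-round supporter count.
    second_choices: eliminated candidate -> the viable candidate their
    supporters move to (assumed unanimous here, for simplicity)."""
    total = sum(first_choices.values())
    viable = {c: n for c, n in first_choices.items() if n / total >= VIABILITY}
    for candidate, n in first_choices.items():
        if candidate not in viable:
            # Supporters of non-viable candidates join their second choice.
            viable[second_choices[candidate]] += n
    return viable

# Hypothetical precinct of 200 caucus-goers (viability = 30 votes).
first = {"Sanders": 60, "Buttigieg": 55, "Warren": 45, "Yang": 25, "Klobuchar": 15}
second = {"Yang": "Sanders", "Klobuchar": "Buttigieg"}
print(realign(first, second))  # {'Sanders': 85, 'Buttigieg': 70, 'Warren': 45}
```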
[00:14:17] It's like there's something about it that I kind of like, but I get all the objections to it. So it's like a version of ranked choice voting. Like why don't they just have people rank their candidates? Yeah, no, that's exactly what it is.
[00:14:30] No, like why not have a one step process? Yeah, why not just do that? Yeah. Why have this arcane thing where you have to show up and stand around all night? Yeah. And first of all, you have to be able to make it at that time.
[00:14:42] And yeah, I agree. I can see the appeal of it, though. It's like it brings the community together in a time when that hardly ever happens, and you're actually participating in an active way in your civic life. So I see the appeal of it,
[00:15:04] if it weren't for the objections, I would. Yeah, I mean, I feel the same way about like finding this somehow appealing. It's just it has such an outsize influence on how the rest of the primary goes. Like I forget the exact number.
[00:15:17] Nate Silver said on Twitter that their model, which is like empirically based on past elections, weights the Iowa caucus results at like some crazy multiplier of any other caucus or primary, right? It's like 10x or something like that. And that just seems insane. I agree. Yeah.
[00:15:34] And it's also one where, I think, if you do have to have a first one that will have that influence, it shouldn't be a caucus. One thing, though, that seems clear to me
[00:15:47] to take away from the pros and cons of caucuses is that I don't think we should ever move to any electronic system of voting, at least not without always having a paper trail.
[00:16:02] And this is one case in which technologists are surprisingly in agreement, where, like, crazy Silicon Valley people are totally willing to say, oh, airplanes should fly themselves and cars should drive themselves.
[00:16:14] And you're dumb not to rely on algorithmic solutions to your problems. But everybody is like, no fucking way. We should never have an election using electronic systems. Like that is just too prone to failure. Well, the caucuses are analog entirely,
[00:16:30] like in terms of how the votes are collected. It's just the app was just for reporting it to the National Committee. So they actually had a paper. They had paper. Thank God, right? Yeah. Thank God for Bernie and the socialist revolution that's coming for you and your ilk.
[00:16:50] You just like him because he said that he was going to federally legalize marijuana as his first thing. I like that. I'm not going to lie. Yeah. Yeah, you know, we've had that in Canada for a while. It's been great. You should look into it.
[00:17:05] Well, you guys don't run the world. We don't want people running the world high on marijuana cigarettes. We did decriminalize it here in Houston, so there is that. But I love that you guys can just go and buy... like, you could just go
[00:17:17] get some edibles, just go and get a bunch of this. That's awesome. I'm jealous of that. You don't even have to go to the store. You can just go to the website and they ship them to you. They're there in like two days.
[00:17:28] It's great. I was all for that until I went to Vancouver with Tamler, and he just... just because he could... The meetup was just a complete shambles. Yeah, now that I understand edibles more, having, you know, been to a bunch of these states,
[00:17:49] the amount I took that night is insane. And then to just do more with those UBC students. God damn it. Well, a lesson learned. I think everybody's made that mistake once, and some people more than once. I put myself in that category.
[00:18:09] That's definitely not the first time that's happened. All right. Should we take a break? Do we have anything more? Who are you voting for? Dave? I don't know. And I genuinely don't know. I honestly, I like Elizabeth Warren a lot. I went through that phase. I'm over it.
[00:18:25] I donated to her campaign and everything. Me too. And I went to a town hall of hers. Oh, wow. Yeah, I heard she seems very impressive in person. It's weird. She just... all of a sudden the country just decided
[00:18:38] the country was excited about her, and then just decided, yeah, you know what? Nah, never mind. And I'm kind of in that group. So fickle. Yeah. Sexism. It's obvious. We need to look to countries like Chile and Argentina to get over our sexism.
[00:18:56] She... what? I don't think it was sexism. I just think she came off as a little shrill. Well, all right. Any final thoughts? Yoel? Nope. I think we covered it. All right. When we come back, we'll talk about whether you guys should try to find different jobs.
[00:19:16] All right, let's take a moment to thank one of our sponsors, BetterHelp. Is there something that interferes with your happiness or is preventing you from achieving your goals? Don't answer that question, Dave, because I think I'm... I'm that something.
[00:19:38] Well, then better help online counseling is there for you. Connect with your professional counselor in a safe and private online environment. It's so convenient. Now you can get help on your own time and at your own pace.
[00:19:54] You can have secure video or phone sessions, plus chat and text with your therapist: licensed professional counselors who are specialized in depression, stress, anxiety, relationships, sleeping, trauma, anger, family conflicts, grief, and self-esteem. And of course, anything that you share is confidential.
[00:20:19] If you're not happy with your counselor for any reason, you can request a new one at any time, and there is no additional charge. They have 3,000 US licensed therapists across all 50 states, four communication modes (text, chat, phone, and video), available on desktop, mobile web, and Android and iOS apps.
[00:20:43] You can schedule video and phone sex sessions. Generally. You said phone sex. You can schedule video and phone sex? I don't know about that. That's between you and your therapist. You can schedule video and phone sessions, generally weekly, unless your therapist schedules more.
[00:21:13] There is broad expertise in the network, which may not be available locally. And there is financial aid for those who qualify. One thing we want to note and emphasize: this is not a crisis line. No, in fact, you know, you can get therapeutic help in many,
[00:21:33] many ways. Like, listening to this episode might be therapeutic for you. But we are also not actually licensed, at least I'm not. Them? That's right. He's a charlatan and a fraud. But no, this is... if you need a crisis line,
[00:21:46] please look up your local and national crisis lines for all kinds of stuff. This is therapy, though. This is better therapy than a lot of people have had access to traditionally. And it's a truly affordable option.
[00:21:59] So even more affordable for listeners who are listening right now: Very Bad Wizards listeners have gotten a special deal from BetterHelp. You get 10 percent off of your first month with the discount code VBW. So get started today. Get some therapy. Get that mental health and hygiene going.
[00:22:18] Go to betterhelp.com slash VBW. You can fill out a questionnaire to help them assess your needs and get matched right away with a counselor that you'll love. That's betterhelp.com slash VBW. Our thanks to BetterHelp for sponsoring this episode of Very Bad Wizards.
[00:23:40] Welcome back to Very Bad Wizards. This is the predictable time of the show where we like to thank all of our listeners, all of the people who get involved in discussions
[00:23:52] with us, who reach out to us to tell us what they thought about an episode, who just contact us in various ways, take part in discussions. We really appreciate it all. You know, this last episode that we did was about David Foster Wallace.
[00:24:10] It was a topic that was solely driven by our listeners. And you guys came through. You guys have have been talking to us about it. And we really appreciate that at every level. We like the discussion.
[00:24:22] So if you do want to get ahold of us, you want to reach us, you can email us at verybadwizards@gmail.com, or you can tweet at us if you have something that only requires 280 characters and you're
[00:24:35] sweet and to the point, unlike what this segment is going to be. Yes, you can tweet to us at verybadwizards, or at tamler and at peez. You can rate us on iTunes. This is something that we sometimes forget to emphasize, but that is,
[00:24:51] I think, one of the best ways that people have of finding us if they don't know about us. The more people that rate us on... I think it's called Apple Podcasts now. Oh, yeah, it is Apple Podcasts. Wow. Yeah.
[00:25:05] You've come through on some Apple knowledge... the tables have turned. Now I'm going to, like, drop some recommendations for French cinema. This is like some bizarre quantum world right now. So rate us on iTunes.
[00:25:21] Get involved in the discussion on our subreddit, reddit.com slash R slash very bad wizards. We have a lively discussion there. Our Facebook page has been graciously saved by one of our listeners. Right? Yep. David Lara, he is now running the Facebook page.
[00:25:40] Now, you can follow us on Instagram, and when I post on Instagram, or on the rare occasions now that my daughter does, it will post directly to Facebook because they're linked up. But he is also posting a bunch of other stuff, including links to the episodes and, for example,
[00:26:01] links to this paper. So sometime last week, he posted the paper that we're about to discuss in the next segment. And yeah, we really appreciate it. David Lara, thank you very much. Yeah, thank you so much. For bringing it back.
[00:26:15] And a lot of people seem to like it. His first post, that he was bringing it back, got a ton of likes. So I'm really happy about that. Also, I'm going to ask him to do my Facebook page.
[00:26:28] And, as Tamler mentioned, you can follow us on Instagram as well. So all the ways that you have to engage in discussion with us, to reach out to us, to engage in discussion with your fellow listeners... we really appreciate it.
[00:26:44] Thank you all very much. Yeah, and if you'd like to support us in more tangible ways, there are several, several ways you can do that. And one of them is to give us a one time donation on PayPal. We appreciate all of those.
[00:26:59] Another is to become one of our beloved Patreon supporters. Our supporters mean so much to us. They give us ideas for topics. They even get to vote on a topic a couple times a year. We're probably coming up in about a month or two
[00:27:15] to the next listener-selected episode. And that usually results not just in one episode, but in several. The David Foster Wallace one came in second place last time. So yes, become one of our Patreon supporters. You can get all of Dave's beats
[00:27:35] as well as ad-free episodes with a $1 per episode donation. At two dollars and up, you can get bonus episodes. You know what we should do next? We've been promising it: top five Deadwood characters. Oh, yeah. Yeah. That'll be fun. We'll do that.
[00:27:52] Deadwood is one of those things that I just find so easy to rewatch. Like, there's no barrier for me to just start up the first episode. It's great. I know, I was already there. I had that thought: we promised, we've got to do it.
[00:28:09] And I was already trying to figure out who my top five were. It's tough. There's so many awesome characters. And Yoel, who's coming up in the second segment, is probably going to join us for a series of Dark episodes.
[00:28:27] The Netflix series. Bonus episodes for season three. Yeah, for season three when that comes out sometime in June. So some good bonus stuff coming up. And for five dollars and up, you will be one of those listeners who can vote on what topic we choose
[00:28:45] a couple of times a year. We appreciate all of you. Thank you so much for all the different ways you support us. And now let's get back to the episode. All right, so let's talk about our main segment.
[00:28:57] But before I get into the details of this paper called The Generalizability Crisis by Tal Yarkoni, this is really a broader topic, which is fundamentally about whether or not what psychologists do is actually giving us any knowledge about the human mind.
[00:29:16] And one of the questions that you can ask about this is: do the experiments that psychologists do, you know, as we with labs conduct these experiments testing hypotheses, where we have stripped-down procedures and we have quantitative methods and we do statistics.
[00:29:39] What does that say about human beings in general? How much can we generalize from what we do there? And that's the question that Yarkoni starts with. And I'll just give a broad overview and, you know, you can jump in because
[00:29:52] you're actually more of a nerd about some of the statistical stuff, especially. But Yarkoni, who is from UT Austin, has this new paper where he's essentially arguing that, look, what we do as psychologists when we try to collect data
[00:30:11] about human beings is we make these verbal claims. We say, oh, I think that people who are easily disgusted are more likely to be conservative, to use an example close to home. But then what we have to do is convert that verbal claim,
[00:30:28] that broad verbal claim into an actual testable specific hypothesis that is that we can evaluate by operationalizing that idea. So by actually taking that idea and converting it to something that we can measure using whatever manipulations in the lab and then measurements in the lab.
[00:30:51] And then we run statistics on it. And he thinks that we're making a problematic leap when we go from verbal claim about how the mind works to specific, quantifiable, quantitative, experimental claim and then going back to extrapolating about how human beings in general work.
[00:31:10] He thinks that we are not learning much, if anything at all, about how human beings work. What we're learning is something that Tamler, you and I were talking about in our last episode. Maybe we're just learning what Cornell undergraduates
[00:31:26] in the year 2020 who take this particular test under these conditions are doing, not, in general, what the human mind is like. So he says we should actually be very wary about using psychological experiments to generalize about all human beings. Is that fair?
[00:31:48] Yeah, yeah, yeah. He has a line right at the beginning of the paper. He says, if you run an experiment, technically you might draw the limited conclusion that priming undergraduate Plymouth students with 40 cleanliness-related words increases 21-point moral disgust ratings for six specific moral dilemmas.
[00:32:10] But really, what are the claims that are extrapolated out of that? And I'm guessing: people are more disgusted? No, it's that cleanliness reduces the severity ratings, like how bad you think something is. Is that your study? No, but it is a real study.
[00:32:26] Just to be specific about what his claim is here, this gets a tiny bit technical. But basically, when you run these statistical models, you have to say which of the things that you're observing
[00:32:38] you think are samples from a larger population that you want to generalize to. So participants, for example, we don't care specifically about the 200 people who happen to come do our study. We want to generalize to other people like them.
[00:32:53] And assuming that we get multiple responses from one participant, those responses would be correlated with each other. We have to tell the statistical model that these come from the same person, and that allows it to appropriately adjust our confidence about
[00:33:07] whether the results that we observe in this sample are going to generalize to a new sample. And the argument that he makes is in the same way that we want to generalize from the specific participants that we ran to a broader group that they're theoretically sampled from.
[00:33:21] In the same way, we want to say our results here don't depend on these specific stimuli, right? They don't just depend on these specific cleanliness words. They don't depend on these specific moral judgment vignettes.
[00:33:33] We want to make a broader claim that says when people are in this mental state, this thing happens to affect their judgments, right? And so that's inherently, he says, a claim about not just those specific things that we use.
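[In standard mixed-model notation, the kind of claim being described here can be written roughly as follows. This is a generic textbook formulation, not an equation quoted from Yarkoni's paper:

$$y_{ps} = \beta_0 + \beta_1 x_{ps} + u_p + w_s + \varepsilon_{ps}, \qquad u_p \sim N(0, \sigma^2_{\mathrm{participant}}), \quad w_s \sim N(0, \sigma^2_{\mathrm{stimulus}})$$

where $y_{ps}$ is participant $p$'s response to stimulus $s$ and $x_{ps}$ codes the manipulation. Treating stimuli as fixed amounts to setting $\sigma^2_{\mathrm{stimulus}} = 0$; the fuller version of the argument also lets the effect $\beta_1$ itself vary across stimuli (random slopes), which is where the extra uncertainty discussed below comes from.]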
[00:33:46] And the argument that he makes is that that's rarely ever modeled, that psychologists typically don't run their experiments in a way that allows you to model that variability, and that if you do make some kind of plausible assumptions
[00:34:01] about what kind of variability might be associated with those factors, it means that we should be much less confident about the results of our analyses, right? Your uncertainty goes way up. So the analysis that was significant, p less than point oh five,
[00:34:16] now isn't, because we're now incorporating the uncertainty around these other things that we want to generalize to. To get to a specific example: suppose that we're interested in evaluating a hypothesis about what influences moral judgment.
[00:34:30] And we decide that we want to test people's moral severity ratings, like, OK, are people harsher in their blame? And so Yoel and I, as we have done, come up with a set of vignettes
[00:34:44] and a set of questions like Tamler steals weed from his wife, right? And she doesn't know about it. And how much blame do you give Tamler? How wrong is what Tamler did? And so we come up with three or four scenarios in a typical study,
[00:35:01] and we come up with three or four questions for each of those scenarios. And we're never doing this because we think that all we want to know is how people respond to those three scenarios.
[00:35:15] We want a sample of all of the moral judgments that you might make. What can give me a microcosm of moral judgment? What can I say is a good set of questions that would assess moral judgment in general? And so we pick those and we use those three.
[00:35:33] Yarkoni wants to say, well, we're doing our stats wrong, because we have to accept that this is just one random sample of all moral evaluation questions. If we treat the statistics properly, what we'll find is that we have radically overestimated how powerful some of these phenomena are,
[00:35:53] and that the answers to the questions could be different if you just used a different scenario. But that's not built into the model. The model is assuming that those questions can stand in for moral judgments more generally. Is that right?
[00:36:13] Yeah. I mean, technically, what he thinks is that we're misdescribing what the model is. So if you don't model the stimuli as a random factor, you're only entitled to say things about those stimuli.
[00:36:27] But we, in our verbal descriptions, say things that are more general. So it's like there's nothing wrong with the model per se, right? It's the link between what we want to say and what we've modeled. So this is where he talks about
[00:36:41] the fixed-effect fallacy, where you actually do model it correctly, but if you go according to that model, you wouldn't be allowed to make the generalizable claims that psychologists make all the time. And if you did model it as a random factor,
[00:37:00] then you wouldn't get the effects, or you wouldn't get anything close to the effects. And so there's really no way out of this, according to Yarkoni, as I understand it. So if you did model stimuli as a random factor, and all the other
[00:37:15] variable factors, if you model them as random factors, you don't get the results that you need. I mean, that depends. It depends on how much your effect varies across stimuli. So it's certainly possible that that variance is pretty small, and in that case it will matter less.
[00:37:33] And specifically, what you're talking about changing is not the point estimate of the effect but the standard error, so your uncertainty around the effect. And what he presents in the paper is kind of plausible ranges for what those variance components might look like,
[00:37:50] because often those experiments aren't run in a way that allows you to even determine what they are empirically, because the thing doesn't vary. So, for example, if I only give you a single moral judgment vignette, then I can't tell how much variability
[00:38:05] there is between vignettes because there's zero because you only did the one. Right. And his point is that there's lots of things that could vary, that could be important. Most of those things aren't actually varied as part of the design.
[00:38:16] So the best you can do is kind of make some assumptions about how how much variance might be associated with those. And then you plug those in and you say, like, assuming that the variance across units on this dimension is X, how does that affect our certainty
[00:38:31] about the estimates that we're getting? And what he shows in these analyses is that once you add even a moderate amount of variability due to this stuff, your estimates of the effect get much less certain, to the point
[00:38:48] where they can be nearly uninformative, right? Where the model says, well, it could be anything from negative infinity to positive infinity, I don't really know. That's the dilemma. It's that, not for every case,
[00:39:04] but for most cases, if you did model it in a way that would license the more generalizable claims, then the uncertainty would be too high. Right. But I don't know that this can be an in-principle argument. This is an empirical claim, right?
[00:39:20] I mean, this is right. And it's not as if psychologists don't use random-effects models when doing their studies. We usually model subjects as random effects. Like, we are aware that individual participants differ from each other, and that is a source of variability.
[00:39:40] And so we model those as random effects, not always, but at times. And you know, you get what you get, you learn what you learn. You get the effect size, you get the variance estimate.
[00:39:50] Yarkoni's claim is that we as a field should be treating stimuli in the same way that we treat individual human beings. And he thinks that would actually get us away from being able to make the conclusions
[00:40:06] that we make. And it's not just stimuli. There are other factors and opportunities for noise that also aren't modeled. Additional random factors that you could add to your model, but then that would just increase the uncertainty.
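[For concreteness, here is a minimal sketch of what fitting subjects and stimuli as crossed random effects can look like, on simulated data. The sample sizes, variance values, and variable names are all invented; the single-dummy-group trick is one documented way to get crossed random effects out of statsmodels' MixedLM, not the analysis from Yarkoni's paper or from any study discussed here.]

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subj, n_stim = 50, 4  # 50 participants each rate the same 4 vignettes

# Simulate ratings: a small condition effect plus random subject
# intercepts, random stimulus intercepts, and trial noise.
subj = np.repeat(np.arange(n_subj), n_stim)
stim = np.tile(np.arange(n_stim), n_subj)
cond = subj % 2                                # between-subjects condition
rating = (0.3 * cond                           # "true" effect of condition
          + rng.normal(0, 1.0, n_subj)[subj]   # subject variability
          + rng.normal(0, 0.8, n_stim)[stim]   # stimulus variability
          + rng.normal(0, 1.0, len(subj)))     # residual noise
df = pd.DataFrame({"rating": rating, "cond": cond,
                   "subj": subj, "stim": stim, "one": 1})

# Crossed random effects: treat the whole dataset as one "group" and
# pass subjects and stimuli in as separate variance components.
m = smf.mixedlm("rating ~ cond", df, groups="one", re_formula="0",
                vc_formula={"subj": "0 + C(subj)", "stim": "0 + C(stim)"})
print(m.fit().summary())
```

[Notice that with only four stimuli the stimulus variance component is estimated very poorly, which is itself the point being made here: the typical design barely lets you measure the quantity you would need in order to generalize across stimuli. Letting the condition effect itself vary by stimulus (a random slope) is what really widens the uncertainty on the effect.]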
[00:40:21] I guess essentially what he's saying is that like our results are contingent on a specific set of things that we chose and we can't really know in advance which of those things are important or how contingent the results are.
[00:40:35] But we can make an educated guess that there's going to be a lot of variability that would undermine our confidence in this effect generalizing across those different contingencies. Which I think is something that, when you get graduate training in psychology, you sort of intuitively
[00:40:51] realize, right? You're like, well, you've got to set up the experiment the right way or it's not going to work. That's what you learn as a graduate student: how to run the experiment the right way. Inherently, the implication there is, well,
[00:41:02] there's lots of ways that you might think that you could run this experiment that aren't going to work. Do you disagree with just this first part of his argument, which is that the way these experiments are currently modeled doesn't license
[00:41:18] the kind of extrapolation that psychologists do all the time? So if you're asking me that, I want to unpack two different claims. So, one: does it license general claims? And two: are the general claims being made all the time by psychologists?
[00:41:33] So, one, I don't think that it licenses blanket general claims about human beings. But two, I don't think that psychologists do that. Right. He seems to think that this is how we think that we're doing our science.
[00:41:50] So what he thinks is a generalizability crisis really depends on whether or not we all assume that our experiments are giving us generalizable truths. And I think that if you read most papers, aside from the sort of
[00:42:09] titles that you talked about, the cautions about over-interpreting how generalizable these results are are very, very clear. Right? We could do a better job of saying this, but there is hardly a results section, sorry, a discussion section at the end of a paper that doesn't
[00:42:27] have a very clear statement of the limitations of the conclusions that we can draw from this. So it's not even just about rhetoric. It's about what we think that these studies are actually trying to do. Just to press that a bit.
[00:42:40] So we've done two papers recently, one about the trash talking study. And then the other one was about the link between being a moral person and comedy, right? And how funny you are and that people who are more moral are less funny.
[00:42:59] And in the discussion sections, maybe they noted the limitations, the boundaries of what it is that their studies show. But they also have this section where they give advice. That was the whole thing, Dave, about using inoffensive puns at work. Right.
[00:43:22] So, to kind of make up for the fact that you're too moral to actually be funny. And in the trash talking one, the whole idea was: be aware of this when you're trash talking, that you could
[00:43:36] actually be motivating your opponents. And if you're in management and your workers are trash talking each other, you should know some of the effects of this. There is implicit in that this idea that this generalizes beyond
[00:43:54] the very specific study with the very specific measures that they did. Right. Yeah. I mean, it's not that I think that authors don't say these things often, and it's not even that they're
[00:44:06] often wrong, or even fail to realize the limits of generalizability in their own papers. But I think that Yarkoni is being a bit unfair about how much we go about doing that. And, you know, I actually don't remember the discussion sections of the trash
[00:44:26] talking paper, but I think that what's being overestimated is the degree to which, even when pressed, these psychologists think that that's what they're doing, that they are actually making a claim about people in general across situations.
[00:44:40] And we can get into the specifics of what we think they're doing. And this is where I think we can have a fruitful discussion about what it is that we're doing with these experiments. But I don't think...
[00:44:53] I think that's a mistaken view of how psychologists think that we're making progress in science. What do you think about that, Yoel? I feel like psychologists actually do that a lot, at least by implication. So I think there's a lot of studies that you wouldn't run
[00:45:11] if you really believed that your results were specific to this configuration of manipulation and stimuli that you happen to choose. So I think the verbal overshadowing example is a great one where Yarkoni says just on first principles, it almost has to be true that sometimes
[00:45:32] you would get verbal overshadowing and sometimes you wouldn't because there's these different processes going on. Cognitively, they can compete with each other. Memory encoding is noisy. Sometimes this other process is going to interfere. Sometimes this other process is going to help.
[00:45:47] And so if you really took that point of view seriously, you would start out by saying I want to understand the circumstances under which verbal overshadowing happens and the circumstances under which it doesn't and the circumstances under which it actually has the opposite effect.
[00:46:01] where you describe it and it actually makes your memory better. And the fact that you would start out just running a study that uses one particular combination of these things and be like, look, we found the thing, and that people would then come and do this massive multi-site replication
[00:46:17] just looking at that one instantiation, I feel like implicitly says we can learn a lot from that one instantiation. And I think Yarkoni's point, which I think is right, is that we really can't. Well, as an existence proof, we can, right? Well, existence proof, maybe.
[00:46:36] But I think existence proof is not even the fair way to describe what we ought to be doing, at least. And here's where I would talk about Lakens' blog post, which I think is an
[00:46:50] excellent reply and one which I agree with largely, where he says there's no generalizability crisis. So verbal overshadowing is the example that is used in the Yarkoni paper. And that is: when you're asked to remember a visual scene
[00:47:05] and then you're asked to describe it verbally, somehow that that seems to interfere with your memory for that visual scene compared to when you don't verbally describe that visual scene. If you have a theory and this is this gets to the heart of what like
[00:47:22] how we should be testing theories. But if you have a theory that says that, well, memory processes should be improved in general by a greater elaboration of the facts of the thing that you're trying to remember.
[00:47:38] So you have a theory that says, the more that you think or talk about a thing that you're trying to remember, the more your memory is likely to improve. So you bring a super-constrained population of people,
[00:47:53] you know, you bring a bunch of sophomores into the lab, you give them this set of stimuli, and you show that verbal overshadowing occurs. That in fact you have this situation in which the very thing that we thought was going
[00:48:06] to improve memory is interfering with memory... then that becomes a very interesting finding, because your theory was that, no, the way memory works is that greater information, and greater elaboration of that information, is going to improve memory.
[00:48:23] You've shown an interesting case in which it doesn't. Now, I don't think anybody who studies memory or verbal overshadowing would say that a gajillion things aren't at work in any instance when you're trying to remember something.
[00:48:36] But the fact that, under these situations, you can falsify that claim is what an experiment is supposed to do. And if you think that your experiments are showing generalizable claims, then you've not been thinking about experiments. Right. This is similar to the classic JDM stuff,
[00:48:53] that Kahneman and Tversky stuff, where nobody would say that those scenarios, those decision-making scenarios that they give people, are typical of things they encounter in the world. Right. But there's a kind of a very strong normative standard to argue against, or like a theoretical position to argue
[00:49:09] against from economics where economic theory says people should do this and then they don't. And so therefore that's interesting. Right. I guess my question would be are there people prior to the verbal overshadowing work who were working in cognitive psychology and like in memory
[00:49:24] specifically, who would have taken that position? Or is it the case that it's just our lay intuition that, yeah, talking about it should make your memory better, and the reason that this study has impact is that it contradicts the lay intuition, not what an expert
[00:49:39] would have told you. Well, the fact that it's a lay intuition... I mean, look, a theoretical claim has to come from somewhere, and there's no lack of theoretical claims that have not been tested yet. Right.
[00:49:49] And so it's going to come from some kind of intuition or at least you're using empirically derived findings and constrained conditions to try to figure out which of the general claims are right or wrong.
[00:50:01] And if you view it as Daniël Lakens does, he says, if you view science as proceeding by deduction and not induction, you realize that this is not a problem at all. Well, it's a problem maybe in rhetoric, in the way that we communicate our
[00:50:14] results, but not a problem about science. A couple of things. Number one, I don't think that responds to Yoel's original point, which is that with this particular experiment and with many experiments, you already know before running any study that you can get results that support both sides.
[00:50:34] You don't... That's not, Tam... that's not Yoel's point. I know. But isn't that your point, Yoel? That if you design a study a certain way, you can get results that will support this general claim, and if you design it a different
[00:50:51] way, it'll support the opposite of that. Right. So that's what he says about verbal overshadowing specifically. Other people have made that argument more generally. So, like, McGuire, Bill McGuire, who's super famous, now passed away, social psychologist, said exactly this.
[00:51:07] Basically that a sufficiently good experimenter can design an experiment to demonstrate support for a claim and its opposite. I don't think Yarkoni goes that far. He's saying, specifically for verbal overshadowing, from first principles sometimes it has to happen, sometimes it doesn't.
[00:51:21] It might be that that's wrong. It might be that prior to this verbal overshadowing research, like cognitive researchers, people studying memory would have told you, no, rehearsing should always make people better. And I agree if that is what most people thought, then the existence
[00:51:36] proof that it doesn't always make people better, that's super valuable. But it really hinges on: is that really what was believed? And I don't know enough about the area to say. Well, you don't even have to make a claim that that was what most people
[00:51:48] believed to know that the value might be that, if they believed that, they've shown an instance in which that's not the case. And that is certainly a piece of information that's valuable. Well, not if you could have known it before running the study, which...
[00:52:03] But you couldn't have known it. I don't know how you think you would know that. But you could have for verbal overshadowing, right? How could you have known that, though? This is the claim in the paper: that it just comes from first principles, that
[00:52:15] you have... That's the hand-waviest argument I've ever heard. What first principles are telling you that verbal overshadowing is an effect? I don't know, you seemed convinced by this. I thought that argument was plausible.
[00:52:26] So, like, I think that from first principles of how the mind works, you can say it seems extremely likely that we would get effects that go in different directions. And I think that in some cases that's more reasonable than others.
[00:52:42] And you have to be kind of a content expert to know in which cases that's true or not. I sort of trusted him there. You know, I mean, the logic seemed sound. I don't know, maybe some other premises are actually wrong
[00:52:55] and a cognitive psychologist would be like, no, not at all. But I think there's a lot of domains in which it almost has to be true that sometimes a thing happens. Sometimes it doesn't. Sometimes you get like the opposite.
[00:53:10] Tam, I can't think of any better episode, aside from the last time we talked about psychology and science, to mention our other sponsor for this episode, which is Prolific. If you want to test a hypothesis, it can be specific, it can be broad.
[00:53:25] You can generalize or you cannot. Either way, you're going to need quality data. So I don't think that there's any debate there. Even if you want to do purely descriptive research
[00:53:37] and bring health to the field of psychology by first doing some observations, you want to know that you can rely on the data source that you get it from. So suppose I want to test a hypothesis that young people and old
[00:53:50] people differ in how well they process information about politics. Right. So with prolific, what you can do is get one of their samples. You can even pre-screen at no extra charge and you can pre-select a population of older individuals and a population of younger individuals.
[00:54:09] You can look at conservatives, liberals. You can look at African Americans or regular old white folks. You can look at young and old. You can look at students. You have any number of demographic factors that you can pre-select
[00:54:22] in order to do whatever hypothesis testing or descriptive research that you want. One of the things that you don't have to worry about so much is the risk of professional test takers, professional survey takers who have kind of overrun some of the other services.
[00:54:37] Prolific takes steps to distribute studies across all participants so you don't run into those problems with professional survey takers to the same extent you see on places like M-Turk. The folks at prolific use machine learning to improve the quality of their data
[00:54:50] and monitor their data and any feedback that they get from researchers closely. So they also avoid the problem with bots. They also pay more than M-Turk. They keep their survey takers happy and engaged. You can also engage in more complex forms of experimental design
[00:55:10] by doing longitudinal or follow-up studies with participants with prolific. The attrition rates across participants are fairly low compared to other services. And prolific has just launched a brand new tool that lets you collect samples that are nationally representative of the US or UK.
[00:55:28] So if you really want to make one of those generalizable arguments, then at least it's worth taking the extra time and effort to get samples that are nationally representative. So there is an offer for our listeners. For this year, prolific really wants to reach out to you,
[00:55:44] whether you're a social scientist doing research, someone in charge of market research at a big firm, even just a high school student who's looking to do a science project for a fair and you want to do something on psychology. Prolific is giving away $50 to Very Bad
[00:55:58] Wizards listeners who want to give online sampling a go. Redeem your free credit at prolific.co slash verybadwizards. Again, that's $50 to Very Bad Wizards listeners who are starting a new account: prolific.co slash verybadwizards. Thanks to Prolific for sponsoring this episode of
[00:56:19] Very Bad Wizards. Even if we set this aside, the Lakens objection, which is that science doesn't work by induction, it works by this hypothetico-deductive model... The idea is that you come up with a theory.
[00:56:38] It's not that you run an experiment to test some lay intuition. You're supposed to come up with a theory. That theory generates hypotheses, and out of that hypothesis you can get a prediction, and for it to be valuable,
[00:56:54] it should be a surprising prediction that confirms your theory. What? No, no. Wait, what? Yeah, well, yes, that is how that's supposed to work. But it has to be surprising and confirm the theory. If nobody has the theory in the first place
[00:57:10] and you come up with an experiment that falsifies a hypothetical theory, maybe that's valuable. But that's not how that method is supposed to work. The method is supposed to work like this: you gather information, you gather data,
[00:57:24] and you come up with the theory first and then you test the theory. And if it survives the test, then it gets to live on another day. If it doesn't survive the test, then it's falsified. That's how this is supposed to work.
[00:57:36] But what you're saying is you take something that isn't even really a theory that anybody has put forth explicitly and you falsify it. No, that's not what I'm saying at all. First of all, when you say theory, come up with a theory,
[00:57:53] it can be a very constrained prediction that you're deriving from what you know. So you have a general understanding, right? You have a general theoretical understanding of how memory works. You think that rehearsal ought to improve memory, which is definitely a claim that memory researchers made.
[00:58:11] This is not controversial at all. This was the bread and butter of memory researchers. And then you design a study to test that theory under these conditions, and you show that, at least under these conditions, that claim has been falsified. What's wrong?
[00:58:29] Like, how is that not following the exact hypothetico-deductive approach that is being described by Lakens? Well, we disagree that somebody had a theory that would be so stringent and so general that they thought that a single study
[00:58:46] that showed results going in the other direction would falsify it. Now, maybe they did. If you want to email me a paper that shows that somebody had that theory, that's fine.
[00:59:00] I was also convinced that any reasonable person would think that the results could come out in both directions. So there was no theory that said it will always turn out this one way.
[00:59:14] It could be useful to figure out the cases in which it doesn't, right? So it might be that the theory instead says, generally, rehearsing should improve your memory, and now Schooler et al. are coming up with at least one example of when it doesn't.
[00:59:29] Right. And so if it's part of a broader, larger program of research where they're like, OK, well, what are the cases in which it does or doesn't, then we start to have more of
[00:59:39] a set of rules for when the thing happens versus not. Right. The feeling that I get is that oftentimes, especially with these more newsworthy psychology studies, they just stop with the disconfirming, and then that's kind of it.
[00:59:56] Right. They're like, oh, look, we found an exception to the general rule. Surprise. And then they go off and do something else. Well, I mean, this certainly happened with the JDM stuff, right? Kahneman and Tversky did this, right?
[01:00:10] And you can say that they straw-manned rational choice theory, and that might be the case. But those results were interesting given that people did seem to believe that, you know, humans were maximizers, that they were making these rational choices.
[01:00:27] They came up not only with a set of experiments to show this exception, but with a broad general theory of the conditions under which you would expect these exceptions to occur, and to say that that wasn't science testing a theory would just be ignorance of the theory
[01:00:43] that they claimed they were testing. And that's an interesting case. Yeah. And to speak specifically to verbal overshadowing: yes, that's exactly what the theory said, right? Go read the original verbal overshadowing paper.
[01:00:55] They put it in those terms; they cite it right there. I just don't believe that. Tamler, you will believe it if you read it. Well, it's sort of hard to believe that some of the rational choice theorists held a position that strong,
[01:01:11] just given what we know of human nature. It's not that they didn't think that; they just thought that those biases were small enough to ignore, that they would wash out. Yeah. Yeah. And then a lot of the follow-up work
[01:01:23] tried to show that, you know, people had these biases when it mattered, like when they were incentivized, when they had a lot of experience, stuff like that. So I mean, I'm sympathetic to the idea of like, yeah, you got to start somewhere, right?
[01:01:36] And I think JDM is a good example of then really building out that program of research to be very specific about when it should happen and when it shouldn't, to even try and make point predictions about people's judgments, right?
[01:01:48] So that's kind of an example of what Yarkoni thinks we should be doing. I heard you guys talk about this trash talking paper. And I do think that, yeah, probably in the GD—the general
[01:02:01] discussion, for those who aren't in the know. General discussion. Sorry, I do a nerd podcast; I'm not used to talking to the normals. In the general discussion, I'm sure they talked some about limitations.
[01:02:15] But that's a paper in which you're not starting with a strong theoretical prior. You're like, here's a phenomenon that we're curious about, we're going to instantiate it in a certain way. And then, I mean, why would you do the study
[01:02:27] if it were only about the specific circumstances, that specific design? You want to say something broader about the phenomenon people are interested in. So you can do it wrong. So does that mean that there's a generalizability crisis, though?
[01:02:38] Because it's not that there aren't instances of people actually mistakenly making inferences from their data, right? And lay people doing that once they read those shitty descriptions of the data. It's not just that they might have overstepped in the general discussion. They weren't using that method.
[01:02:56] There was no theory that they were trying to falsify. At best you can say that they had a theory that they were trying to confirm, which is something along the lines of: people are motivated by hostile outgroups, that there was some mediator between trash talking and motivation,
[01:03:18] or something like that. And then this was a test of that theory that they tried. So they weren't trying to falsify something. They were trying to confirm—again, at best—their own theory.
[01:03:32] No, I think that to be the fairest to them—and this will, I guess, turn on how we're interpreting what they're saying—I think that they took the lay intuition that people might have, that trash talking is supposed to fuck with somebody's performance,
[01:03:46] and they showed that at least under these conditions, trash talking seems to improve other people's performance. The theoretical claim that they were testing, the verbal claim, was what is probably widely believed if you just asked anybody
[01:04:01] who trash talks what they thought that effect would be. But I guess I just don't think that's how this model is supposed to work. You are supposed to collect data and then actually construct a theory and test it.
[01:04:13] You are not supposed to imagine what some people might think about something and test that. That sounds more like induction. But you don't have to collect data before you construct a theory. No, you don't have to.
[01:04:25] But you at least have to construct a theory, and not just imagine what some other people might believe as a theory. When you're doing this, you're supposed to have an idea in mind about what it is that you think is true and then subject that to tests.
[01:04:44] And if nobody's done that, then testing it—I mean, it might be valuable, but it's not the way this model, this approach, is supposed to work, as I understand it. But I think if you're charitable, you'll think that this is right.
[01:04:58] So when we say theory, it doesn't have to be capital-T Theory. You don't have to have a whole worked-out view of how motivation works or anything like that. I think it's fair to say that the trash talkers
[01:05:08] were testing the claim, the theoretical claim, that trash talking would demotivate, and they falsified it. Kahneman and Tversky were testing the claim that people are rational maximizers, and they found evidence that they aren't. The verbal overshadowing people were testing a claim
[01:05:26] that rehearsal and verbal repetition of visual memory would improve memory—a claim that definitely had been made—and they found evidence against it. But in all your examples, you're pretending that they're falsifying something. The way science is supposed to progress is that people make theories
[01:05:42] with surprising predictions that are confirmed, right? That's how this model works. But the model is science. Not at all. Yes. The model here is physics, and you have a theory, and the way that we are supposed to become more confident
[01:05:57] in these theories is by testing them. And in every one of your examples, the way you say that these people are doing their studies, it's like they're trying to falsify something. Tamler, that is what Daniël Lakens and I are defending.
[01:06:10] Falsification is exactly what he means when he's talking about the deductive approach. But there has to be a theory out there that you are falsifying. Yes, there's a theory. There's a theory. What do you want other than the claim
[01:06:22] that verbal rehearsal is something that would improve memory? That's a theoretical claim that's being tested. You can't test a huge theory of how the mind works with one specific experiment. You have to constrain your theoretical prediction to a hypothesis.
[01:06:38] And that hypothesis is the local thing that you're testing. Your reason for believing that the theory is true has to be greater than "oh, a lot of people seem to think this." Why? Otherwise, you're just testing random things that people might think.
[01:06:54] Like, that's not science. That's not building. You have a specific hypothesis that you generate. You're the one who was just defending the Yarkoni claim that from first principles we would know everything. So take a first-principles— No, I wasn't.
[01:07:07] Take the first-principles approach, generate a prediction, and test it. It doesn't have to be more than that: testing intuitions, testing lay beliefs, cultural beliefs—those are all fair game in the scientific method, right? You want to test the general belief that a bowling ball falls
[01:07:26] faster than a feather? You test that: create a vacuum and you test it. I mean, I guess the question is whether you end up with anything satisfying if you just go around testing whether lay beliefs are accurate.
[01:07:37] Right? Ideally, you would want to tie it back into a larger picture that puts these things into context, which is where I think, Tamler, you were going with physics: physics is a theory that actually can make point predictions, right? It says we ought to see such and such
[01:07:50] deviation in the orbit of this planet if my theory of gravitation is correct, and then you observe it and you see: do you see exactly that deviation? That seems different, right? Well, if the discussion is turning on how much more precise physics
[01:08:07] is and how much more precise their theories are, then you're not going to get any disagreement from me. But to say that you're not theory testing and falsifying—one, the claim that confirmation is the way in which
[01:08:22] Lakens or I or anybody who's doing this science thinks it's proceeding is wrong, right? It is falsification. And two, if you think it's a shitty theory, then you think it's a shitty theory—fine.
[01:08:34] But the fact that you can generate a hypothesis from that broad statement is the way that it's supposed to proceed. And that's what people are doing. But then the goal is to generate theories that can survive falsification, at least, you know, for a while.
[01:08:50] The goal, if you're trying to understand the human mind, and not just trying to repeatedly show that people have no idea how the human mind works, is to build something where you can actually make some sort of surprising prediction.
[01:09:05] When I say this is how it's supposed to progress, this is how it's supposed to progress: you get a theory that generates hypotheses that then survive testing; they make predictions that are confirmed. That's super different from saying that it proceeds
[01:09:19] by running confirmatory studies. It's valuable to falsify a theory and to find out that you didn't know something that you thought you knew, but that just puts us back in a state of not knowing anything about the phenomenon in question.
[01:09:37] And so for it to progress—for us to learn about the planets and the solar system—we had to actually come up with theories that generated predictions that were confirmed.
[01:09:49] Yeah, and I actually think—as I've said over and over again on this podcast—one of my fears is that confirmation bias is rampant, and that when we think that we're properly testing a hypothesis, we're not actually doing it.
[01:10:05] That actually is a deep concern. So I think you're still not fully grasping what I'm saying. So let me go to something Yoel said on his podcast. Yoel, you can confirm if I understood this right—or he can falsify it.
[01:10:20] Yes, confirm or falsify; either is valuable. When you were talking about the specific objection from Lakens, you asked Mickey: is there even a single theory in social psychology that will generate surprising predictions that are confirmed?
[01:10:38] Can you think of a single one? And Mickey said, no, I can't. That's very different from these other sciences, where the goal is to come up with theories that actually can generate surprising predictions. And they don't start there with, like, oh, you know, some people think that Mars
[01:10:59] rotates like this, but we ran a study that shows that it doesn't. They actually come up with a model of how the solar system and planetary motion are supposed to work, and then they test that theory and get something that actually works for a while.
[01:11:17] If you don't make that step where you actually come up with theories that can survive falsification, that's a problem. And I take it that however you want to interpret what scientists are doing, Yarkoni is diagnosing that as a problem and saying something along
[01:11:36] the lines of, you know, correct me if I'm wrong, that that's the thing that will be virtually impossible to do with the current methods. Yeah, I mean, I think what I was getting at when I said that to Mickey is, is it productive for us to jam
[01:11:55] our empirical research into this falsificationist framework that maybe arguably is only well suited to more mature disciplines that can make more specific predictions and where the auxiliary assumptions are really well understood. And I think maybe that we're at a
[01:12:15] point now where we just don't understand enough of what's going on for that framework to even be useful. So you can go around, you know, falsifying a bunch of claims that could come from some reading of some theory or that
[01:12:31] could come from intuition. But does that add up to something useful? Is that an end that we should be pursuing, even if it's formally meeting Popper's criteria of falsification? Maybe it doesn't. The point always is to accumulate knowledge,
[01:12:45] right? And ultimately, I think, to be able to understand and predict. And if it doesn't add up to that, then why do it? And I guess I'm perhaps more negative than David is about our ability right now to do
that about people's behavior. So I think it's maybe more productive to do descriptive work and then do really focused studies on exactly the thing that we care about. So, getting back to the JDM stuff, which is a literature that I
[01:13:13] love and think is quite a strong literature—you take that stuff into the field, David, and, you know, half the time it doesn't work, right? Take gain versus loss framing. We know that there's a ton of research showing that
[01:13:25] people attend more to losses than to equivalent gains. And yet you do the field study where you frame something as a gain or a loss, and, well, sometimes it works, sometimes it doesn't, and if it doesn't work,
[01:13:35] God knows why not, right? So that's kind of the stage I feel like we're at. And if you're at that stage, then maybe it's best to say, whoa, let's take a step back from this super formal hypothetico-deductive framework and just say,
[01:13:50] let's see what's out there, let's try and describe that accurately, and let's say, in a very specific situation, what effects might we be able to generate by changing this or that thing. Which sort of gets around Yarkoni's objection, because then we're just
[01:14:05] saying, well, if you contact people by direct mail and you're trying to get them to sign up for this program to lower their energy bill, then this framing works better than that framing. And then you might be like, well, maybe if we have a similar program
[01:14:18] somewhere else, that would also be a thing to try—but no guarantees. Which, if we're being realistic about where we're at, is where we're at, right? That's what you get from field studies: the thing that you
[01:14:30] really thought should work, well, it didn't work, and you don't know why not. Look, I ended the last discussion we had on this by saying that oftentimes the only thing we can learn from some studies is exactly what will happen to those people under those conditions.
[01:14:44] Right? And that's why those might be the most valuable. But this is not Yarkoni's objection. Yarkoni isn't making this broader critique. Yarkoni is saying that with the experiments we do, he thinks we're in a crisis of generalizability. He's not making these other critiques
[01:15:03] of the science that we've talked about at length, right? I actually think, look, it could be that the falsifiable predictions of our local theories are giving us results that don't feed into the predictability of human behavior outside of the lab.
[01:15:22] That could very well be the case. But what Yarkoni is saying is that he thinks we're doing experiments in order to generalize from the results of those experiments, and that's a different critique. We are at an early stage. And again, I think the descriptive
[01:15:40] research that these other sciences went through is something that we've skipped, right? All of that would yield better theories, and those theories could then be tested properly with the falsification-based experimental paradigm. So we could be wrong about what we're trying to falsify.
[01:16:00] But that's a very different argument from the one Yarkoni is making. Yeah, so I feel like this is then difficult to talk about, because he would say, look, people are making these very broad claims in the title and the abstract and the general discussion.
[01:16:13] And they're like, you know, footnote, footnote, caveat in one paragraph of the GD. And then other people would say, no, of course it's understood that we just meant under these circumstances, and future research has to blah, blah, blah. And you kind of get bogged down.
[01:16:25] Yeah, well, let's take a descriptive paper, a purely descriptive paper. This is not gaining us anything in terms of generalizability. Say that I observe a bunch of kids in the playground to see how much they cooperate. This is not buying us anything
[01:16:41] in terms of generalizability. At least you're being honest about what you're doing; the descriptive doesn't get you out of this. Right, right. You might say that descriptive work by its nature makes it more salient to the reader that this is local to this specific context, whereas
[01:16:54] experiments are by design abstracted, and part of what people take away from that, rightly or wrongly, is that it applies more broadly. So this ties in kind of interestingly to a piece that I wrote recently, but that isn't available yet—
[01:17:09] just to self-promote in advance—about stereotype threat. Stereotype threat researchers wrote an amicus brief in Fisher v. Texas, I think, saying it's been shown that stereotype threat has these wide-ranging and pernicious effects on the performance of minorities.
[01:17:27] And that may be true or not, but what they're basing that on is these very specific lab studies, lab studies that have been designed to elicit stereotype threat effects. And in fact, when you do bigger field experiments, you often get really mixed or null results.
[01:17:42] But one big problem—and I'm interested in what you think about this—one big problem was that those studies themselves weren't showing what they thought they were showing. Had they been, then maybe we would have evidence of the pernicious effects, because under those
[01:17:56] controlled conditions of taking a standardized test with a stereotype threat, I think you would be testing that; in fact, you would have evidence that might lead you to an amicus brief. They just false-positived the shit out of that data.
[01:18:10] Well, even if you can get it in the lab, that doesn't mean it translates to an actual high-stakes testing situation. That's my point: that's an inferential leap, right? You have to give evidence for that, I think. Yeah, but I would believe that
[01:18:23] the conditions of bringing in college students to take GRE questions in a quiet room, where you have them fill out their race and ethnicity or gender, are really close. I would put money on that being exactly the kind of condition
[01:18:42] from which I would be more comfortable inferring what would happen under those strict test-taking conditions. So had they actually had a body of evidence—right, it was just selective reporting, controlling for tons of shit. But Dave, when you say
[01:18:57] you would put money on that, is that a face-validity kind of thing? It seems relevantly similar: college students answering GRE questions in a quiet classroom, asked for demographic information, under conditions where they know they're being evaluated.
[01:19:15] I think that is much closer to the conditions that we're trying to predict, right? It's not perfect, it's not the same, but it is pretty damn close. I would put money on any study that tried its best,
[01:19:32] right, because of the population of questions being evaluated—we have access to a bunch of GRE questions, we have access to a bunch of SAT questions. So I don't think it's that much of a leap. Still, I don't know
[01:19:43] that I would ever file an amicus brief about it, but I think the huge problem with stereotype threat wasn't that they were making errors in generalizability. I think if you believed all the
[01:19:57] data that they said they had, it wouldn't be that much of a leap. I don't know. I guess now we're back to dueling intuitions, right? I feel like one thing that's super different is that the stakes are a lot lower in the lab. But between having ten
[01:20:10] things different and one thing different—you're right, I'm not going to die on this hill, but wouldn't you be more confident if it were a real finding? They weren't even really finding it in those
[01:20:22] conditions. The argument that I try to make in the paper is that going to the lab first is the problem, that what you should do first is try to observe the real-world thing that you care about, especially in a case
[01:20:34] like this, where then we don't have to make these guesses about whether this mirrors the real world accurately or not. You should be able to see some evidence of this in real-world situations. So, for example, if stereotype threat really does depress the performance
[01:20:50] of, let's say, racial minorities or women, then the test should be differentially predictive for different groups. It should systematically under-predict the subsequent performance of racial minorities, say. And that's a purely observational check, right?
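A minimal sketch of the differential-prediction check Yoel describes, assuming hypothetical data and column names (applicants.csv, test_score, gpa, and group are all invented for illustration, not from any study he mentions):

```python
# Differential-prediction check (hypothetical data and column names).
# If the test is unbiased, scores should predict the later outcome
# (say, first-year GPA) equally well for every group. If stereotype
# threat depresses scores, a common prediction line should
# under-predict the affected group's actual performance, showing up
# as positive mean residuals for that group.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("applicants.csv")  # columns: test_score, gpa, group

# Fit one common prediction line, ignoring group membership.
common = smf.ols("gpa ~ test_score", data=df).fit()
df["residual"] = common.resid

# Systematic group differences in the residuals = differential prediction.
print(df.groupby("group")["residual"].mean())

# A more formal version: do the intercept or slope differ by group?
interaction = smf.ols("gpa ~ test_score * group", data=df).fit()
print(interaction.summary())
```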
[01:21:12] But that would make you more confident about then going into the lab and saying, OK, what's the mechanism here, right? And the fact that they started with the experiments, where you are just by nature free to engineer a situation—maybe it has to do with the choice of items, right, which
[01:21:24] later became clear, when they were like, well, they have to be items that are difficult, but not so difficult that you're at floor. And all of this sort of seems reasonable, and I don't know how much of
[01:21:36] it is just post-hoc, p-hacky sort of stuff, but let's say it's 100% replicable and it only works for the moderately difficult items. Well, is that representative? Is the content that they encounter on the actual high-stakes test
[01:21:52] similar to that? You know, maybe not. I mean, they chose those items specifically because they were like, oh, these are the items that work, right? So again, that is a question of generalizing. Yeah, but this is a nice example, because you can get old GREs and
[01:22:04] old SATs and you can actually use those, and so if it were only the moderately difficult items, you could calculate the bottom-line effect on these students. I think that all the other reforms
[01:22:14] we've been working on, which are about p-hacking, pre-registration, cutting down on researcher degrees of freedom, transparency in reporting, file-drawer problems—all of those, which are separate issues, could have done so much to prevent us from ever making the leap
[01:22:31] to saying this is actually a pernicious problem in standardized testing, because we wouldn't have found those things. Can I just ask a clarifying question for Dave? How do you understand what they were doing there? Were they trying to confirm their own theory about stereotype threat,
[01:22:48] or were they trying to falsify a theory that there is no stereotype threat, or something? How do you conceive of what their intent was here? I mean, if I recall correctly,
[01:23:08] they're testing a theory that stereotypes prime a particular set of beliefs that are threatening, and that could have an impact on tests. And so I think that they were trying to confirm that, right, which is probably not the best
[01:23:27] way to go about doing this. So then here's what I take to be Yarkoni's point there: if you frame it like that—I have a theory and I am testing it in this way—the problem is that the results of that test,
[01:23:44] in this case, provide so little confirmation that it just doesn't provide any confirmation. What they found was that they falsified their theoretical claim and didn't report it, right, which is just a problem of dishonesty—and that's the falsification that I
[01:24:03] think should be—yeah. That's fraud. But let's say they had done it. Well, you know, I don't want to call it fraud. OK, sorry, it's just bad practice. It's confirmation bias. Bad practices. But best case scenario, even good practices, like the best psychology
[01:24:19] practices that exist right now—isn't the point that it still doesn't really get you much confirmation, because it's something you would expect to turn out different ways depending on the situation anyway? And so the fact that you can find one case
[01:24:36] of it isn't—now, I take Lakens' point to be, well, that's just how the shit works: you have a theory and you test it until it's falsified, and then you discard it. But the way this is conceptualized is, if we can just get this one test,
[01:24:53] then this theory is pretty solid, like we've established a phenomenon. I mean, I think this is a point where maybe Lakens is being too strong about his claim. So the descriptive claim about whether psychologists are doing hypothetico-deductive falsification, whether they're using
[01:25:16] that method—well, if what you mean by "are they using that method" is "do they seem to believe that that's what they're going about doing," maybe he's wrong. Maybe that actually isn't how psychologists are seeing it, and maybe you could actually do
[01:25:30] a coding of a bunch of intros and discussions and see whether or not they view what they're doing as proceeding via falsification. And it might be the case that, no, most people are actually doing it wrong by seeking confirmation. But if you're doing it right,
[01:25:47] if in fact you are proceeding with the attempt at creating specific conditions to test a falsifiable claim, then generalizability, in the way that Yarkoni is concerned about it, is not a problem, because you wouldn't ever think that that's what we were trying
[01:26:04] to do in the first place, right? So it's more of a sociological, descriptive point, maybe. Yeah, but if that's what experiments are understood to do—falsifying—then it doesn't matter that they're under constrained conditions. I just don't feel like that's
[01:26:21] often what we're doing. Or we're sort of having our cake and eating it, in that psychologists are happy to be quoted talking about current events, right? They write pop books, they go on these speaking tours, they give people life advice. And it's not like, well,
[01:26:35] we found that in this one specific situation we failed to disconfirm whatever, right? It's like, no: to be happier, pet your dog more, or whatever. Yeah, so I get that you and Tamler are making what I think is a good point,
[01:26:51] something that I wasn't granting at the beginning, which is that the way in which we talk about our own science probably is irresponsible. But then let's separate the knowledge that we think we've accumulated, which is all this generalizable stuff,
[01:27:08] from whether or not we have learned anything from our experiments. If you view what we're doing with our experiments as trying to generate hypotheses that can then be falsified, then you are not under this danger of generalizability. So it boils down to a rhetorical claim:
[01:27:27] it's not that the data we're generating is useless; it's rather that when we don't realize what the nature of the data we're generating is, we irresponsibly make generalizable claims. Can I ask a question on this front? One of the things that Yarkoni said
[01:27:49] in response to this kind of objection, the Daniël Lakens objection, is that it would render p-values and a lot of statistics purely rhetorical—window dressing. The p-value doesn't actually mean what it's supposed to mean, because either your prediction
[01:28:12] was confirmed or it was falsified, but the p-value part of it doesn't actually matter, except that it might make it look better. But I don't fully understand that; can someone explain? I don't think that's right. Now, I haven't read this
[01:28:30] Twitter back-and-forth, and I'm totally unknowledgeable about what was actually said here, but the reason you want the p-value is to see whether the differences you observed would be expected under chance alone, right? That's what the p-value tells you. And I think
[01:28:47] Yarkoni's point was that when our verbal claims try to go beyond what the model tests, then the p-value becomes irrelevant, because the verbal claims don't match up with the model anymore. And then again we're at this disagreement about what verbal claims people are making,
[01:29:03] right? As long as you restrict yourself to "what we tried to do here was test one prediction of the theory, which says that under all circumstances this should happen, and we found that with these specific stimuli it did not happen, and therefore"—or, I guess, vice versa:
[01:29:18] "our theory says that in this case this should happen, and it did happen, and therefore our theory is supported"—if you're narrow enough about your claims, then the p-value is meaningful on its own terms. Although the issue with a positive result
[01:29:34] is that then I'm not sure the inference is warranted, right? If your theory predicted x, but lots of other things predicted x as well, should that make you a lot more confident in your theory? Probably not, right?
[01:29:46] So that's a whole other can of worms. And how do you determine what x is? It's not like you're predicting the exact ratio, or the exact average number on the Likert scale. Your prediction has to involve some threshold
[01:30:05] at which we can call the prediction confirmed and some threshold at which you can call it falsified. Right, right. Yeah, so maybe you would predict a rank order, or you would predict that the point estimate is going to be in this zone.
[01:30:20] You can test all of those using a p-value—the classic statistics that we all learn in grad school, as opposed to fancier Bayesian stats. They can tell you, well, is this significantly higher than that,
[01:30:36] or does this value that we observe differ significantly from this other value that we're positing—for example, is it in this range? It can tell you that. So my understanding of the way in which we're using significance testing is that it really does
[01:30:51] have to be tied to a particular way of making inferences about what you're doing. And that way has to be something like: you conduct an experiment and you do your best—I'll just pull an example of a memory study, where you say,
[01:31:06] I'm going to give you 20 words to memorize, and I'm going to create two groups. One is a group where I just give them 20 words to memorize and then assess their memory a bit later, and the other is a cold pressor task, where I give you 20 words to memorize,
[01:31:23] but before I do that, I make you stick your hand in ice-cold water, right? I have a theory that says that when you're emotionally aroused, you should have better memory. And so you are now using the p-value when you get those two numbers—
[01:31:40] how many words did they remember in the experimental group, and how many words did they remember in the control group—to test the following: in a world where I am wrong, a world in which this makes no difference, what are the chances
[01:31:54] that I would have found this particular finding? What are the chances that my experimental group would have yielded a number this high? And what that's telling me is, well, at this arbitrary level that we've set, there would be less than a 5% chance
[01:32:10] that this would happen under chance alone. So unless you tie the significance testing to this particular way of going about doing your science—what it cannot at all tell you is how many people in the world are going to remember things better
[01:32:27] when they're emotionally aroused. It can't tell you that. And I think Tal's point is that the p-value is tied to those specific things unless you explicitly model variation in them—it's tied to the words that you used, the test that you used,
[01:32:40] and all of that stuff. So if your approach is that you're comfortable saying that the results of this very constrained test are going to yield information about whether my theory is true or not, then that's not a problem.
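To make that logic concrete, here is a minimal permutation-test simulation of Dave's hypothetical cold-pressor memory study; every number below is invented for illustration:

```python
# NHST logic of the hypothetical cold-pressor memory study, run as a
# permutation test. Under the null, group labels are arbitrary, so we
# shuffle them and ask how often chance alone produces a gap at least
# as big as the one observed.
import numpy as np

rng = np.random.default_rng(0)

control = np.array([11, 9, 12, 10, 8, 13, 10, 9])       # words recalled
ice_water = np.array([13, 12, 14, 11, 15, 12, 13, 14])  # words recalled

observed = ice_water.mean() - control.mean()
pooled = np.concatenate([control, ice_water])

n_perm = 10_000
extreme = 0
for _ in range(n_perm):
    rng.shuffle(pooled)  # a world where the labels mean nothing
    fake = pooled[8:].mean() - pooled[:8].mean()
    if abs(fake) >= abs(observed):
        extreme += 1

print(f"observed difference = {observed:.2f}, p ~= {extreme / n_perm:.4f}")
```

Note that the resulting p-value is conditional on exactly these participants, these word lists, and this particular arousal induction, which is the point being made here.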
[01:32:54] But if you're trying to say, "I think that everybody in the world who's emotionally aroused—aroused in any way whatsoever—will remember any piece of information better," then you are doing it wrong. You should not be inferring from this particular experimental methodology
[01:33:10] and this particular way of statistical testing that this is going to be generalizable. I think it's a good cautionary tale, reminding us what we shouldn't be able to infer. The question is, are we in a crisis about that? Maybe we are, because we keep talking
[01:33:26] so irresponsibly about our findings. Maybe that's the crisis. But I don't think it's a crisis in the methodology. All right, now let's take a moment to thank one of our favorite sponsors, GiveWell, the organization that researches charities to help you maximize the impact of your donation.
[01:33:46] Dave, we've been doing spots for GiveWell for a while now, and this past season, podcast listeners like you gave over $500,000 to GiveWell's recommended charities. GiveWell has asked us to thank all of you for helping to support some of the most effective charities in the world.
[01:34:06] GiveWell spends more than 10,000 hours each year searching for outstanding charities, but that only matters when donors like you act on their research and give. And you know, just because it's not the holiday season anymore, that doesn't mean Peter Singer's argument is all of a sudden unsound.
[01:34:25] Wait, it's actually relevant across the year? That's right. Sorry, yeah, it's not just November and December; you can still donate, and your donation will make a big impact. You can be sure of that. How can you have that confidence? Well, GiveWell conducts in-depth investigations
[01:34:43] to find charities that, dollar for dollar, are saving or improving lives the most. These donations will be used to distribute things like malaria treatments, insecticide-treated bed nets, or vitamin A supplements—programs that can save a life for every few thousand dollars donated. GiveWell uses academic research,
[01:35:05] interviews with charity representatives, and site visits to estimate which charities can give donors the biggest bang for their buck. They keep their recommendations up to date to make sure that their recommended charities can still use additional funds effectively. That means that donations at any point in the year,
[01:35:23] including now, will be put to good use. To find out how much good your donation can do, go to GiveWell.org. There you'll find all of GiveWell's research for free, as well as a short list of the most effective charities they've found.
[01:35:39] You can donate directly through their website, and they charge no fees and take no cut. Thank you, as always, to GiveWell for sponsoring this episode. I have two other questions on my end. The first is, he has this section on what we should do about this.
[01:36:02] One of them is just: do something else, don't be a psychologist. So set that aside—as you guys said, that's not going to happen for tenured psychologists. Well, clearly I'm a podcaster now; I have to spend more time on that. Yeah, it's true, you at least have an out.
[01:36:17] Both of you have an out, actually. So that's good for you, bad for these other poor psychologists who don't have podcasts. Except that they're all really successful, because they spend all their time podcasting. So another option was to do more qualitative research, and
[01:36:42] you know, this is something, Dave, you and I have talked about a lot, and I think even you agree, despite your highly defensive stance on this episode. You mean my rationality. And not just laying down, you know—I feel like Mickey and Yoel
[01:36:56] laid down a little bit early on this in their episode. They're like, yeah, he's right, our science is not a science. Well, they had been softened up by that other one, that book that you guys read, the against experience. Yeah, that's right.
[01:37:09] So those episodes are sort of a pair, almost, right? So we went in more skeptical than we would have been otherwise. But then he says, do more qualitative research. Great, I love that; it has a kind of humanistic vibe to it.
[01:37:24] And then he says—and this is what I want to understand better than I do—that actually a lot of these experiments are just qualitative research, but with a lot of numbers thrown in to make it look sciency, which of course is something that
[01:37:40] appeals to me, if it's true. I just don't fully understand what he means by that. Yoel, I'll take this, because I think that's a good point: hey, we've been doing qualitative research this whole time, just pretending it's quantitative. So what's qualitative research?
[01:37:58] It's just describing: hey, under some circumstances, people do this, right? I went to a park, I saw people do that. And I think what he means is that when the statistics don't actually test the verbal claims that are being made, then what you're doing
[01:38:17] is just making some verbal claims, and the statistics are sort of irrelevant, because they're testing a question that you're not actually asking. So then all the flaws of qualitative research are still there, except there's a pretense that they're not. That's what he's saying. Yeah,
[01:38:33] good. And if you both agree with that, that's awesome. The second question I had is about the current emphasis on replicability, which is something that he talks about too, and something that I've thought a lot about. If he's right, or even close to right somewhere—
[01:38:53] if he's as right as Yoel thinks he is—is this emphasis on replication and replicability a misguided effort? The analogy I was thinking of is: say, I don't know, engineers are working on an airplane,
[01:39:16] and the engineers are working on the parts, and the parts that they're working on are only useful for that airplane, but the airplane has this massive design flaw that won't allow it to fly. And so the engineers are spending all their time perfecting the parts,
[01:39:34] but no matter how good they can get the parts working, the airplane will never fly. And so they're kind of wasting their time in doing that, because the only worth or value of those parts is if the airplane is designed well enough
[01:39:53] to work in the first place. Does that make sense? Does that analogy make any sense? Yeah, so I'm going to give Yoel a chance to try, because they talked about this very thing on their podcast—about the value of—
[01:40:05] there is an ironic twist to what Yarkoni's arguing. But I don't even know that you need—as much as I love your metaphor, Tamler—I think it can easily be said without the metaphor that if it was a worthless experiment at time one,
[01:40:19] it's going to be a worthless experiment at time two, right? It doesn't matter that you've repeated it exactly. That is a more concise way to express your point, yes, thank you. Yeah, so he comes out surprisingly—if you had, like, a
[01:40:36] continuum where you place psychologists by how bullish they are about direct replications, I would have, before this, put Yarkoni as one of the people who would be in favor of them.
[01:40:49] But as Yoel and Mickey pointed out, no, this leads him to the complete opposite conclusion, which is that the softer, what we call conceptual replications are what matter; with direct replications, all you're doing is testing crap twice.
[01:41:01] Yeah, so he pushed back pretty hard on us about that. He did? On Twitter, yeah. I forget what he called conceptual replications. God, now I'm afraid—Tal, Tal, please be nice to me on Twitter, I'm not as smart as you.
[01:41:13] No, he doesn't listen to podcasts, so we're fine. And we're fine. Yeah, he was intensely critical of conceptual replications. So I think what he would say is we need both, and specifically we need designs that systematically vary
[01:41:28] the things that we think might matter. And he cited in his paper a paper that I hadn't read, which came out in PNAS, where they tried to do exactly that: they were looking at a linguistic priming effect, and they systematically varied a ton of different
[01:41:44] things about how it was set up, and they tried to figure out, well, are there differences between the things that we varied in terms of whether they matter for the effect or not, and can that tell
[01:41:55] us something about why the effect is happening or not? And I think that's what he would want for these big multi-site replications. I think he thinks of it as a wasted opportunity not to vary more factors of the design,
[01:42:11] to see under what circumstances you get it and under what circumstances you don't. So that's really different from conceptual replications, where you're sort of, in an ad hoc way, saying, well, we tried changing this, right? It's very systematically trying to
[01:42:23] change a bunch of things that you think might matter, all at once, to see what affects the results and what doesn't. I'm not sure I understood the answer. So what's a conceptual replication versus just a regular replication? Right, so Yoel and I did
[01:42:38] a study a few years ago with Tom Gilovich and Dan Ariely where we were interested in whether people will self-harm—will people be physically punitive toward themselves when they feel guilty about an act? So
[01:42:54] we had people write about a time they did something bad, and they were given the opportunity to shock themselves more severely, and we showed that people who wrote about a time they did something immoral were giving themselves bigger shocks.
[01:43:09] So now you can directly replicate: if somebody doesn't believe the study—and they might have grounds not to believe you—they would do exactly what we did, use our materials, use the shock machine that we used, give the same instructions. That's the direct replication,
[01:43:25] because they want to know that we weren't p-hacking, that we didn't selectively grab data, all that. A conceptual replication would be, well, let's use a different—they're making this claim in general about guilt, so let's actually have somebody do
[01:43:39] something in the lab that they feel guilty about, and then let's give them the opportunity to do another task that might bring them pain, like sticking their hand in ice-cold water, and see if they keep it in longer. So, to your point, Yoel, you
[01:43:51] want to do the first replication, the exact replication, just to make sure that the effect was really there, and then you would want to do the conceptual replication to begin, in a tiny way, to address the problem that there could be different stimuli and there
[01:44:11] could be all these other kinds of random noise, and so you would want to move out of that quickly and try to use different stimuli and still get the same result. Is that the idea? And that would be—
[01:44:28] that part is the conceptual replication? Yeah, sort of, although I don't think he would like the term conceptual replication for that, because what he really wants people to do is to vary all these things within one experiment, so you can
[01:44:43] statistically model how much variability is attributable to changing these different things, right? If you just run them as one-offs—here's one experiment that uses the shock you'd give yourself, here's another experiment that uses sticking your hand in ice water,
[01:44:57] here's a third experiment that uses how hard you pinch yourself—then you can't model how much variability is attributable to those different tasks. It has to be randomly assigned, all within the same study. Why? It's just the way
[01:45:12] the model is set up: you need to have the other things equivalent, basically. You can't estimate that variance correctly if you're doing it across other studies where other things are varying. The model has to be told what the things that
[01:45:27] are changing are, and how, specifically, for each observation that you have in your data set.
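As a rough illustration of that modeling point, here is a mixed-effects sketch assuming a hypothetical data file and variable names (guilt_study.csv, pain_taken, and the rest are invented). Treating the task version as a random effect is one way to estimate how much the effect moves across tasks, and it only works when the tasks are varied within a single design:

```python
# Sketch: estimating task-to-task variability when the self-punishment
# task is varied *within* one study (hypothetical data and names).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("guilt_study.csv")
# columns: subject, condition (guilt vs. control),
#          task (shock, ice_water, pinch), pain_taken

# Fixed effect of the guilt manipulation; random intercept per task
# version, so the model estimates how much the baseline shifts across
# tasks. (With only three task versions this variance is estimated very
# noisily; in practice you would want many more.)
model = smf.mixedlm("pain_taken ~ condition", data=df, groups=df["task"])
result = model.fit()
print(result.summary())

# The reported group variance is the quantity that one-off conceptual
# replications scattered across separate papers can't give you.
```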
[01:45:40] That's pretty different from how we've done conceptual replications, which are more like one-offs, and they have other problems as well, which is that if it's consistent with your hypothesis, then you're like, great, theory confirmed—theory supported, I should say. If it's inconsistent, you're like, huh, must have done that wrong,
[01:45:50] and you don't publish it, right? So, more so than many other techniques, it kind of lends itself to this motivated reasoning where, if the experiment fails, you don't really think of it as disconfirming,
[01:46:02] because you can always say, well, that was an invalid extension, you just did it wrong. Right. Yeah. So then what do you think about replication? I think we're going to get to a point where it becomes increasingly less useful to have very specific estimates of
[01:46:20] very constrained effects—constrained in terms of their generality. Yeah. And I think now that we're doing these big multi-site studies, we should start thinking about varying features of the study, so that we can say more about
[01:46:34] how much stuff varies across these changes that we might make. And that lets us start thinking about how much we should expect this to vary across contexts. Right now, there's really no way for
[01:46:45] us to estimate that in a traditional design, where you don't even vary the thing that might be causing the changes. Yeah, not to get too inside baseball, but something like stimulus sampling is a real problem, right? Using a particular set of items—
[01:47:00] say, about moral judgment, right? This is a problem in moral psychology, actually. We use, you know, "is it immoral to fuck a dead chicken," right? And people use just that over and over again,
[01:47:11] and they make claims about moral judgment. It would be nice, in one study, to systematically vary—like, have Tamler and Yoel give me what they think are prototypical instances
[01:47:25] of a moral violation, you send them to me, I collect a bunch of those from people who don't know about the hypothesis, and I systematically vary which ones my participants get, right? So that I'm not just testing the thing that keeps giving me results,
[01:47:41] so that I am being more objective. Now I can actually give a quantitative estimate of how generalizable this effect is to all moral judgments. And we don't do that enough.
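A toy sketch of the stimulus-sampling step Dave describes; the item pool and names below are invented, not from any actual study:

```python
# Stimulus sampling: draw each participant's vignettes at random from a
# larger, hypothesis-blind pool instead of reusing one pet item.
import random

# Pool nominated by people blind to the hypothesis (invented examples).
ITEM_POOL = [
    "cheating on a final exam",
    "lying to a grieving friend",
    "kicking a dog out of frustration",
    "shoplifting from a small store",
    "breaking a promise to a child",
    # ...ideally dozens more
]

def items_for(participant_id: int, k: int = 3) -> list[str]:
    """Reproducibly sample k vignettes for one participant, so the
    effect is estimated over a population of items rather than for a
    single favorite stimulus."""
    rng = random.Random(participant_id)  # per-participant seed
    return rng.sample(ITEM_POOL, k)

for pid in range(4):
    print(pid, items_for(pid))
```

Downstream, the item would then enter the analysis as a random effect, in the spirit of the mixed-model sketch above.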
[01:47:58] It's hard to do, and when you do the math about how many items it would take to generate a decent estimate, it's damn near unfeasible. But that's the kind of thing that I think he would want. But to get back to stereotype threat:
[01:48:13] if we had had people doing direct replications, we would have learned earlier on that we shouldn't take it seriously, right? Yeah, so that's why I think these things go hand in hand, right? You can't really say anything meaningful about
[01:48:26] generalizing if there's actually no effect there. Exactly, exactly. The train would just stop, because the only evidence that we have in support of this theory—actually, nobody can replicate it; there's nothing there. But the
[01:48:39] question is whether you should have taken it seriously in the first place, whether the problem is further back. And so, yes, this gave you the additional information that you shouldn't take it seriously because the effect wasn't real, but to me that comes first, right?
[01:48:59] You can argue about how broadly you can generalize, and a critic might say, well, it's only for these items and for this specific subject population—but if there's no effect, then the argument is moot, right? But if you are devoting resources to
[01:49:12] replicating that study, you wouldn't want to do it if you thought that, even if it was replicated, it wouldn't tell us anything in the first place, right? That's right. Even Yarkoni would think that it tells us something about those particular subjects in that particular experiment.
[01:49:27] So I think that's actually a much more important fear to have: that many of the studies don't even show us the specific thing, because we've been doing it wrong
[01:49:40] all this time. So I think that we don't even have, as a science—maybe we should burn it down to the ground and start over again, and maybe start with more descriptive research and then build up
[01:49:52] to experiments. But we don't even have a good sense of which of the specific effects are actually real. You know, if I had to estimate, of all the experiments that I've done, which ones are real, I
[01:50:06] admit to being a little—you know, I kind of want people to go around replicating. So this is funny: you brought up this guilt and shock study, right? And that's where I'm like, boy, we ran
[01:50:16] those people one at a time, and I know that I did interim checks on the data and all of that stuff. But assume all that stuff is not an issue and this effect is a hundred percent solid. I'm still, now, having read the
[01:50:26] Yarkoni paper and thinking about it—I don't know that I would run that study, because is it interesting, really? Yeah, I see. That's the reason I brought it up to begin with: when Tamler was asking
[01:50:34] his question, I was thinking to myself, well, here is a case where we didn't have a full-blown theory that we were trying to disconfirm, but it was the case that in all of the literature on guilt, there's a case where Roy Baumeister, who had
[01:50:49] written this exhaustive lit review on guilt, had said, I don't think that would happen—nowhere have I seen any evidence that guilt could lead to self-harm directly. And just that we observed it, even under these constrained conditions, I think is
[01:51:08] interesting, right? This means that it can happen. Now, I may not run it again, because I'm not that interested in the topic, but I still think the demonstration is a valuable one, if it's true. Yeah, that's interesting. I didn't remember that
[01:51:23] Baumeister thing. Yeah, I don't even know if we talked about it, but I remember talking to him about it. The way that I thought about it at the time, as best I can recollect, is like, oh, this seems like this interesting phenomenon
[01:51:33] that you maybe observe some examples of in life—you know, like flagellants: they expiate their sins, they beat themselves until they bleed. Let's see if we can get that in the lab. I usually do that to masturbate, not to expiate my sins.
[01:51:48] Right, that's possibly a different psychology. It amounts to the same thing, right? I'm just going to go punish myself, be back in five minutes. Anyway, so I think I was like, oh, it would be cool to get that in the lab,
[01:52:02] to show that people actually do this under controlled conditions. Now I'm like, is that cool? I don't know. Is it cool just to demonstrate that you can make something happen in a lab that seems
[01:52:13] somewhat intuitive, if you can point to examples in the real world where it does happen? I'm not sure; that doesn't seem that interesting to me anymore. So there's an interesting question to me, which is:
[01:52:23] is there value if I approach it post hoc, if I approach it as a demonstration that a particular view of guilt was wrong? Does it have value? Yeah, I think so. I think if you're like, look,
[01:52:37] Roy Baumeister—who's important—the way you are thinking about guilt is actually incorrect, we can give you evidence that will cause you to update your beliefs, then yeah, that's useful, I think, right? And at least under some really specific conditions, we're showing that people
[01:52:48] are giving themselves shocks, right? Because he was saying, I've never seen that happen. So then if you're like, that's a pretty strong claim, then the existence proof is useful. Whether it's valuable or not is one thing, and maybe it is: I show
[01:53:02] Roy Baumeister that here's something he says he's never seen, and it actually can be the case. But as a science, your goal isn't just to, you know, take something that somebody said and show that it was wrong; it is to construct theories that can occasionally survive
[01:53:25] this kind of falsification and because otherwise it sounds like how you conceive of it Dave is that it's like the goal is to get to where Socrates is where he's wise because he doesn't think he knows what he doesn't know I think that you
[01:53:45] are not getting what I think of as falsification it's not that I am just seeking to like I'm building theories I'm building positive claims but the way that you go about it is through testing and seeing if it fails right so you are right you're accumulating positive
[01:54:03] knowledge it's not that I'm just trying to discard every thought that I have about this phenomenon it's that I'm positively building a case a theoretical case and I think I'm just a little more patient about like the vague
[01:54:19] initial theoretical tests that people use where it's like yeah it's not yet a full-fledged theory let's take the claim whoever made it that guilt doesn't lead to self-punishment directly and then I do the experiment and I show that no in this case it does
[01:54:35] then I think that there's positive knowledge to add to the theory of guilt that you were bringing to the table so now you can say oh it looks like this is a potential way in which
[01:54:46] guilt can work and which guilt can be reduced and so let's do some other tests to see maybe what like are there boundary conditions for that is it only a specific set of people
[01:54:56] is it only mild pain like does it work with severe pain and then you just start building but you do it by trying to knock down right in the local sense trying to hypothesis test by falsification
[01:55:09] Yoel thank you hey thanks for having me on this has been truly epic just in terms of length in terms of length in terms of you guys argued a little which you hardly ever do anymore so that's exciting I know it's good it
[01:55:21] feels good now when we do it feels like a release much like the masturbation the punishing masturbation right uh Yoel we didn't have you on for a movie but we want you to know that
[01:55:35] you are now officially a friend of the podcast again we've looked past the fact that you have a competing podcast with growing numbers and uh we're bigger than that we're bigger men
[01:55:47] than that that's that's very nice of you and it really it makes me feel very happy to hear that if it helps the numbers aren't growing very quickly so you know you don't need to fear us for a
[01:55:58] while it's okay ours aren't growing either yeah I think that you're taking some from us listen to their podcast I like it and we should have you on to do a bonus episode like another Rick and Morty or something I would love that um
[01:56:12] hey did you guys ever talk about uh you did talk about Dark didn't you oh yeah we did oh yeah Yoel's also who put me onto it so we should actually for next yeah for next season
[01:56:23] next season yeah that's a great idea you should cut yeah that will be uh epic as well they're gonna dump them right they're gonna give us all of them at once yeah I believe so yeah I'm so excited
[01:56:38] fuck science let's go with your question can we ever talk about another study again yeah yeah this is what I didn't do a proper job of introducing but my fear was that all of these criticisms
[01:56:53] of science are gonna lead to uh Tamler and I not being able to have anything to talk about because already he's negative about philosophy if you're negative about psychology then like what the hell is our podcast you can just keep talking about short stories yeah David Foster Wallace
[01:57:07] we'll have to change the description on it too yeah a philosopher and psychologist don't talk about philosophy where it's like yeah all right thank you Yoel I'll listen to Two Psychologists Four Beers um join us next time on Very Bad Wizards thanks for having me
[01:57:58] anybody can have a brain very good man just a very bad wizard
