Fraud, Incompetence, or Both?
A Retrospective on Giangrande and Turkheimer (2021)
This article originally appeared on Medium. It describes an example of a pattern of analysis that has been dubbed “pseudo-analysis.” Pseudo-analysis may be more prevalent today than ever before; it’s important that honest people understand and condemn it wherever it appears.
You’re a researcher and you’ve recently been tasked with assessing the efficacy of AstraZeneca’s latest vaccine for COVID-19. You have the data from a randomized, double-blind, placebo-controlled trial your research group ran over the last year and you want to present it. You’ve got five hundred people in your sample, four hundred of whom were vaccinated. Only 28.5% of the people who got infected were vaccinated, even though the vaccinated outnumber the placebo group four to one, so you’ve got a vaccine that is about 90% effective at preventing infection. What’s more, there were proportionally as many side effects of the placebo as the actual vaccine, and the people who got sick in the vaccinated group were less likely to be hospitalized and recovered quicker. Hurray!
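For readers who want to check the arithmetic, here is a minimal sketch of how the 90% figure follows from the numbers in this hypothetical trial. The function name and its parameters are mine, chosen for illustration; efficacy is taken to be one minus the relative risk of infection, which is the usual definition.

```python
def vaccine_efficacy(n_vaccinated, n_placebo, vaccinated_share_of_infections):
    """Return efficacy (1 - relative risk) from arm sizes and the infection split."""
    # Ratio of infected vaccinated people to infected placebo recipients (v / p).
    infected_ratio = vaccinated_share_of_infections / (1 - vaccinated_share_of_infections)
    # Relative risk = (v / n_vaccinated) / (p / n_placebo), rewritten in terms of v / p.
    relative_risk = infected_ratio * n_placebo / n_vaccinated
    return 1 - relative_risk


# Figures from the hypothetical trial in the text: 400 vaccinated, 100 on placebo.
efficacy = vaccine_efficacy(400, 100, 0.285)
print(f"Estimated efficacy: {efficacy:.1%}")  # Estimated efficacy: 90.0%
```

Note that the total number of infections never enters the calculation. Only the split of infections between arms and the sizes of the arms matter for the point estimate.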
The only problem now is that you hate vaccines. You think the vaccines weren’t tested for long enough, they’re dangerous, and they might even cause immense harm to the public. So, data in hand, you scrub your summary statistics from the paper and you decide to describe your results. You write the following blurb:
AstraZeneca’s latest vaccine for SARS-CoV-2 has no demonstrated effectiveness for preventing infection or reducing symptom severity. The results of our trial confirm there were numerous side effects among vaccine recipients. At present, vaccination cannot be recommended.
You have just committed fraud.
What I just described is an egregious example of scientific miscommunication: lying about scientific results or distorting their presentation. In some ways, it exemplifies how a sufficiently badly motivated scientist might lie without explicitly lying. In that example, the researcher checked the vaccine’s effectiveness for preventing infection and for reducing symptom severity, and even looked at whether it generated abnormally many side effects, but then carefully misrepresented each of those things. It is not technically a lie to say the vaccine did not demonstrate effectiveness on some arbitrarily selected margin and it is not technically a lie to say there were side effects among the vaccine recipients, but both are lies in the standard way the term is used.
Now imagine a twist. The researcher still hates vaccines and he draws the same conclusions, but this time, he doesn’t even look at the data. He has just committed fraud. If we’re lucky, Dr. Researcher will provide his data or at least some of the required summary statistics to show he was wrong, but we can only hope. If he’s competent, he’ll omit the data used to make his conclusions and we may never know the value of the study. Regardless of how he commits this fraud, we can rest assured that it will poison the public discourse surrounding vaccines and we may all be worse off for it.
These examples are not qualitatively unusual. Researchers frequently have a bone to pick with certain conclusions and even with other researchers for every reason under the sun. They might disagree with a study’s conclusions, they might find something in or about a study immoral, they might think the wrong methods were used, they could just be a contrarian, or they may even hate a particular author or research group. Sometimes, they’ll elect to write a reply, comment, or critique about another researcher’s study.
You’re a researcher and you just read the results of the latest randomized, double-blind, placebo-controlled trial for AstraZeneca’s COVID-19 vaccine. You see that the vaccine was 90% effective at preventing infection, it prevented hospitalizations, vaccinated people recovered faster from infection, and they weren’t any more likely than the unvaccinated to suffer side effects. The researchers even released their data! You’re mad and these results just won’t sit with you, so you set to drafting a reply.
You say that the trial was confounded by randomization failures and that the placebo group ended up with a higher propensity for suffering serious COVID symptoms. You say that the effectiveness was overstated because of attrition that systematically increased the gap between the placebo and intervention groups. You even say that the original paper presented the wrong results and instead of $variable_X, they should have looked at the $variable_Y in their dataset!
As a respectable researcher, everyone sees your criticisms and they’re taken seriously. You are, after all, a scientist, and the people you’re criticizing are damnable fools who should have known better. The only problem here is that you forgot to reanalyze the data they shared to show your conclusions were correct. You could have used inverse probability weighting or covariate matching to address the alleged randomization failure, inverse probability of attrition or non-response weights to address the alleged attrition bias, and the new outcome you were interested in to address the alleged wrong choice of variable. Altogether, you could have tested everything you said, but you didn’t. Everyone took you seriously and you didn’t put in the work, so you don’t have the receipts.
You have just committed fraud.
Or have you? Another possibility is that you are what is formally known as “an idiot.” You are one of the millions of people who probably should have imposter syndrome. The reality is that you didn’t check your claims when you could have because you didn’t know how, you didn’t know you could, and/or you really just thought it was enough to finger-wag about potential problems that might not have been actual ones. And why not? You might think you’ve seen other people do the same thing, and you’ve got your reasons.
You either committed fraud or exposed your incompetence. Within the realm of reasonably interpreting those words, you’re left with no other options. Are you a fraud or are you incompetent? And more importantly, what should we do about it?
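To make concrete how little work the skipped check would have taken, here is a minimal sketch of inverse probability weighting on made-up data. Everything in it is an assumption for illustration: the two risk strata, the counts, and the function names are hypothetical, and the propensity score is estimated in the simplest possible way, as the share of vaccinated people within each stratum (which must be strictly between 0 and 1).

```python
from collections import defaultdict


def naive_risk_difference(records):
    """Unadjusted infection rate among the treated minus the rate among controls."""
    rate = {}
    for arm in (0, 1):
        outcomes = [y for _, treated, y in records if treated == arm]
        rate[arm] = sum(outcomes) / len(outcomes)
    return rate[1] - rate[0]


def ipw_risk_difference(records):
    """records: list of (stratum, treated, outcome) with treated/outcome in {0, 1}."""
    # Step 1: estimate the propensity score as the treated share within each stratum.
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_treated, n_total]
    for stratum, treated, _ in records:
        counts[stratum][0] += treated
        counts[stratum][1] += 1
    propensity = {s: t / n for s, (t, n) in counts.items()}

    # Step 2: weight each person by the inverse probability of the arm they ended up in.
    sums = {1: [0.0, 0.0], 0: [0.0, 0.0]}  # arm -> [weighted outcomes, total weight]
    for stratum, treated, outcome in records:
        p = propensity[stratum]
        weight = 1 / p if treated else 1 / (1 - p)
        sums[treated][0] += weight * outcome
        sums[treated][1] += weight

    # Step 3: compare the weighted infection rates (normalized weights).
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]


def make(stratum, treated, n, n_infected):
    """Build n hypothetical participants, the first n_infected of whom got infected."""
    return [(stratum, treated, 1 if i < n_infected else 0) for i in range(n)]


# Hypothetical trial where high-risk people are overrepresented in the placebo arm.
records = (
    make("low_risk", 1, 80, 4) + make("low_risk", 0, 20, 4)
    + make("high_risk", 1, 20, 2) + make("high_risk", 0, 80, 32)
)
naive = naive_risk_difference(records)
adjusted = ipw_risk_difference(records)
print(f"naive: {naive:.3f}, weighted: {adjusted:.3f}")
```

In this invented example the imbalance is real and the weighting shrinks the estimated benefit, but it does not erase it. That is the point: whether an alleged confound matters is an empirical question you answer by running the numbers, not by asserting it.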
Enter Giangrande and Turkheimer (2021). This University of Virginia-based dynamic duo sought to debunk a study that evidently made them very upset. This study was Pesta et al. (2020), a meta-analysis indicating that in the United States, every Census racial/ethnic category has a similar heritability for IQ. Giangrande and Turkheimer took umbrage at this suggestion and its implications.
Giangrande and Turkheimer wrote that the Pesta et al. meta-analysis was flawed beyond recognition, that the conclusions were unjustified, the study was underpowered to detect differences between the groups it examined, that the implications were irrelevant for Turkheimer’s pet Scarr-Rowe hypothesis, and that the authors were irresponsible and deserved opprobrium. Quite the claims, but we need to get more specific to understand them. I will ignore their ethico-moral claims, since they are matters of opinion, not fact.
On the conceptual front, Giangrande and Turkheimer argued that Pesta et al. inappropriately extrapolated that their meta-analysis had implications for the Scarr-Rowe hypothesis, the idea that genetic effects on, or the heritability of, intelligence (and several other important outcomes) are moderated by environmental quality. The conclusion from Pesta et al. makes sense: different groups might have different levels of environmental quality. Some might experience pervasive discrimination, others might be richer or poorer. Socioeconomic status is almost always the proposed moderator in assessing the Scarr-Rowe hypothesis, so using race as a moderator is obviously sensible because there are racial differences in wealth, education, investing behavior, home environment, and every other aspect of socioeconomic status. According to Giangrande and Turkheimer, though, racial similarity in heritability says nothing about the Scarr-Rowe hypothesis and the Scarr-Rowe hypothesis was never about race in any way whatsoever. Let’s call this their Claim #1.
Their Claim #2 is the multifaceted claim that Pesta et al. conflated race and ethnicity and that the racial categories were nonsense classifications that affected the results of the study. They even took issue with Pesta et al.’s choice to present all of the data available to them because that meant providing data on some groups that the original studies described poorly.
Claim #3 was that Pesta et al. analyzed too many unpublished studies, reanalyzed original data, and that some of the studies in the meta-analysis came from journals that weren’t very prestigious. A related claim that I won’t take up as one of their main points was that the inclusion of more data reduced the power of the meta-analysis.
Claim #4 was that Pesta et al. included different types of tests and they didn’t assess how this might have moderated their results. For example, it might be the case that so-called achievement tests show bigger differences between groups than standard IQ tests. Giangrande and Turkheimer even went so far here as to claim that one of the measures Pesta et al. used was not a test result, but instead, parental education level!
Claim #5 is a general methodological one. At this point in their critique, Giangrande and Turkheimer were bringing up issue after issue, hoping something would stick. They claimed that constraining model estimates biased the estimates used in the meta-analysis, that mean and SD estimation procedures were dubious, that weighting decisions were wrongful, and more. In the same breath, they proceeded to claim that high levels of sampling error observed in the study meant the conclusions about mean differences were unjustified.
Claim #6 was that a side analysis in which Pesta et al. calculated the group differences in IQ when both groups had the same heritability was simultaneously interpreted incorrectly and “by far the most robust [result] reported in the article”.
Claim #7 was that there was not enough power to detect group differences in heritability in the meta-analysis.
Claim #8 was that Pesta et al. ignored confounding variables when convenient, and invoked them as needed in a contradictory manner. Apparently, if invoked consistently, the meta-analysis’ results would not be supported.
Those were the major claims, and that list of eight isn’t exhaustive of every little quibble Giangrande and Turkheimer made. At first glance, the list seems damning. How could Pesta et al.’s meta-analysis be so riddled with mistakes that are in some cases elementary? The analysis that showed these problems must have been something. Given how extreme these claims were, and how true they must have been coming from someone with the reputation of Eric Turkheimer, Pesta et al. made a dire mistake making all of their data publicly accessible.
But there’s the rub: Giangrande and Turkheimer did not analyze the data from the meta-analysis. They made many claims about that data and the results of the meta-analysis, but ultimately, they did not produce any reasons to believe their claims beyond the mere allegation that something might have been amiss. Nonetheless, their critique is very popular. According to Altmetric, their study has been picked up by a news outlet, blogged about, been linked in numerous tweets, and it was referenced on a Wikipedia page.
As noted above, if you make claims you don’t back up when you can back them up, you’re committing fraud. Pesta et al.’s data is available for download and the original datasets are available for several of their datapoints. There was nothing holding Giangrande and Turkheimer back from doing their own reanalysis to expose the flaws they alleged existed. Luckily, Pesta et al. did.
I read Giangrande and Turkheimer’s critique, and then I read the response. Pesta et al. (2022) is a methodical, almost line-by-line revelation about the quality of the moral character of Giangrande and Turkheimer. Unlike the study it replies to, it doesn’t come out and say anything to the effect that the authors who wrote it and the editors who accepted it are immoral people. Instead, it takes the high ground: it shows that they lied, they manipulated the truth, and they did so brazenly, and it lets the reader fill in the gaps. What follows is a short claim-by-claim overview of where Giangrande and Turkheimer went wrong using the information in the Pesta et al. response.
Claim #1 was that the Scarr-Rowe hypothesis does not suggest any predictions about race and it has never been about race.
A one-sentence response to this claim can be made by simply looking at the title of the 1971 study the Scarr-Rowe hypothesis comes from. That paper’s name is (and no, I am not kidding) Race, Social Class, and IQ: Population differences in heritability of IQ scores were found for racial and social class groups. The paper that established the Scarr-Rowe hypothesis is about the idea that differences in environmental quality moderate the heritability of IQ, and because African Americans and European Americans have different average levels of environmental quality, heritability ought to be different for those groups. The study suggested that because environmental quality is usually worse for African Americans in terms of socioeconomic status, discrimination, test-relevant culture, and so on, their IQ heritability should be lower. Per the paper, if racial differences in IQ were completely environmental in origin, the heritability should have been greatly reduced in the Black group.
That is the paper that formed the basis for the Scarr-Rowe hypothesis. It exists in clear contrast with Claim #1: from its very beginning, the Scarr-Rowe hypothesis was explicitly about race. But Pesta et al.’s reply went even further and listed multiple examples where race was used as a moderator of heritability and genetic effects for the same reasons Scarr laid out in 1971. Readers should go and look. I’ll throw in another that it looks like Pesta et al. missed, one that Turkheimer himself cited in Turkheimer and Horn (2013). Yeo et al. (2011), and I will quote Turkheimer for this description, “reported correlations between scores on the Wechsler Abbreviated Scale of Intelligence and the total length of rare copy number variations in a sample of 74 individuals with alcohol dependence. The total length of the rare deletions was correlated with intelligence at r = −.30, and the correlation was higher in the Anglo/White group than in the non-White group. [Emphasis mine]”
As far as I can tell, Claim #1 is a lie, and not just a white lie. It is so brazen that you would discover it if you simply searched for the word “race” in Giangrande and Turkheimer, since you would eventually find a citation that leads to the paper where “The finding [of the Scarr-Rowe effect] was first reported by Scarr-Salapatek”, Race, Social Class and IQ. At this point, the only thing one could do is affirm what Turkheimer wrote in 2009:
Finally, there is also a need to return to the clear theoretical focus that Scarr brought to her early work on this subject in 1971 [in the article Race, Social Class, and IQ]. Now that software is readily available, it would be possible to re-analyze practically every twin analysis that has ever been conducted, with the familiar variance components moderated by socioeconomic status, or by age or gender or race. One would not want the field to wind up in the atheoretical tabulation of moderated variance components, without explicit reference to the developmental processes that underlie them.
Scarr was very clear: the Scarr-Rowe hypothesis is an environmental explanation for racial differences in IQ. That noted, we should all agree with Turkheimer that we should return to her clear theoretical focus.
Claim #2 comes from a misunderstanding on Giangrande and Turkheimer’s part. They saw Pesta et al. use the term “Self-Identified Race/Ethnicity” or SIRE and assumed this meant Pesta et al. were conflating race and ethnicity, when SIRE is actually a standard term that follows directly from the construction of Census categories in the United States. It’s widely used in genetic studies, as in this 2019 study, this 2015 study, or this 2005 study. This label is useful! It’s meaningful to most Americans, and that meaningfulness translates to a solid correspondence with their genetic ancestry. It’s how people identify, so we should probably honor it. Supporting its use, in each of those studies, the correspondence between SIRE and genetic ancestry was extreme for Whites and Blacks, and was still strong for Hispanics. The correspondence was always weaker for Hispanics for the obvious reason that they are a more mixed population. Pesta et al. dealt with this issue thoroughly, so I’ll go to the next part of it.
Giangrande and Turkheimer took issue with some of the samples’ SIRE classifications. Pesta et al. responded by redoing their meta-analysis with variously omitted and reclassified samples and the results didn’t change. This part of Claim #2 was easy to check and that much easier to dismiss.
The last part sounds funny. Giangrande and Turkheimer were upset that Pesta et al. presented results for groups that don’t make intuitive sense, like Other Race, or that are represented in only a single study. Some part of this is understandable because conclusions should not be drawn from incoherent groupings or one-off studies. But hewing closer to reality, it’s hard to take this complaint seriously when Pesta et al. simply presented all the data they had and only made conclusions for the groups that were well represented in the dataset. Why shouldn’t they present all of their data? By doing that, they have lent future meta-analysts a bigger and more diverse dataset to work with.
Claim #2 does not stand, but it does provide evidence that Giangrande and Turkheimer made testable claims they did not test, that they took issue with fully reporting data, and that they wanted to find things to complain about so they complained about things that did not matter for the study they were criticizing. If they wanted to stick to what mattered, they obviously would not have talked about Pesta et al. including data they did not make inferences with.
Claim #3 is that doing something both expected of and standard for meta-analysts somehow harmed the conclusions of the meta-analysis. Giangrande and Turkheimer alleged that there were problems due to the inclusion of results from unpublished studies and original data Pesta et al. analyzed. Moreover, they suggested there was something wrong with the data taken from journals they regarded as less prestigious, which they wrongly suggested lacked adequate peer review.
Giangrande and Turkheimer mentioned these things as problems, but they never actually explained why they were issues.
As a general rule, you should seek out all data possible when conducting a meta-analysis. This is how to avoid the “file-drawer” effect, where results that are not significant, that go in the opposite direction of the general tenor of conclusions, or that are otherwise not publishable are omitted from the literature, biasing any meta-analysis conducted only with the results that have been published. If anything, Pesta et al. should have been applauded for their efforts in finding unpublished manuscripts.
The datapoints based on original analysis don’t seem problematic. Why would they be? The only way they could be would be if they were faked. But we can access most of the original data and check, and they’re unsurprisingly correct. So what’s the deal? What probably happened here was that Giangrande and Turkheimer wanted to increase the number of criticisms so people would trust the paper they were criticizing less. That strategy usually works, but only on people who are credulous about your claims.
For similar reasons, the datapoints from the journals Giangrande and Turkheimer disliked don’t seem problematic either. They never said what was wrong with them, they only gesticulated in that direction. Either there is a problem or there isn’t. They never provided any reason to think publication in supposedly subpar journals would impact the quality of the estimates. Quality hasn’t saved supposedly higher-prestige journals from being some of the worst off in the replication crisis. Giangrande and Turkheimer just wanted to cast vague aspersions to sow doubt.
Claim #3 had nothing to it conceptually and there was no proof given for it, but Pesta et al. nevertheless humored Giangrande and Turkheimer by asking “What if?” They ran their meta-analysis again with and without these supposedly problematic studies and they assessed whether being labeled as problematic moderated the effects they were looking for. In supplementary analyses, they also looked at whether a datapoint being labeled as problematic in multiple ways would continuously moderate effect sizes. But despite the allegations, none of this mattered. The meta-analysis achieved the same results when comparing problematic and nonproblematic studies, when cutting out the problematic studies entirely, and when treating problematicness as a quantitative moderator.
Maybe even more importantly, Turkheimer has praised Tucker-Drob and Bates’ (2016) meta-analysis of the Scarr-Rowe effect. This study had many of the same details Giangrande and Turkheimer attacked in Pesta et al., but they never thought to write a critical response to it. The two meta-analyses used the same division of test types, the same traditional meta-analytic averaging methods, both affirmed the relevance of race to the Scarr-Rowe hypothesis, and Tucker-Drob and Bates’ study included more novel analyses and unpublished estimates. Tucker-Drob and Bates had 14 total samples and only one estimate was presented in the article they referenced. Fully half of their samples were analyzed by Tucker-Drob and Bates themselves. Pesta et al. had 16 total samples and only two of them were analyzed by Pesta et al. themselves, while six of their samples’ information was presented in their original articles. If Pesta et al. is problematic, Tucker-Drob and Bates is far worse. In all actuality, both studies are fine.
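The logic of that kind of sensitivity check is simple enough to sketch. What follows is not Pesta et al.’s data or code: the effect sizes, standard errors, and “flagged” labels are invented, and I use plain inverse-variance (fixed-effect) pooling as a stand-in for whatever model a real meta-analysis would fit.

```python
from math import sqrt


def pool(studies):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    weights = [1 / s["se"] ** 2 for s in studies]
    estimate = sum(w * s["effect"] for w, s in zip(weights, studies)) / sum(weights)
    return estimate, sqrt(1 / sum(weights))


# Hypothetical between-group heritability differences; "flagged" marks a study
# a critic has called problematic (unpublished, reanalyzed, low-prestige venue).
studies = [
    {"effect": 0.02, "se": 0.05, "flagged": False},
    {"effect": -0.01, "se": 0.04, "flagged": False},
    {"effect": 0.03, "se": 0.06, "flagged": True},
    {"effect": 0.00, "se": 0.05, "flagged": True},
]

all_est, all_se = pool(studies)
clean_est, clean_se = pool([s for s in studies if not s["flagged"]])
flag_est, flag_se = pool([s for s in studies if s["flagged"]])

# Moderator test: do flagged and unflagged studies give different pooled answers?
z = (flag_est - clean_est) / sqrt(flag_se ** 2 + clean_se ** 2)

print(f"all studies:       {all_est:.4f} (SE {all_se:.4f})")
print(f"flagged removed:   {clean_est:.4f} (SE {clean_se:.4f})")
print(f"flagged vs. clean: z = {z:.2f}")
```

If the pooled estimate barely moves when the flagged studies are dropped, and the subgroup contrast is nowhere near significance, the complaint about those studies has no bearing on the conclusions.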
Claim #4 was alluded to in the discussion of Claim #3. Pesta et al. used the procedure for managing heterogeneous tests that was used by Tucker-Drob and Bates. There is no realistic way to address this concern because most studies use different tests and there aren’t enough studies to test whether using a particular test moderated results. The fact that an analysis cannot be done doesn’t really amount to a criticism. For this particular issue, there’s no reason to think the criticism is even dire. In general, it’s found that achievement and intelligence tests measure the same things and heritability for their latent factors is virtually identical. More importantly, however, this quibble cannot be meaningful because the meta-analysis used matched samples who took the same tests. If there were going to be relevant effects, they would have to emerge because the particular types of tests used were biased by race, but bias in intelligence and achievement tests is almost unheard of, and at large magnitudes, even more so.
Giangrande and Turkheimer made some further errors related to Claim #4. They said that Pesta et al. at one point used an estimate of the heritability of parental education instead of a test heritability, a claim that anyone who opens the data can see is wrong. They also erred by saying the data from the Mollon et al. study included as a datapoint in Pesta et al.’s study was an average score, when it was a factor score. They would have seen this if they had read Pesta et al., so it’s unclear why they made this mistake.
Claim #5 might have been a real issue. There can be problems with the estimates derived from structural equation models that can be alleviated by using the twin correlations. Turkheimer is the authority on this, even though in the study where he diagnosed the problem, he nonsensically called a pattern of sibship correlations a violation of twin model assumptions. Pesta et al. addressed this by using the twin correlations instead of structural equation model estimates. The meta-analytic results did not change. Giangrande and Turkheimer had the ability to check this claim but they did not.
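For readers unfamiliar with what “using the twin correlations” can look like, the textbook version is Falconer’s formula, which needs no structural equation model at all. This is a sketch under that assumption. I am not claiming it is the exact computation Pesta et al. ran, and the correlations below are invented.

```python
def falconer(r_mz, r_dz):
    """Classic twin decomposition from identical (MZ) and fraternal (DZ) correlations."""
    h2 = 2 * (r_mz - r_dz)   # additive genetic share (heritability)
    c2 = 2 * r_dz - r_mz     # shared-environment share
    e2 = 1 - r_mz            # non-shared environment plus measurement error
    return h2, c2, e2


# Hypothetical twin correlations for two groups that took the same test.
group_a = falconer(r_mz=0.80, r_dz=0.50)
group_b = falconer(r_mz=0.78, r_dz=0.47)

heritability_gap = group_a[0] - group_b[0]
print(f"h2 (A) = {group_a[0]:.2f}, h2 (B) = {group_b[0]:.2f}, gap = {heritability_gap:.2f}")
```

Because these estimates come straight from the correlations, they are not subject to the boundary constraints a fitted model might impose, which is why they make a useful cross-check on model-based estimates.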
Claim #6 was that an analysis Pesta et al. reported was misinterpreted. This analysis involved taking the heritability gaps between groups and regressing them against the mean differences between groups to figure out what D₀, or the difference in mean IQs when heritability is the same, really is. This analysis is immediately interesting and it makes intuitive sense. If we vary the differences in heritability because they are not random phenomena, the differences when heritability is the same are a simple estimate of what we would expect to be there if the group differences are not differentially environmental in origin. The same method can be used to figure out how true greater male variability in intelligence is without the confounding effects of differences in scoring. In both cases, the method is very useful, but Giangrande and Turkheimer disagreed.
They noted that the analysis showed mean differences were reduced to the degree the heritability gap was. The correlation between the difference in heritability and mean differences was r = 0.59 for all pairs of samples, 0.98 for Whites and Blacks, 0.76 for Whites and Hispanics, and 0.64 for Hispanics and Blacks. When there was no heritability difference between these groups, the gaps were 0.64 d, 0.85 d, 0.66 d, and 0.31 d. At the subtest level within studies that had such data, the correlations were -0.26, 0.87, and -0.23, with 12, 5, and 15 subtests, respectively. The corresponding d’s were 0.64, 0.70, and 0.40. When heritabilities were equal, and thus the variances of the environments that should affect them were too, groups still differed.
Giangrande and Turkheimer saw these results and claimed Pesta et al. ignored their implications. They never explained why. They said they were striking, suggesting they were the only strong findings in the entire article, and that they were “by far the most robust [results].” However, they somehow interpreted them as “unequivocal evidence for the hypothesis [Pesta et al.] presume[d] to be denying.” These statements don’t make sense for a number of reasons, not least of which is that the estimates were produced with the data from the meta-analysis that, only sentences earlier, they had said was so horribly low-quality that it could not be trusted. If the data going into this “robust” result cannot be trusted, why would we trust the result itself?
Giangrande and Turkheimer claimed this stunning finding was ignored because they were irritated that Pesta et al. did not misinterpret it. It was their belief that the result actually suggested the between-group differences were 0 when environments and heritabilities were equal, thus revealing that they do not understand that whatever the slope, a regression equation still has an intercept. Even if the correlations had been perfect, there could still have been differences where heritability differences were mooted because the groups might differ for reasons other than differences in their heritability. Those differences would appear in the intercept. Instead of grappling with the reality that intercepts are a part of linear models and that they mean something, Giangrande and Turkheimer acted as if intercepts were not a real thing and cast aside the idea of them by invoking unexplained complexity.
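The point about intercepts is easy to demonstrate. In the sketch below, the data are invented and deliberately constructed so that the correlation between the heritability gap and the mean difference is perfect. The variable names and sign conventions are mine, and D₀ is simply the fitted intercept, the predicted mean difference when the heritability gap is zero.

```python
from math import sqrt


def ols(x, y):
    """Simple least-squares line: returns (slope, intercept, correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / sqrt(sxx * syy)


# Hypothetical sample pairs: heritability gap (x) and standardized mean difference (y),
# constructed to lie exactly on the line y = 0.6 + 2x.
heritability_gap = [-0.10, -0.05, 0.00, 0.05, 0.10]
mean_difference = [0.6 + 2 * gap for gap in heritability_gap]

slope, d_zero, r = ols(heritability_gap, mean_difference)
print(f"r = {r:.2f}, slope = {slope:.2f}, D0 = {d_zero:.2f}")
```

Here the mean difference tracks the heritability gap perfectly, and the predicted difference at a gap of zero is still 0.6 rather than 0. A strong correlation tells you about the slope; it tells you nothing about where the line crosses the axis.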
Imagine someone made this claim for the much less controversial idea of greater male variability. We know men differ, and we know that the higher a group scores relative to another one, the higher its variance tends to be. But when mean differences are removed from the equation, men are still much more variable. Should we throw out this idea in favor of the notion that complexity actually means we can ignore findings we don’t feel like acknowledging? I don’t think so, but evidently Giangrande and Turkheimer do.
There are no underpowered studies. Studies are only ever underpowered with respect to specific effect sizes. The same study looking for an effect size of 0.1 may be underpowered, but if it is looking for an effect size of 0.3, it may have more than enough power. Claim #7 was a non sequitur from the start. Giangrande and Turkheimer said Pesta et al. had low power but they never said what that meant because they never provided an estimate of the size of the effect Pesta et al. should have been looking for.
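Here is a minimal sketch of that dependence, using a normal approximation for a two-sided comparison of two group means. The sample size of 200 per group is an arbitrary assumption for illustration and has nothing to do with Pesta et al.’s samples.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for standardized effect d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true effect is d.
    noncentrality = d * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)


power_small = power_two_sample(d=0.1, n_per_group=200)
power_medium = power_two_sample(d=0.3, n_per_group=200)
print(f"power for d = 0.1: {power_small:.2f}")
print(f"power for d = 0.3: {power_medium:.2f}")
```

Under these assumptions the very same design has roughly 17% power for an effect of 0.1 and roughly 85% power for an effect of 0.3. Calling it “underpowered” without naming the effect size is not a statement about the study at all.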
So Pesta et al. went digging. They used the estimates from Tucker-Drob and Bates’ meta-analysis that Turkheimer previously praised, and in a supplementary analysis, they computed how large the gaps ought to be, then assessed how large their minimum detectable effects were and how much power they had for the effects they should expect given Tucker-Drob and Bates’ results. For Black-White comparisons, Pesta et al. found 2.5% higher heritability and could detect an effect of 6.5% lower heritability, while they had 99% power to detect an effect of 10% lower heritability. Because heritability is a proportion of variance, the power to detect differences is asymmetric with respect to sign. For the expected Scarr-Rowe effect size when contrasting Black and White Americans, they should find 3.9% lower heritability, but they found 2.5% higher heritability, which meant they had 83% power to detect a Scarr-Rowe effect deviation, and this was in the model with the most generous assumptions for Giangrande and Turkheimer. For Hispanic-White comparisons, they observed 10.6% higher heritability and could detect an effect of 13.7% lower heritability while they had 98% power to detect an effect of 20% lower heritability. Also in the most generous model for Giangrande and Turkheimer, Pesta et al. had 84% power to detect a difference from Scarr-Rowe hypothesis expectations for Hispanic-White comparisons.
Pesta et al. was actually a study with quite high power to detect reasonable differences in heritability. The claim that it was underpowered was not based on any sort of power analysis, it was a suggestion conjured from nothing to make the study seem weaker than it was.
Claim #8 gives way to some of the most striking proof that Giangrande and Turkheimer criticized Pesta et al. without thoroughly reading it or opening the data at all. Giangrande and Turkheimer claimed potential confounders to the analysis discussed under Claim #6 were ignored by Pesta et al. when discussing the results of the meta-analysis itself, highlighting an apparent contradiction and revealing Pesta et al.’s opportunism and bad faith. This complaint only makes sense if you gave Pesta et al.’s meta-analysis a superficial glance rather than the sort of scrupulous reading you would need to legitimately criticize the study.
The confounders in question were things like age and type of test, and for an analysis in which all the results are being compared, it makes sense to invoke them because they contribute to the now-relevant variance between studies. For the meta-analysis, paired samples were used, so different groups’ estimates were already matched on age, type of test, estimation method, etc. More importantly, this was obvious from reading the original meta-analysis! The confounders mattered differently because the analyses in question could only impact inferences in one and not in the other. These potential confounders were never ignored by Pesta et al., they were properly understood.
At the end of the day, we have to reconcile the lies, errors, omissions, misrepresentations of fact, and the presentation of testable but ultimately incorrect arguments by Giangrande and Turkheimer as evidence for one or both of the following: incompetence or fraud. The incompetence on display when making the numerous errors they did—and there were more covered in Pesta et al.’s response—might lend credence to the pair merely being imbeciles. However, I think their work speaks volumes otherwise. I will contend that they are clearly not stupid, but they are also clearly malicious. Because they wanted Pesta et al.’s conclusions to be false, they committed an act of fraud. An inkling of incompetence remained, however, in that their fraud was so facile.
Every procedural and analytic quibble levied against Pesta et al. by Giangrande and Turkheimer was tested and nothing stuck. This situation is remarkable for that reason. It is hard to imagine that a sufficiently motivated person could not find one issue to bring up that could substantially affect the results of a study, but here we have two of them and they failed to do just that. This is important because it substantially strengthens the conclusion that has to be made: Pesta et al. did a good job and Giangrande and Turkheimer wanted to do anything to besmirch it, so they committed fraud.
Tucker-Drob and Bates’ meta-analysis showed that Turkheimer’s original and most-famous 2003 study of the Scarr-Rowe effect in the National Collaborative Perinatal Project data was an example of the Winner’s Curse. If Pesta et al. had used that effect size, they would have had extraordinarily high power.