Missing Fixed Effects Don't Justify Segregation
Ketanji Brown Jackson was too hasty when she extolled the benefits of medical segregation
Beyond campus, the diversity that UNC pursues for the betterment of its students and society is not a trendy slogan. It saves lives. For marginalized communities in North Carolina, it is critically important that UNC and other area institutions produce highly educated professionals of color. Research shows that Black physicians are more likely to accurately assess Black patients’ pain tolerance and treat them accordingly. For high-risk Black newborns, having a Black physician more than doubles the likelihood that the baby will live, and not die.
Those words are from Ketanji Brown Jackson’s dissent in the case of Students for Fair Admissions v. Harvard.1 The emphasis is mine.
Justice Jackson has made several statements like those, suggesting, in effect, that we need to racially segregate America. Blacks need Black doctors, Blacks need separate services provided by Blacks, and without racial discrimination in the college and university admissions process, there won’t be enough Blacks in professional roles to satisfy the need to segregate.
But—as one might infer from Justice Jackson’s writings generally—her supposition in favor of segregation was poorly reasoned.
Justice Jackson’s belief that the survival of high-risk Black newborns is doubled with a Black physician is based on a misreading of a 2020 study from the Proceedings of the National Academy of Sciences.
The study used data on Floridian births between 1992 and 2015 and found that 99.1% of Black newborns with a White physician survived, compared to 99.6% of those with a Black physician.2 The survival rates were >99% for both groups, so it isn’t possible for Black physicians to have doubled survival rates in general, let alone for the “high-risk” Black newborns whom the paper never mentioned.
The scale of the effect of Black doctors on the survival of Black newborns was minuscule, amounting to under a dozen lives saved each year. Regardless, if the effect were real, it would have been correct for Justice Jackson to have said “having a Black physician halves the odds that a Black newborn dies.”
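The gap between "doubles survival" and "halves the odds of death" is easy to check by hand. Here is a rough sketch that uses only the two raw survival rates quoted above; the variable names are mine, and the paper's own estimates are regression-adjusted, so treat this as arithmetic on the headline rates rather than a reproduction of its results.

```python
# Raw survival rates for Black newborns, as quoted in the text above.
surv_white_doc = 0.991
surv_black_doc = 0.996

# Mortality is the complement of survival.
mort_white_doc = 1 - surv_white_doc
mort_black_doc = 1 - surv_black_doc


def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)


# How much more likely is survival with a Black physician?
survival_ratio = surv_black_doc / surv_white_doc

# How much less likely is death with a Black physician?
risk_ratio = mort_black_doc / mort_white_doc
odds_ratio = odds(mort_black_doc) / odds(mort_white_doc)

print(f"survival ratio:         {survival_ratio:.4f}")
print(f"mortality risk ratio:   {risk_ratio:.4f}")
print(f"mortality odds ratio:   {odds_ratio:.4f}")
```

The survival ratio sits barely above 1, while the mortality ratios come out a little under one half. Because death is so rare here, the risk ratio and the odds ratio are nearly the same number.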
Even though the paper did appear to support that conclusion, it wasn’t strongly justified.
One of the issues in the paper was the choice of estimator. The paper’s headline results were based on ordinary least squares (OLS) regressions, which work fine with the sort of binary death data the authors used, but we don’t want ‘fine’ when the topic is this serious and this relevant for policy. A logistic model is in general superior for this sort of analysis of rare, binary events3, and the paper’s authors even seemed to recognize that. Presumably because of model-fitting difficulties, not all of the controls the authors used in their OLS models were available when they fit their logistic model, so the results presented in Table S9 are somewhat more limited than we might like. Nevertheless, that table showed that with a more appropriate model, the effect of physician-patient racial concordance for Black patients was still significant and sizable (logit = -0.440), but it also showed that Black physicians came with an increased risk of child mortality (logit = 0.157). If we interpret one of these coefficients, why not both?
Another issue with the study is p-hacking. Some of the paper’s results are not significant, but among the significant results for newborn survival with all their fixed effects taken out, every p-value falls between 0.01 and 0.10. I’ve elaborated elsewhere how this pattern indicates p-hacking.
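Logit coefficients are easier to read as odds ratios. Here is a minimal sketch of the conversion using the two coefficients quoted above. The constant names are mine, and the combined figure assumes the two terms enter the same model additively, which I have not verified against the paper's exact specification.

```python
import math

# Coefficients quoted above from Table S9 (log-odds of newborn death).
LOGIT_CONCORDANCE = -0.440      # Black newborn with a Black physician
LOGIT_BLACK_PHYSICIAN = 0.157   # Black physician term


def odds_ratio(logit):
    """Exponentiate a log-odds coefficient to get an odds ratio."""
    return math.exp(logit)


or_concordance = odds_ratio(LOGIT_CONCORDANCE)
or_black_physician = odds_ratio(LOGIT_BLACK_PHYSICIAN)

# ASSUMPTION: both terms apply additively to a Black newborn seen by a
# Black physician, so the log-odds sum before exponentiating.
or_combined = odds_ratio(LOGIT_CONCORDANCE + LOGIT_BLACK_PHYSICIAN)

print(f"concordance term:      {or_concordance:.3f}")
print(f"Black physician term:  {or_black_physician:.3f}")
print(f"combined (assumption): {or_combined:.3f}")
```

Odds ratios below 1 mean lower odds of death and ratios above 1 mean higher odds. If the additive reading is right, taking both coefficients seriously shrinks the apparent benefit relative to reading the concordance term alone.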
The odds of this happening by chance with a real effect4 are extraordinarily low, so the paper is immediately untrustworthy. But if we believe it, we’ll have to also deal with the fact that the effect is not robust across specifications where we have no a priori reason to think it wouldn’t hold up.
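To see why a cluster of p-values between 0.01 and 0.10 is suspicious, here is a rough sketch under a simple normal approximation. It asks: among results that clear p < 0.10, what share should also sit above p = 0.01? The noncentrality value (2.8, roughly 80% power at the 0.05 level) and the count of six results are hypothetical choices of mine, not figures from the paper, and treating the results as independent overstates the case when the specifications share data.

```python
import math

Z_10 = 1.6449  # two-sided critical value for p = 0.10
Z_01 = 2.5758  # two-sided critical value for p = 0.01


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def prob_exceeds(z_crit, delta):
    """P(|Z| > z_crit) when the test statistic Z ~ Normal(delta, 1)."""
    return (1 - norm_cdf(z_crit - delta)) + norm_cdf(-z_crit - delta)


def share_in_band(delta):
    """Among results with p < 0.10, the share that also have p > 0.01."""
    p_below_10 = prob_exceeds(Z_10, delta)
    p_below_01 = prob_exceeds(Z_01, delta)
    return (p_below_10 - p_below_01) / p_below_10


null_share = share_in_band(0.0)   # no true effect
real_share = share_in_band(2.8)   # hypothetical well-powered real effect

# ASSUMPTION: six independent significant results, purely for illustration.
N_RESULTS = 6
all_in_band_real = real_share ** N_RESULTS

print(f"share in band, no effect:   {null_share:.3f}")
print(f"share in band, real effect: {real_share:.3f}")
print(f"all {N_RESULTS} in band, real effect: {all_in_band_real:.5f}")
```

With a well-powered real effect, most significant p-values should land below 0.01, so a full set that lands in the 0.01 to 0.10 band looks more like what you would see with no effect at all.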
The estimates are all of similar magnitudes so they’re not inconsistent with one another per se, but they are also all marginal, so they are hardly consistent with any effect at all. Even if these were robust, however, we have no reason to think any of them is causal because the authors only had a handful of fixed effects, and among that handful, they were missing the most critical one for causal inference: patient fixed effects.
The authors controlled for patients’ insurance, comorbidities, time of birth, hospital, hospital-year, and physician fixed effects, but even with all of those things held constant, there are many theories that could explain the result that patient-physician racial concordance is related to lower newborn mortality rates. For example, wealthier, healthier Black mothers might pick a doctor of their own race.
To interpret the study’s estimate causally, we need to observe the same patients giving multiple births,5 with some of those patients giving birth with the help of doctors of different races. If the effect holds up after using this fixed effect, it should be causal. The alternative is to watch the same person being born multiple times to doctors of different races, but that design suffers from logistical difficulties.6
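A toy simulation shows what patient fixed effects buy us. The sketch below is not the paper's data or model. It invents mothers with an unobserved health trait that both lowers risk and makes choosing a same-race doctor more likely, sets the true concordance effect to zero, and uses a continuous risk index instead of a binary death outcome. Every number and name in it is a hypothetical of mine.

```python
import random


def slope(x, y):
    """Simple OLS slope of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n_mothers=20000, births=2, true_effect=0.0, seed=0):
    rng = random.Random(seed)
    d_all, y_all, d_within, y_within = [], [], [], []
    for _ in range(n_mothers):
        health = rng.gauss(0, 1)  # unobserved by the researcher
        # Healthier mothers are more likely to pick a concordant doctor.
        p_concordant = 0.7 if health > 0 else 0.3
        ds = [1.0 if rng.random() < p_concordant else 0.0 for _ in range(births)]
        # Risk index: lower is better. Concordance has no true effect here.
        ys = [true_effect * d - health + rng.gauss(0, 1) for d in ds]
        mean_d, mean_y = sum(ds) / births, sum(ys) / births
        for d, y in zip(ds, ys):
            d_all.append(d)
            y_all.append(y)
            # Mother fixed effects: subtract each mother's own averages.
            d_within.append(d - mean_d)
            y_within.append(y - mean_y)
    return slope(d_all, y_all), slope(d_within, y_within)


pooled, within = simulate()
print(f"pooled estimate (no mother fixed effects): {pooled:.3f}")
print(f"within-mother estimate:                    {within:.3f}")
```

The pooled estimate makes concordance look protective even though it does nothing, while the within-mother estimate sits near zero. Only mothers whose births involved doctors of different races contribute to the second number, which is why the design needs exactly that kind of variation.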
Missing Fixed Effects As A General Problem
Can you tell?
This post I made back in September shows a plot of data from Almond, Chay and Greenstone’s paper on hospital desegregation in Mississippi. It clearly suggests that Mississippi desegregated in 1965 and that immediately had major, beneficial effects on the survival of Black newborns. The effect was so large that the study’s authors argued that Title VI led to no fewer than 25,000 Black infants being saved between 1965 and 2002.
Their explanation was plausible and the effect was large. The belief in this effect has been popular among economists because it’s believable that desegregation might save lives through channels like giving Blacks access to hitherto inaccessible medical treatment, causing Black and White infant mortality rates to converge like they did. Almond, Chay & Greenstone’s effect was so large that it could explain all of the Black-White infant mortality rate convergence between 1965 and 1971. As the chart shows, that was a period of particularly marked convergence.
But what if the apparent effect of desegregation was an artefact of excluding the right fixed effects?
Almond, Chay and Greenstone computed event-study estimates of the effect of a county gaining a Medicare-eligible hospital. For a hospital to be Medicare-eligible, it had to be desegregated, making this an intuitively appealing instrument.7 The effect of desegregation measured with this instrument was apparent with a number of county-level controls, county fixed effects, and county-specific linear trends, but because all but five counties in Mississippi had a Medicare-eligible hospital by 1969,8 there was too little variation in Medicare certification dates to include year fixed effects. Accordingly, the effect of desegregation couldn’t be differentiated from any other changes that happened after the passage of the Civil Rights Act.
The decline in Black fertility, Black economic and educational progress, the rollout of community health centers, Medicaid—without year fixed effects, in Almond, Chay and Greenstone’s paper, that’s all conflated with desegregation!
Years later, Anderson, Charles and Rees sought to replicate Almond, Chay and Greenstone’s results, but they used data from five states in the Deep South instead of just Mississippi. Their expanded dataset had more variance in Medicare certification dates, so they were able to estimate year fixed effects.
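To make the role of year fixed effects concrete, here is a toy panel simulation. It is a sketch under invented assumptions, not a reproduction of either paper: the counties, adoption dates, trend, and noise levels are all hypothetical, and the true effect of certification is set to zero. Mortality falls only because of a common downward trend.

```python
import random


def demean(panel, by_year):
    """Remove county means; if by_year is True, also remove year means."""
    n, t_len = len(panel), len(panel[0])
    county_mean = [sum(row) / t_len for row in panel]
    year_mean = [sum(panel[i][t] for i in range(n)) / n for t in range(t_len)]
    grand = sum(county_mean) / n
    out = []
    for i in range(n):
        row = []
        for t in range(t_len):
            value = panel[i][t] - county_mean[i]
            if by_year:
                value -= year_mean[t] - grand
            row.append(value)
        out.append(row)
    return out


def within_slope(d, y, by_year):
    """Slope of demeaned y on demeaned d (balanced panel)."""
    dd, yy = demean(d, by_year), demean(y, by_year)
    num = sum(a * b for r1, r2 in zip(dd, yy) for a, b in zip(r1, r2))
    den = sum(a * a for r in dd for a in r)
    return num / den


def simulate(n_counties=300, seed=1):
    rng = random.Random(seed)
    years = list(range(1960, 1976))
    d, y = [], []
    for _ in range(n_counties):
        adopt = rng.choice([1966, 1967, 1968, 1969, 1970])  # hypothetical dates
        county_effect = rng.gauss(0, 2)
        d.append([1.0 if yr >= adopt else 0.0 for yr in years])
        # True treatment effect is zero; only the common trend moves mortality.
        y.append([county_effect - 0.5 * (yr - 1960) + rng.gauss(0, 1) for yr in years])
    return within_slope(d, y, by_year=False), within_slope(d, y, by_year=True)


county_only, two_way = simulate()
print(f"county fixed effects only:      {county_only:.3f}")
print(f"county and year fixed effects:  {two_way:.3f}")
```

With county fixed effects alone, the common trend gets credited to certification and the estimate is large and negative. Adding year fixed effects pulls it back toward zero. This only works because adoption dates are staggered: if nearly every county adopted at the same time, the treatment would be almost collinear with the year effects and there would be nothing left to estimate.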
So let’s compare the uncontrolled event-study estimates of the effect of desegregation on Black postneonatal mortality with Almond, Chay and Greenstone’s maximally-specified event-study estimates and Anderson, Charles and Rees’ version with added year fixed effects:
This is a stunning lack of effect. Apparently, the association of desegregation with improved Black postneonatal mortality was due to improvements that accompanied desegregation, not to desegregation itself.
Since the shocking estimate I posted on Twitter was for diarrhea and pneumonia, we’re lucky that Anderson, Charles and Rees also provided estimates specifically for deaths induced by pneumonia, influenza, and diarrhea:
It’s marginal, but apparently, if anything, hospital desegregation might have increased Black postneonatal mortality from pneumonia, influenza, and diarrhea.
How Often Is This a Problem?
In the two examples I’ve cited so far, failure to include the right fixed effects either precluded causal inference or completely nullified—and potentially reversed—results.
There’s no telling how often this happens. When it came to physician-patient racial concordance effects on infant survival, the problem was obvious; for hospital desegregation, the nature of the problem wasn’t nearly as clear. Neither of these is a one-off.
Some people hold the reasonable view that when teachers partake in professional development programs, they become better teachers, resulting in benefits to their students. In line with this view, a cursory investigation suggests that this is the case. But when student fixed effects are thrown in, that conclusion suffers an about-face, and the effect is either nullified or teacher professional development becomes harmful to students’ achievement test scores.9
In a paper published in mid-2023, McNeil, Luca and Lee used the British Household Panel Survey to provide seemingly convincing evidence that being born in a location with high unemployment had major, long-term impacts on individuals, making them more likely to support government intervention in the labor market, “less progressive on gender issues, and less likely to support the Conservative Party.” However, this study lacks the design-based controls required to credibly support the conclusion its authors wanted to draw. The issue is residual confounding: we cannot observe the people born into places with high unemployment prior to their birth, so sorting-induced confounding remains.10 If there were observations of siblings born in locations with different levels of unemployment, this issue could be resolved, because siblings could be compared to see if the effect remained within sibling pairs.
There are innumerable instances where a study can go a long way and eliminate many potential explanations from consideration while the possibility remains that a crucial fixed effect is missing or that some other residual issue persists. Cutting out these alternative explanations for findings is hard, and the data to address these issues often isn’t available. McNeil, Luca and Lee probably won’t find a dataset with everything they need for a long time; Almond, Chay and Greenstone didn’t have data beyond Mississippi and it took another group fourteen years to address the issue with their study. How many people were misled over that decade and a half? How many people are still being misled by it?
This issue is important because it impacts policy and research priorities. It’s an issue that will also always be with us in spite of our best efforts. All we can do to combat it in the long run is to cultivate a culture of radical data openness. Many people don’t want that, however, as they fear data will be misused. For better or worse, they tend to be people with views like Justice Jackson’s. Let’s hope their numbers diminish and more reasonable people win the day, or we might end up with more Supreme Court Justices arguing in favor of segregation.
Note that the effect on mortality risk was specific to pediatricians, not obstetricians.
The paper also suggests that physicians may have trouble learning whatever behaviors give rise to the patient-physician concordance effect because experience with Black newborns is seemingly irrelevant (Table S12).
The comparability of linear model estimates is a point in their favor that has no bearing on anything written here except for the comparison of their magnitudes across specifications.
The odds are less extraordinary in this specific case because the estimates are all based on estimating the same effect in different scenarios. However, the effect is still dubious because it falls into this range.
Other patient characteristics like age at birth will also need to be controlled.
A randomized controlled trial could work, but as I noted here, such a design might be hard to implement in America and even if it’s done, it’s likely to have major generalizability concerns.
Unfortunately, investigating the effects of physician-patient racial concordance in another country doesn’t help, since we are interested in an effect that people will readily argue is culture-specific: specific to African descendants of slaves, specific to doctors who benefitted from affirmative action, etc.
Almond, Chay and Greenstone’s estimate of the effect of desegregation is backed up by more recent findings like that the advent of Medicare-eligible hospital access was associated with greatly reduced non-White elderly pneumonia mortality.
Or their residents had the option of receiving care at a certified hospital in a bordering county.
This is charted here. It is also noted there that this study shows simply using student-level controls is insufficient, as the result was not the inversion observed when student fixed effects were used.
In other locations on Twitter, I have posted about the likely insubstantial and absent effects of teacher-student racial concordance.