Columbia Is Still Discriminating
Columbia's admissions department has been hacked, and we now know they're still practicing affirmative action
The same hacker responsible for leaking all of the data from New York University’s (NYU’s) admissions department in late March of this year has done it again. This time, they’ve leaked all of Columbia’s admissions data. This article provides a broad overview of what that data shows and what it means for Columbia’s compliance with the Supreme Court’s ruling in SFFA v. Harvard. Here’s a summary in four words: they did not comply.
Firstly, Columbia remains test-optional.[1] Students are not required to submit test scores, so, as at other schools, less-qualified admits tend not to submit them. Those who withhold their scores tend to perform like students with fairly low scores: they earn lower GPAs, flunk out at higher rates, switch to easier majors more readily, and so on.
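A stylized way to see why selective submission matters: if non-submitters are weaker on average, the scores that do get submitted overstate a pool's typical level, and the overstatement grows as the submission rate falls. The sketch below assumes, purely for illustration, that scores are normally distributed (a hypothetical mean of 1200 and SD of 150) and that only the top share of the pool submits. Real submission decisions are noisier than a hard cutoff, so this is a simplification. The function name is hypothetical, and the numbers are not Columbia's.

```python
from statistics import NormalDist


def mean_of_submitted_scores(mu: float, sigma: float, submit_rate: float) -> float:
    """Expected mean score among submitters under a stylized model.

    Hypothetical assumptions: scores are normal(mu, sigma), and only the top
    `submit_rate` share of the pool submits (a hard cutoff).
    """
    if not 0 < submit_rate <= 1:
        raise ValueError("submit_rate must be in (0, 1]")
    if submit_rate == 1:
        return mu  # everyone submits, so there is no selection
    std = NormalDist()  # standard normal distribution
    z = std.inv_cdf(1 - submit_rate)  # submission cutoff, in SD units
    # Mean of a normal truncated from below at z: mu + sigma * pdf(z) / P(Z > z)
    return mu + sigma * std.pdf(z) / submit_rate


if __name__ == "__main__":
    for rate in (1.0, 0.75, 0.5, 0.25, 0.1):
        mean = mean_of_submitted_scores(1200, 150, rate)
        print(f"submission rate {rate:4.0%}: mean submitted score {mean:.0f}")
```

Under these assumptions, a pool in which half the students submit reports a mean near 1320, even though the underlying mean is 1200; the lower the submission rate, the larger the gap.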
Who submits scores more and less often at Columbia? Submissions are shockingly rare, but the ordering is what you might expect if you’re familiar with educational achievement gaps: Asians submitted test scores more often than Whites, who submitted them more often than Hispanics, who submitted them much more often than Blacks.
This needs to be kept in mind to understand the test score comparisons that follow. Because non-submitters tend to be weaker students, groups that submit less often have their submitted scores inflated relative to groups that submit more often. And because so few applicants submit scores at all, it is difficult to build an academics-based admissions model, which makes it harder to determine how Columbia might have biased its admissions process. With that in mind, here are SAT scores for admitted and rejected applicants:
These results can be contrasted with NYU’s in an interesting way. At NYU, non-admitted White and Asian students performed marginally and substantially better than admitted Hispanic and Black students, respectively. This could be due to differences in how much each admissions department discriminates, or to other factors, such as NYU less often being the first choice of high-performing Hispanic and Black students while high-performing Whites and Asians have fewer options. Since Columbia is an Ivy League university with far more cachet than NYU, it is likely the first choice for more applicants, so it is more likely to attract elite students and to leave narrower test score gaps after selection. That effect compounds with test-optionality, which biases submitted Black and Hispanic scores upward.
With these factors in mind, these results, though less dramatic at a glance than NYU’s, are just as damning. Adding ACT results makes it even clearer that Columbia must be discriminating by race: rejected Asian students substantially outscore admitted Blacks.
With more commonly reported criteria, the picture looks much more like NYU’s, with Columbia clearly rejecting Whites and Asians who are more qualified than the Hispanics and Blacks it chooses to admit. White and Asian students tend to have better GPAs, more extracurriculars, and, quite likely, better personalities and higher levels of other achievements, as we saw in the SFFA v. Harvard case. Virtually every qualification favors White and Asian students, and yet they are admitted less often, to a degree that factors like legacy status, athletics, and faculty relationships cannot explain.
In other words, Columbia is clearly discriminating. When you try to predict admissions from test scores, GPAs, or a combination of the two, you consistently see that Asian students are underadmitted relative to their academic qualifications. This race interaction effect is large and significant regardless of how the models are specified, and it always indicates that, at the same level of legitimate qualifications, Blacks are more likely to be admitted than Hispanics, who are more likely to be admitted than Whites, who are more likely to be admitted than Asians.
If race is not considered in admissions and there are no major compensatory factors at work (which, on investigation, has never been the case), then the model coefficients should not look like this. We know from this data that the gaps shown above would be smaller if offers were made to everyone in order of their qualifications without regard to race. The race coefficients would all be small and nonsignificant; admissions would be like going down a list in order of academic qualifications while ignoring race, leaving behind no race effect. And yet there is a race effect.
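The null expectation described above can be illustrated without a full regression. The sketch below is a simpler stand-in for the models mentioned earlier (it uses stratification, not logistic regression): it compares admit rates across groups within narrow score bands. Under group-blind, list-based admissions, rates within each band should roughly match, up to noise and band coarseness. The data layout, group labels, 50-point band width, and toy datasets are all hypothetical and are not drawn from the leaked data.

```python
from collections import defaultdict


def admit_rates_by_band(applicants, band_width=50):
    """Admit rate for every (score band, group) cell.

    `applicants` is an iterable of (group, score, admitted) tuples; this layout
    is hypothetical. Bands are labeled by their lower edge.
    """
    counts = defaultdict(lambda: [0, 0])  # (band, group) -> [admits, applicants]
    for group, score, admitted in applicants:
        band = (score // band_width) * band_width
        cell = counts[(band, group)]
        cell[0] += int(admitted)
        cell[1] += 1
    return {key: admits / total for key, (admits, total) in counts.items()}


if __name__ == "__main__":
    scores = range(1400, 1600, 10)  # toy scores, identical for both groups
    # Group-blind process: everyone at or above a 1500 cutoff is admitted.
    blind = [(g, s, s >= 1500) for g in "AB" for s in scores]
    # Tilted process: group B is treated as if it scored 100 points higher.
    tilted = [(g, s, s + (100 if g == "B" else 0) >= 1500) for g in "AB" for s in scores]
    for label, data in (("group-blind", blind), ("tilted", tilted)):
        rates = admit_rates_by_band(data)
        for band in sorted({b for b, _ in rates}):
            print(label, band, {g: rates[(band, g)] for g in "AB"})
```

In the group-blind toy data, both groups have identical admit rates in every band; in the tilted data, group B is admitted in the 1400 and 1450 bands where group A is not.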
Columbia is still discriminating, and discriminating somewhat less to suggest otherwise is just an attempt to cover their behinds.
Force Self-Incrimination
University after university has only feigned compliance with the SFFA v. Harvard ruling. At best, their demographics shift slightly in the direction one would expect from genuine compliance, and they seem to expect that to satisfy the public. It should not, because they are merely reducing the magnitude of discrimination while continuing to engage in it, and that is not good enough. SFFA v. Harvard said to stop discriminating, full stop.
So how do we stop universities from continuing to violate the law, as NYU and Columbia have been doing? By gathering and releasing more data. The simplest solution to this whole issue is IPEDS reform.
IPEDS, the Integrated Postsecondary Education Data System, is the primary postsecondary education data collection program in the U.S. It is maintained by the National Center for Education Statistics (NCES), and it contains a wealth of data, but it could contain a lot more. Right now, anyone can access IPEDS and obtain information on student test scores and demographics for different universities, alongside a bundle of other interesting information. But it is not enough. As Peter Arcidiacono, the primary researcher in the SFFA case, has noted:
The Integrated Postsecondary Education Data System (IPEDS) provides information on total applications, admissions, and enrollment for all institutions that participate in federal student aid programs, but does not disaggregate the data by race or SAT score. A simple change in the data collection practices would generate a clearer picture of the role race plays in the admissions process at American colleges and universities, as well as help high school students make more informed decisions about where to apply to college.
The President can direct the NCES to impose data collection standards through IPEDS that would make it unambiguous whether universities like Harvard, NYU, and Columbia are violating the law. Those standards can go much deeper than race and sex breakdowns, too. Universities should be required to provide extensive, ideally individual-level, data on both applicants and enrollees, so that a comprehensive picture of the U.S. education system can be pieced together every year and institutions can be properly audited with data in hand.
Once IPEDS is reformed to require universities to provide the data we know they collect internally (we know this from leaks like this one, from their application forms, and from their publications), independent researchers, auditors, officials, and other interested parties will be able to speak out credibly against violations of the law. Without this data, that won't be possible. Instead, we'll be stuck with decade-long court cases again, because it is so hard to force universities to produce proof of discrimination. If every university provides the data, each case becomes easier, and requiring them to disclose their admissions formulae would make it easier still.
Consider the Ivies. If they admitted only the most qualified students, they could not possibly admit as many Black students as they do. With data on every Black applicant, we could be completely sure of this, and the DOJ could easily beat them in court. Consider the role of GPAs in admissions, too. Extensive data would let us better understand what GPAs mean at different high schools, so that admissions departments do not overrate students whose GPAs were earned somewhere easy.
To wrap up: reform the data collection mechanisms, and your case makes itself. Once it is possible to know that discrimination is taking place, it will be known, because parents and interest groups will use the data. The universities will become easily punishable and, no doubt, will soon be punished. Because the NIH, NSF, DOE, and other agencies answer to the same administration that can compel the NCES to collect more data through IPEDS, the punishments can be extreme. Discriminate? Maybe you lose funding for all your researchers, or all subsidized student loans.
This move would pave the way for America's institutions of higher learning to stop promoting the unqualified into America's elite journalistic, scientific, and governmental institutions through racially discriminatory admissions, and to instead promote excellence once again.
I’ll leave it to the hacker to decide how they want to release the data to the public and will update here accordingly.
I added this footnote a few hours after publishing the article. It states explicitly something I thought was clear but that people evidently are not getting:
Test-optional policies allow universities to mask discrimination.
I’ve noted here that we can get around this if we have enough data, but usually we don’t, so we get stuck. Masking discrimination is the real point of test-optional policies, far beyond all the objectively false bluster about fairness. If you’re unfamiliar with how this works, go read here, and also here, while you’re at it. No one who deals with this topic is unaware of this problem, and that includes members of the current Trump administration. In its February 14 Dear Colleague letter, the Department of Education wrote:
Relying on non-racial information as a proxy for race, and making decisions based on that information, violates the law. That is true whether the proxies are used to grant preferences on an individual basis or a systematic one. It would, for instance, be unlawful for an educational institution to eliminate standardized testing to achieve a desired racial balance or to increase racial diversity. [Emphasis mine]
To spell out what that letter implies: the administration is aware of this strategy. It knows that test-optionality is abusable, because it makes it harder to check whether other factors are being used as proxies for race or some other desired demographic characteristic. Without an objective, unbiased baseline (test scores), it is harder to gauge the trustworthiness of everything else.
Consider the related problem of high school quality and censored class ranks. Higher-ranking high schools censor class ranks because they aim to give their students inflated GPAs that cannot be interpreted in terms of class position. If a student with a 1300 SAT score has a 4.0 GPA that sits at the 50th percentile of their class, that GPA obviously means something different from the 3.89 GPA of another student with a 1300 SAT score who is their school's valedictorian.
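Norming GPAs within each school is one way to recover the information that censored class ranks hide, assuming school-level GPA distributions are available (which is part of what expanded IPEDS reporting could supply). The minimal sketch below expresses each GPA as the share of schoolmates with a strictly lower GPA. The (student_id, school, gpa) layout, the school names, and the toy GPAs are hypothetical; they simply mirror the 4.0-versus-3.89 example above.

```python
from collections import defaultdict


def within_school_percentiles(students):
    """Share of each student's schoolmates who have a strictly lower GPA.

    `students` is a list of (student_id, school, gpa) tuples; this layout is
    hypothetical. A student who is alone in the data for their school gets None,
    since there is nobody to compare against.
    """
    gpas_by_school = defaultdict(list)
    for _, school, gpa in students:
        gpas_by_school[school].append(gpa)
    result = {}
    for student_id, school, gpa in students:
        gpas = gpas_by_school[school]
        schoolmates = len(gpas) - 1  # exclude the student themself
        lower = sum(1 for other in gpas if other < gpa)
        result[student_id] = lower / schoolmates if schoolmates else None
    return result


if __name__ == "__main__":
    toy = [
        ("s1", "Inflated HS", 4.0), ("s2", "Inflated HS", 4.0),
        ("s3", "Inflated HS", 4.0), ("s4", "Inflated HS", 3.9),
        ("s5", "Inflated HS", 3.8),
        ("v1", "Strict HS", 3.89), ("v2", "Strict HS", 3.5),
        ("v3", "Strict HS", 3.2), ("v4", "Strict HS", 3.0),
    ]
    pct = within_school_percentiles(toy)
    print(pct["s1"], pct["v1"])  # a 4.0 in the middle vs. a 3.89 at the top
```

In the toy data, the 4.0 student outranks only half of their schoolmates, while the 3.89 student outranks all of theirs.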
In short, test-optionality is a tool for obfuscating discrimination because it removes objective anchors for evaluating student quality. Without those anchors, anyone trying to detect and eliminate discrimination is adrift. When the only statistics available to researchers are post-admission data, the problem is compounded by range restriction (which is worse at more elite institutions). Together, these factors can make it seem like nothing is amiss. That’s why all the data must be made available: so that counterfactual scenarios can be simulated. Thankfully, with this leak, we can see that plausible counterfactual admissions scenarios are inconsistent with what is observed in practice.
To give just one example, here are the demographics that would result from simply going down the list by SAT and rescaled ACT scores. Note again that this sample includes only people who submitted scores, and that some people submitted both types. Going test-only would lead to:
Whites: ~36% → just above 24%
Asians: ~48% → ~72%
Hispanics: ~9% → <3%
Blacks: ~8% → <2%
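The "going down the list" counterfactual is simple to express in code. This is a minimal sketch, assuming ACT scores have already been converted to the SAT scale with a concordance table (that step is not shown) and that applicants who submitted both tests are ranked on the higher of the two; how the analysis above handled dual submitters is not stated, so that rule is an assumption. The field layout, function name, and toy applicants are hypothetical, and the sketch does not reproduce the figures above.

```python
from collections import Counter


def score_only_class(applicants, n_offers):
    """Group shares of a class filled strictly in descending score order.

    `applicants` is an iterable of (applicant_id, group, sat, act_as_sat) tuples
    with None for a missing score (hypothetical layout). ACT scores are assumed
    to be converted to the SAT scale already; dual submitters are ranked on
    their higher score (an assumption).
    """
    ranked = []
    for applicant_id, group, sat, act_as_sat in applicants:
        submitted = [s for s in (sat, act_as_sat) if s is not None]
        if not submitted:
            continue  # non-submitters cannot be ranked on test scores
        ranked.append((max(submitted), applicant_id, group))  # count each person once
    ranked.sort(key=lambda row: (-row[0], row[1]))  # best first; id breaks ties
    offers = ranked[:n_offers]
    counts = Counter(group for _, _, group in offers)
    return {group: n / len(offers) for group, n in counts.items()}


if __name__ == "__main__":
    toy = [
        ("a1", "A", 1550, None),
        ("a2", "A", None, 1500),
        ("b1", "B", 1450, 1480),
        ("b2", "B", None, None),  # submitted nothing, so excluded
        ("c1", "C", 1400, None),
    ]
    print(score_only_class(toy, 2))  # {'A': 1.0}
    print(score_only_class(toy, 4))  # {'A': 0.5, 'B': 0.25, 'C': 0.25}
```

Comparing these shares with the actual admitted class's shares gives before-and-after figures in the same form as the list above.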
In the real world, high-performing Black students tend to have many more offers to attend elsewhere, making them less likely to choose Columbia over somewhere more prestigious or remunerative. The opposite holds for White and, even more so, for Asian students, since they are subject to discrimination and are less likely to qualify for free applications. Therefore, if we accounted for realistic alternative options based on this stylized fact, the numbers would likely be even more slanted, resulting in even smaller Black and Hispanic classes. Applying various norming strategies to other criteria, like GPAs, gives a similar picture: the results are almost certainly inconsistent with meritocratic admissions and fit better with a scenario involving substantial discrimination. If all plausible ‘merit-based’ criteria were included, the picture would be a more muted version of the test-score-only scenario, as was shown to be the case for Harvard during SFFA v. Harvard.
This is as inescapable a conclusion as it was during SFFA v. Harvard, and I am hopeful that the Department of Justice will accept my standing offer to provide them with all of Columbia’s admissions data so that they can make this reality widely known in the course of formal legal actions against the university.