Automating Discrimination: AI Hiring Practices and Gender Inequality

“I think people underestimate the impact algorithms and recommendation engines have on jobs,” Derek Kan, Vice President of Product Management at Monster, says.1 “The way you present yourself is most likely read by thousands of machines and servers first, before it even gets to a human eye.”2

Introduction

Amazon is a world leader in the use of artificial intelligence (AI) to address a range of business issues, from predicting consumer purchases3 to reducing its corporate carbon footprint.4 As the company grew, needing to hire tens of thousands of employees, management asked its engineers to create an AI algorithm to identify the best potential employees based on their resumes alone.5

After 500 attempts,6 the engineers collectively threw up their hands.7 Instead of creating a useful automated hiring technology, they had created the perfect tool to discriminate against women.8 The algorithm rejected applicants who used the term “women” anywhere—such as “Captain, Women’s Soccer Team” or “National Women’s Chess Champion.”9 It rejected applicants who went to all-women’s colleges.10 Not only did the program reject potentially qualified women before they even reached the interview stage, but some candidates the algorithm identified for jobs were not even qualified.11

How could the world leader in AI so miss the mark? The answer is an abiding fact of AI—it learns to replicate the biases of the data used to create it.12 Because the Amazon engineers developed the algorithm based on resumes submitted to Amazon, which were predominantly male,13 the AI responded by assuming male candidates were preferred.14 This fiasco led Amazon to give up on creating such a hiring tool.15 However, many other companies are marketing16 or employing17 AI-based hiring tools. In a Harris Poll conducted for CareerBuilder, 55% of Human Resource managers said they would be using AI by 2022.18 The COVID-19 pandemic escalated the demand for AI-based hiring technologies,19 further entrenching them into normal HR procedures.

Despite the potential for gender discrimination, independent developers and companies sell AI hiring tools to businesses without evidence that those technologies actually identify qualified candidates.20 At least 407 companies within the Fortune 500 use some combination of three such technologies—resume scanning, one-way video interviews, and the use of video games—to screen applicants.21

The developers marketing these technologies claim that the algorithms can decrease costs,22 save time,23 and identify the best applicants in the hiring process.24 The technologies are even touted as a way to avoid racial and gender discrimination25 and protect employers from being sued under employment discrimination laws26 because the decisions are made by a computer rather than a human.

Automation, however, is not necessarily a woman’s friend. On the internet, female job seekers are directed to lower-paying jobs more often than male job seekers. Researchers from Carnegie Mellon created hundreds of fake male and female internet job seekers.27 The fake job applicants from both groups visited employment webpages. The study found that male job seekers received overwhelmingly more ads for high-paying jobs than equally qualified female job seekers. Ads that read “$200k+ Jobs—Execs Only” and “Find Next $200k+ Job” were displayed almost six times more often for men than for women.28

The design of the technologies at issue in this Article similarly creates a situation that favors male candidates. If the technologies are developed using data from the existing employees (such as their resumes, their speech patterns in one-way video interviews, or the way they play video games), the algorithm will privilege male traits if the existing employees are predominantly male. The risk of gender discrimination is real due to the male-skewed workforce in many major companies. In 2018, men accounted for 81% of Microsoft’s technical workforce, 79% of Google’s, 78% of Facebook’s, and 77% of Apple’s.29

This Article makes a unique contribution to the literature by combining a deep understanding of AI hiring technologies with an original series of proposals of how they should be addressed by law. The topic is of crucial importance due to the extensive use of these technologies and their powerful potential for discrimination. This Article addresses three AI-based hiring tools that rank and even reject applicants before they get to the interview stage—resume scanning, one-way video interviews, and the use of video games to screen applicants. It analyzes how the use of seemingly neutral AI in recruiting may discriminate against women and on what legal grounds a woman who is not hired might bring a legal claim challenging the use of these technologies. Part I summarizes the AI-based hiring technologies and analyzes the ways in which they might disadvantage women. Part II provides the overall framework for gender discrimination cases involving employment under Title VII of the Civil Rights Act. Part III applies the legal principles and precedents of Title VII law to the use of AI in hiring assessments, and Part IV proposes policy changes to ensure fairness in hiring in an era of algorithms.

I. Artificial Intelligence and Machine Learning in Hiring Decisions

Hiring software uses artificial intelligence and machine learning to create algorithms to predict which applicants will be successful in the job.30 The term “artificial intelligence” refers to all computational efforts to code a machine to make decisions as though it were a human.31 “Machine learning” is a subset of artificial intelligence in which “the automated model-building process determines which input variables (or features) are most useful and how to combine them to best predict a behavior or outcome based on the latest data available.”32 In the hiring context, the algorithms look for correlations between various traits that applicants have and the traits of people who, by some measure, have succeeded in the job (such as the top managers in a company). What distinguishes machine learning from human-coded algorithms is that the computer, rather than a person, constantly modifies the algorithms to identify the “important” patterns.33 According to a joint Accenture and Harvard Business School study, 90% of Fortune 500 businesses use automated technology in hiring to “initially filter or rank potential middle-skills . . . and high-skills . . . candidates.”34
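To make that pattern concrete, the short Python sketch below is purely illustrative: the resumes, labels, and resulting weights are invented, and no vendor’s proprietary system is depicted. It shows the general shape of the approach described above, in which a model is fit on material from people the employer already considers successful and is then used to score new applicants.

```python
# A minimal, hypothetical sketch of correlation-based resume ranking.
# Training data, labels, and features are invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data: 1 = resume of a current "top performer", 0 = past rejection.
resumes = [
    "executed product launch, captain varsity football team",
    "led cloud migration, executed cost reduction initiative",
    "we coordinated volunteer tutoring, captain women's soccer team",
    "we organized community fundraiser, graduate of a women's college",
]
labels = [1, 1, 0, 0]  # whatever bias is in these labels, the model will learn

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(resumes)
model = LogisticRegression().fit(X, labels)

# Score a new applicant: the model rewards whichever tokens co-occurred with
# "success" in the training set, whether or not they relate to job ability.
new_resume = vectorizer.transform(["captain women's chess team, executed research project"])
print(model.predict_proba(new_resume)[0, 1])  # the applicant's "success" score
```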

Advocates of the use of algorithms in hiring claim that AI reduces the time and cost of finding employees. But they often underestimate the complexity of testing their predictions and validating the results. When discussing the benefits of machine learning in the context of hiring, a team of economists analogized the process to a tool used during brain surgery.35 During a typical brain surgery to remove a tumor, doctors would generally over-remove brain tissue to ensure that all cancerous tissue is excised.36 A company developed an algorithm that, in conjunction with a medical imaging device, could analyze in real time the tissue the doctor was assessing during brain surgery.37 The algorithm predicted with around 90% accuracy whether the brain tissue under the wand was cancerous.38

In designing medical studies involving machine learning and cancer, researchers analyze thousands of tissue samples. They follow up by testing the tissue to determine whether it is cancerous. The employment situation is much different. Algorithms are being developed using data from a limited number of existing employees (for one one-way video algorithm, just 50 employees).39 In medical situations, researchers can easily measure false positives and false negatives by testing the tissue. But how do we determine whether the women who were rejected would have done better than the men who were hired?

The hiring context creates a challenge in both defining success and determining what contributes to it. It is surprisingly difficult to determine job success. We do not have a metric for what makes a good employee. Are the people in the top positions in the company or the highest-salaried people necessarily the smartest, most productive, most creative, and best leaders? And what traits actually ensure job success, as opposed to those traits that the supposedly “top” employees share, that are unrelated to doing the job well?

Ascertaining what makes a good employee is a challenge for artificial intelligence hiring technology.40 Peter Cappelli notes in the Harvard Business Review that researchers have been trying to determine what constitutes a good hire since World War I41: “So the idea of bringing in exploratory techniques like machine learning to analyze HR data in an attempt to come up with some big insight we didn’t already know is pretty close to zero.”42

Because the data used to train hiring algorithms consists of the kind of traits and qualities possessed by an existing pool of employees, the program will produce results to mirror and favor those inputs.43 For example, in medical school admissions, an algorithm trained on historic data incorporated the previous human decision biases44: the algorithm selected against women and those who were not native English speakers.45 If a hiring algorithm is modeled on an existing workforce without gender diversity, the results will also lack gender diversity. Any model trained to assess potential candidates will do little other than “faithfully attempt to reproduce past decisions” and, in doing so, “reflect the very sorts of human biases they are intended to replace.”46

Because an algorithm ultimately selects which criteria to include, the algorithm itself can consider both illogical47 and discriminatory48 variables in its decision-making process. The algorithm may focus on traits of top employees that have nothing to do with actual ability to do their job. For example, the artificial intelligence program created by the company Gild to find potential employees out in the wild processed a massive quantity of data and then advised clients that a good potential employee is someone who visits a certain Japanese manga site.49

In another instance, when one of his clients was about to employ a resume scanning program, attorney Mark Girouard inquired into the variables that the algorithm was prioritizing in applicants’ CVs.50 The algorithm identified two factors as indicative of successful job performance: first, that the candidate’s name was Jared, and second, that the applicant played high school lacrosse.51 Girouard noted that with such systems, “your results are only as good as your training data.”52 He said, “[t]here was probably a hugely statistically significant correlation between those two data points [(being named Jared and having played lacrosse)] and performance, but you’d be hard pressed to argue that those were actually important to performance.”53

As the Jared example shows, correlation is not causation. If Tony changed his name to Jared, he would not then have more skills. Moreover, creating algorithms by retrospectively assessing a workforce may doom the corporation to stagnation because the few employees who are visionaries with the ability to move the corporation forward would likely have traits that are underrepresented in the data set.

Although AI proponents often tout that their technologies combat discrimination,54 there are multiple ways in which gender discrimination may inadvertently crop up. Using data from preexisting top performers can lead to “hindsight bias” because the algorithms will presume that (1) the characteristics the algorithm identified led to success, rather than merely being correlated with it; and (2) the characteristics that led to success in the past will necessarily lead to success in the future.55 Hindsight bias can operate to the disadvantage of groups of individuals who have historically been excluded from the workplace, including women.56 Given that possibility, what legal recourse is available for women who are not hired because of bias in the algorithm?

II. The Law of Employment Discrimination Under Title VII

Title VII of the Civil Rights Act prohibits a broad range of discriminatory conduct based on an individual’s sex, including an employer refusing to hire an applicant,57 discharging an employee,58 refusing to promote an employee,59 or demoting an employee.60 The two main theories of liability under Title VII are disparate treatment and disparate impact.61

In 1978, the Equal Employment Opportunity Commission (EEOC) released the Uniform Guidelines on Employee Selection Procedures (Uniform Guidelines) under 29 C.F.R. § 1607.62 Based on court decisions,63 previous agency guidance,64 and the policies underlying Title VII, the Uniform Guidelines were designed to help both public and private employers comply with federal employment law.65 The Uniform Guidelines provide guidance about what types of employer conduct are permissible in assessing job applicants.66

These Guidelines provide that before using a selection tool for hiring, an employer should perform a job analysis to determine which measures of work behaviors or performance are relevant to the job or group of jobs in question.67 Then, the employer must assess whether there is “empirical data demonstrating that the selection procedure is predictive of or significantly correlated with important elements of job performance.”68 Although the Uniform Guidelines can shepherd employers through the tangle of federal law, the Supreme Court has explained that the “Guidelines are not administrative ‘regulations’ promulgated pursuant to formal procedures established by the Congress.”69 Instead, they are an “administrative interpretation” of Title VII by an administrative agency.70 Nevertheless, the Supreme Court has consistently held that the Uniform Guidelines are “entitled to great deference.”71

In 2016, the EEOC held a meeting to educate itself on the use of algorithms in hiring.72 The Commission received testimony about the benefits of AI in recruitment73 and its risks.74 However, the EEOC has yet to articulate any general guidance regarding the effect of algorithms and machine learning on federal employment law.75 Consequently, a woman who is discriminated against in hiring must turn to the existing legal approaches by demonstrating that the use of an AI hiring technique caused disparate treatment or a disparate impact due to her gender.

A. Disparate Treatment

Disparate treatment is the most blatant form of discrimination because the employer’s conduct is intentional.76 Liability under the theory of disparate treatment requires a plaintiff to establish that her employer acted with a discriminatory intent or motive.77 A plaintiff can establish this in one of two ways.78 First, the plaintiff can present evidence of an employer’s explicit discriminatory statement,79 such as, “I would hire you, but I am not going to because you are [a female].”80 And second, the plaintiff can use indirect or circumstantial evidence of the employer’s conduct.81 An employer can even be liable for disparate treatment if the employer has a mixed motive, such as a legitimate reason for the decision in addition to the discriminatory one.82

In the U.S. Supreme Court case Price Waterhouse v. Hopkins, a woman who was passed over for partnership successfully argued intentional sex discrimination.83 The firm admitted that the employee was qualified and stated that she would have been promoted but for her interpersonal problems.84 By interpersonal problems, the firm meant that she was “aggressive” or “unduly harsh.”85 However, there was also evidence that the firm refused to offer her the partnership because the partners felt that she needed to wear more makeup;86 walk, talk, and dress more femininely;87 and be less aggressive.88 Other statements conveyed that the plaintiff was “macho” and that she should “take ‘a course at charm school.’”89

In this mixed motives case, the Court had to decide whether the interpersonal skills rationale was a legitimate nondiscriminatory basis for denying her the partnership or whether it was merely a pretext to disguise sex discrimination.90 The Court held that when a plaintiff can demonstrate that gender or gender stereotyping “played a motivating part in an employment decision,”91 the burden shifts to the defendant, who may avoid liability “only by proving by a preponderance of the evidence that it would have made the same decision even if it had not taken the plaintiff’s gender into account.”92 Expanding on the Court’s holding in her concurring opinion, Justice O’Connor explained that the employer’s statements constituted “direct evidence that decisionmakers placed substantial negative reliance on an illegitimate criterion in reaching their decision.”93 The case was reversed and remanded for further proceedings94 and ultimately decided in the employee’s favor.95

The second way to establish disparate treatment is by using indirect or circumstantial evidence.96 Circumstantial evidence can be used to show that the employer’s proffered reason is a pretext “unworthy of credence”97—for example, that the “employer’s explanation was contrary to the facts, insufficient to justify the action or not truly the employer’s motivation.”98 The plaintiff can also offer evidence of “suspicious timing, ambiguous statements oral or written, behavior toward or comments directed at other employees in the protected group, and other bits and pieces from which an inference of discriminatory intent might be drawn.”99 Evidence showing that the employer hired a less qualified applicant over the plaintiff in question, though not per se proof of pretext, may be evidence that the employer’s reasoning was a pretext for discrimination.100 This burden of persuading the court of the existence of pretext does not follow a rigid test, and “it is important to avoid formalism in its application, lest one lose the forest for the trees. Pretext is a commonsense inquiry: did the employer fire [or, as here, refuse to hire] the employee for the stated reason or not?”101

As opposed to being denied a job or promotion because they are too macho, some women are rejected as being not macho enough. In a disparate treatment case centering on pretext, Eldred v. Consolidated Freightways Corp. of Delaware, an assistant linehaul supervisor, Judith Eldred, was denied a promotion purportedly because she lacked aggression.102 John Bubriski was promoted over Eldred “because he was an enthusiastic and ‘aggressive’ employee, had worked previously as a supervisor in Dock, and had leadership experience as an officer in the Army Reserves.”103 Eldred, however, “was substantially more qualified for th[e] promotion”—she had superior evaluations, she was in her prior position longer than Bubriski, and Bubriski was often late to work and had “spotty” evaluations.104 In fact, the only positive evaluation in evidence that related to Bubriski’s performance in the assistant position came after his promotion and appeared to be “an after-the-fact justification.”105

Consolidated Freightways stated that it denied Eldred the promotion “because she lacked ‘aggressiveness’ and was too ‘soft’ with the drivers”106—justifications that were linked to gender stereotypes.107 The federal district court found that even if these characterizations about Eldred were true—which the court said was highly doubtful—they never affected Eldred’s job performance.108 Ultimately, the court found that Eldred was more qualified than Bubriski for the promotion, and the proffered reasons for the refusal to promote Eldred were pretexts for gender-based discrimination.109 The court went as far as to say that “[t]he unavoidable conclusion is not that plaintiff was passed over for the promotion because she was not aggressive; it was because she was not male.”110

An employer’s knowledge that a hiring practice discriminates against women, paired with evidence that shows the employer’s continued use of that same hiring practice, may also support an overall inference of intentional discrimination. Along those lines, in EEOC v. Joe’s Stone Crab, Inc., a restaurant—Joe’s Stone Crab (Joe’s)—sought to provide its customers with an “Old World” dining ambiance.111 In doing so, Joe’s management gave silent approval to the notion that male servers were preferable to female servers.112 The Eleventh Circuit Court of Appeals held that, by emulating “Old World traditions” of male servers, Joe’s intentionally excluded women.113

B. Disparate Impact

The theory of disparate impact can be used when an employer’s seemingly neutral policy or practice operates to the disadvantage of women.114 Once the plaintiff makes that showing, the employer has a chance to show that its selection criteria are related to job performance and serve the employer’s legitimate business needs.115 The plaintiff can overcome such a showing by proving that alternative selection criteria would serve the employer’s legitimate business needs, but “without a similar discriminatory effect.”116

The Fourth Circuit Court of Appeals decision in United States v. Chesapeake & Ohio Railway Co. provides a helpful articulation of the employer’s burden of proof: “The test of business necessity . . . ‘is not merely whether there exists a business purpose for adhering to a challenged practice. The test is whether there exists an overriding legitimate business purpose such that the practice is necessary to the safe and efficient operation of the business.’”117

In disparate impact cases, plaintiffs most often establish their prima facie case of disparate impact by statistical comparison.118 The Supreme Court acknowledges that statistics can be an important source of proof in employment discrimination cases because, assuming an employer is engaged in nondiscriminatory hiring practices, the workforce should be “more or less representative” of the larger community in which it operates.119

The plaintiff is not required to show a disproportionate impact based on a comparative analysis of the actual applicants120 because courts recognize that “[t]he application process might itself not adequately reflect the actual potential applicant pool, since otherwise qualified people might be discouraged from applying because of a self-recognized inability to meet the very standards challenged as being discriminatory.”121

One statistical benchmark for assessing whether a selection procedure results in a disparate impact is the “four-fifths rule” enumerated in the EEOC’s Uniform Guidelines on Employee Selection Procedures.122 The Uniform Guidelines explain that “[a] selection rate for any race, sex, or ethnic group which is less than four-fifths (4/5) (or eighty percent) of the rate for the group with the highest rate will generally not be regarded by the Federal enforcement agencies as evidence of adverse impact.”123

The Supreme Court in Griggs v. Duke Power Co., a racial discrimination case,124 provides the framework and the theory for discriminatory impact cases.125 Prior to the passage of the Civil Rights Act, Duke Power Company prohibited African Americans from working in any department other than the janitorial department.126 The employees in that department were the lowest paid at the plant—even the highest paid employee in the janitorial department was paid less than the lowest paid employee in other departments.127 After the Act’s passage, Duke had to abolish the rule that African American employees were permitted only to work as janitors, but the company developed two new employment requirements for the other departments: (1) a high school degree and (2) a passing grade on standardized general intelligence tests.128

In holding that Duke’s employment requirements violated Title VII, the Court explained that the scope of the Act reached “the consequences of employment practices, not simply the motivation.”129 Under the Act, any employment criteria, while “fair in form,” cannot be maintained if “they operate to ‘freeze’ the status quo of prior discriminatory employment practices.”130 Even when there is no evidence of prior discriminatory practices,131 and even if Duke enacted their diploma and testing requirements in good faith,132 under Title VII, “good intent or absence of discriminatory intent does not redeem employment procedures or testing mechanisms that operate as ‘built-in headwinds’ for minority groups and are unrelated to measuring job capability.”133

In a case involving sex discrimination in choosing apprentice boilermakers, Bailey v. Southeastern Area Joint Apprenticeship Committee,134 those “built-in headwinds”135 resulted from points being awarded to applicants for criteria that were less likely to have been experienced by women—such as an extra five points for service in the military and an extra ten points for time spent in vocational school.136 As a result, 2,227 of 7,287 male applicants were accepted into the apprentice program, while only 2 of 94 female applicants were accepted.137 The female plaintiffs who brought the lawsuit were rejected from the apprenticeship program even though they actually had experience as boilermakers.138
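Running the Bailey selection figures through the four-fifths benchmark described above makes the disparity stark. The short calculation below (written in Python purely for illustration) uses only the numbers reported in the case; the arithmetic itself is mechanical.

```python
# Four-fifths rule applied to the Bailey apprenticeship figures.
male_selected, male_applicants = 2227, 7287
female_selected, female_applicants = 2, 94

male_rate = male_selected / male_applicants        # roughly 30.6%
female_rate = female_selected / female_applicants  # roughly 2.1%

ratio = female_rate / male_rate                    # roughly 7%
print(f"male rate {male_rate:.1%}, female rate {female_rate:.1%}, ratio {ratio:.1%}")
assert ratio < 0.8  # far below the 80% (four-fifths) benchmark
```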

The court acknowledged that the apprenticeship committee “undoubtedly developed its screening mechanism in good faith,” albeit “informally and unprofessionally”139 because they did not perform any validation study of the selection, screening, or ranking procedures they used in their hiring process.140 The court opined that the screening questions were likely “developed in blissful ignorance of [their] possible impact on women as a protected class under Title VII.”141 The court recognized that there may be some “tangential relevance” between military service, shop classes, vocational training, and performance on the job, in that those activities are “conceivably indicative of [the applicant’s] general ability to work in a group,”142 but on the whole, the defendant failed to meet its burden of showing a legitimate business necessity for these questions, nor were the questions a “reasonable measure of job performance.”143 Finally, the court determined that there were likely less restrictive alternatives to questions about prior military service, vocational training, and shop classes.144

Neither discriminatory intent nor previous discriminatory practice is a prerequisite for a showing of disparate impact, thereby fashioning Title VII as a defense against more subtle forms of discrimination.145 Even where there is no conscious effort on the part of the employer to discriminate against a protected class, if its hiring policies or practices cause a disparate impact, the employer cannot escape scrutiny under Title VII.146

Previous disparate impact cases challenging pre-employment testing have often involved tests for civil service positions such as police officers,147 firefighters,148 and corrections officers.149 Employers posit that these positions require a minimum level of physical or mental skill,150 and they rely on pre-employment tests to determine whether an applicant meets their desired standard.151 But such tests have been routinely challenged for having a disparate impact on a protected class like women or minority candidates.152 Pre-employment tests for civil service positions therefore provide a useful frame of reference for the kinds of challenges that might be brought against an employer who uses AI hiring technologies that seek to measure skills which the employer believes are necessary for success in the position. A key aspect of this jurisprudence is that even reasonable-seeming testing criteria (such as strength or math ability) will be struck down if they disproportionately disadvantage women, unless they are necessary for the “safe and efficient”153 performance of the job.

In Berkman v. City of New York, a case involving a physical exam, a twenty-nine-year-old woman was the lead plaintiff in a class action against the New York City Fire Department.154 She had passed the written exam but failed the physical exam, resulting in her disqualification as an applicant.155 The physical test had a passage rate of 46% for men and 0% for women.156 The court determined that the test did not meet the EEOC’s validation metrics for pre-employment testing.157 The Berkman court concluded that “it [was] possible” that the tests contained “isolated references to work behaviors bearing superficial resemblance”158 to actual job performance, but, on the whole, the test did not “represent appropriate abilities”159 that would predict an applicant’s success on the job.160

Similarly, in Fickling v. New York State Department of Civil Service, plaintiffs brought suit under Title VII alleging that they were unlawfully terminated for failing an examination given as part of their job as Welfare Eligibility Examiners.161 The court assessed whether the content of the test was related to the content of the job and whether the scoring system “usefully selects” those applicants who are best suited to perform the job.162 The court determined that the test failed to comply with EEOC test validation metrics under the Uniform Guidelines because, among other things, 38% of the questions on the exam required arithmetic, even though the ability to do arithmetic was found to be “unimportant” to job performance based on an earlier analysis of the knowledge, skills, and ability of the ideal candidate.163

In United States v. Massachusetts, the United States sought to enjoin the Commonwealth of Massachusetts and the Massachusetts Department of Corrections from using the Caritas Physical Abilities Test to select entry-level correctional officers, arguing that the test had a disparate impact on women applicants.164 While the court understood that, as a matter of common sense and safety,165 factors like an individual’s speed, strength, and ability could be relevant to determining whether someone is suited to the job of a corrections officer, the court nevertheless determined that Massachusetts failed to show that the test was consistent with business necessity166 and necessary for “effective, efficient, or safe job performance.”167

III. Application of Title VII to AI Technologies in Hiring

A. Resume Scanning

1. The Technological Underpinnings of Resume Scanning, Its Current Uses, and Its Gendered Impacts

Employers use artificial intelligence technologies to rate job applicants’ resumes.168 Resume scanning has been used by entities such as JCDecaux,169 University of Pennsylvania,170 MoneyCorp,171 Monster,172 Nissan,173 PharmEasy,174 Wal-Mart,175 General Electric,176 Starbucks,177 McDonald’s,178 Hyatt,179 UNICEF,180 and Chick-fil-A.181 Employers claim that AI technologies are necessary to deal with the torrent of resumes they receive for any given job.182 Procter & Gamble, for example, received 1,000,000 applications for 2,000 jobs.183 The average, however, is a more manageable number: about 250 resumes per job posting.184

One approach to resume scanning is for the developers to decide in advance which words on the resume should lead to a job applicant either being rejected or moved to the next stage.185 Kathryn Dill of The Wall Street Journal reported on hospitals scanning nurses’ resumes to find those who had listed “computer programming” when hospitals needed nurses who could enter their patient data into the computer.186 Yet nursing candidates might emphasize care skills on their resumes and not think to add computer skills that they actually possess. Other examples include a power company scanning for customer service experience when hiring power line repair employees187 and a store’s algorithm only selecting for “retail clerks” if they have “‘floor-buffing’ experience.”188
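In code, this first approach amounts to little more than a keyword filter. The minimal Python sketch below is purely illustrative (the required phrase echoes the nursing example above, and the sample resume is invented): a resume that never uses the exact phrase is rejected before any human reads it.

```python
# Hypothetical rule-based resume screen: the employer fixes keywords in advance,
# and any resume lacking a required term is rejected automatically.
REQUIRED_KEYWORDS = {"computer programming"}  # e.g., the hospital example above

def passes_keyword_screen(resume_text: str) -> bool:
    """Return True only if every required keyword appears verbatim."""
    text = resume_text.lower()
    return all(keyword in text for keyword in REQUIRED_KEYWORDS)

resume = (
    "Registered nurse, 10 years of ICU experience; "
    "daily charting in the hospital's electronic health record system."
)
# The candidate clearly works with hospital software, but never wrote the
# literal phrase "computer programming," so the screen rejects her.
print(passes_keyword_screen(resume))  # False
```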

Resume scanning technology can alternatively use artificial intelligence and machine learning to analyze the resumes and rank the candidates.189 Resume scanning companies claim their software analyzes and can select for traits such as attention to detail,190 leadership skills,191 and other qualities that “stand[] out.”192 To identify the characteristics thought to predict success, employers use resumes submitted by their current roster of top employees as the model for the dataset.193 The resulting hindsight bias may operate to the disadvantage of groups of individuals historically excluded from the workplace, including women.194 For example, if most managers in a company are men, and many happened to have been varsity football players, a resume scanning algorithm will give priority to resumes that also include “varsity football” credentials. Since very few women play varsity football, the algorithm will give priority to male candidates—even when playing the sport has no bearing on job performance. This is the process that led to the algorithm identifying the name Jared and having played high school lacrosse as the keys to success.195

Resume scans can also discriminate against women due to differences in language that men and women have been socialized to use. Women are more likely to use “we” when describing a project, while men are more likely to say “I” when talking about achievements,196 so an algorithm trained mostly on men will be biased to choose candidates with “I” language on their resume. Men are more likely to use active verbs like “executed”; in choosing resumes with male-gendered verbs, such as “executed” or “captured,” the Amazon algorithm disadvantaged women.197

The application of resume scanning programs that privilege maleness is reminiscent of the situation of Simone de Beauvoir and Jean-Paul Sartre, who both studied philosophy at the Sorbonne.198 They both sat for the agrégation, a civil service exam where the higher-ranked candidate got his or her pick of professorial jobs.199 They were neck and neck to be declared the top candidate.200 But the honor went to Sartre. Why? He received points for attending a prestigious high school.201 Since the school was for boys only, there is no way de Beauvoir could have matched him under that faulty “algorithm.”202

Discrimination can also result from the lack of context in resume scanning. A large and unexplained gap on a person’s resume is often a red flag for a prospective employer203 and will result in automatic rejection by the algorithm. If a human were reading an applicant’s resume, context clues (e.g., a more suburban address, a more distant graduation year, volunteer experience at a local elementary school) surrounding a large gap between professional experiences on a woman’s resume could indicate a break taken to raise children. A resume scanning algorithm considers none of this context; the program merely red flags and downgrades a resume with a large work experience time gap,204 and the resume may never be seen by a human recruiter.

A 2021 joint study conducted by professors at Harvard Business School and professionals from Accenture found that resume scanning has kept around 27,000,000 people from finding full-time employment.205 The study did not provide a gender breakdown of those who were, as the study described them, “missing from the workforce.”206 The study notes that 88% of the employers said “that qualified high-skills candidates are vetted out of the process because they do not match the exact criteria established by the job description. That number rose to 94% in the case of middle-skills workers.”207

The researchers were critical of resume scanning algorithms because they can reject qualified candidates. For example, the algorithms reject resumes with significant gaps in work experience,208 which can “eliminate huge swaths of the population such as veterans, working mothers, immigrants, caregivers, military spouses and people who have some college coursework but never finished their degree.”209

2. The Potential Role of Existing Law in Response to Gender Discrimination in Resume Scanning

a. Disparate Treatment

What recourse does a woman have if she is rejected for a job by a resume scanning algorithm? She might be able to show disparate treatment if the algorithm downgrades an applicant based on sexist criteria, such as the use of “women” on the resume (for example, “Captain, Women’s Lacrosse Team”), as in the Amazon algorithm example.

It could also be argued that an employer is engaged in disparate treatment if the process by which the technology is created is known to be biased in favor of men. Training a model on a dataset that overrepresents men would invariably devalue female candidates and thus is akin to intentional bias. In tech companies, for example, the existing representation of women is less than the four-fifths ratio suggested by the Uniform Guidelines. According to Google’s 2022 Diversity Annual Report, women made up 30.6% of the company’s tech hires in the United States, while men accounted for 69.4% of the company’s new recruits.210 Since tech companies can be expected to know that algorithms reflect the dataset on which they are trained, use of such an algorithm could be viewed as intentional discrimination based on sex.

Similarly, an intent to discriminate could be established if the employer has actual knowledge of the discriminatory effect of the algorithm through its own data of the gender breakdown of the people the algorithm ranks highly or through publication of a study about it. This would be similar to the studies done by ProPublica, which revealed that criminal sentencing algorithms discriminate against Black people.211 If a resume-scanning algorithm disfavors female applicants, the employer should realize the process is discriminating based on a protected characteristic. As one set of commentators opined, “it is not difficult to imagine courts taking a res ipsa loquitur attitude” in such circumstances.212

b. Disparate Impact

A woman could alternatively bring a disparate impact claim if resume scanning leads to a significant difference in the hiring of women versus men. Griggs v. Duke Power Co. noted that any employment criteria, while “fair in form,” cannot be maintained if “they operate to ‘freeze’ the status quo of prior discriminatory employment practices.”213 The use of the resumes of existing employees214 to serve as the benchmark for the resume-scanning algorithm is problematic in that it privileges men over women. The algorithm results in hindsight bias because it has the tendency to discount groups of individuals historically excluded from the workplace, including women.215

A female plaintiff might be able to show disparate impact if the algorithm scans for criteria that are much more likely to apply to men than women, such as playing football or military service. (Recall that in Bailey v. Southeastern Area Joint Apprenticeship Committee, the employer’s use of previous military service or participation in shop classes was held to be discriminatory.)216 If the algorithm scans for missing time periods in the resume217 (such as a year off between jobs, which may be more common to women who tend to take time off after childbirth), that, too, might be seen as discriminatory.

The burden would then be on the employer to show that the resume scanning technique was identifying job-related traits. In Griggs, the Court rejected the company’s argument that it should be allowed to use standardized intelligence tests in spite of the disparate impact they caused.218 The Court explained that an employer must demonstrate that any hiring metric bears a “manifest relationship to the employment in question”219 and a “demonstrable relationship to successful performance of the jobs for which it [is] used.”220

Think of the situation in which women are disproportionately rejected because men tend to use more active verbs221 and are more likely to use “I” to claim credit instead of “we.”222 Is it really likely that those speech styles are tied to better performance on the job—or do they demonstrate that the person is more likely to be arrogant and take credit for another person’s work? Given the lack of objective studies of the ability of resume scanning to predict future job performance—and the sexist nature of algorithms like the Amazon one that was developed using a mostly male workforce—it will be difficult for employers to make a showing that the traits were job-related.

For some traits, the employer might have a better chance of clearing the job-related hurdle. For example, being on the football team might show leadership abilities or team skills. Or a time gap might indicate that someone is not devoted to their career. Then, it would be up to the woman to come up with an alternative to the challenged metric. For example, the woman could argue that she has alternative leadership or team skills, such as participation in other sports.223 And she could argue that rather than using a gap on her resume after childbirth to suggest a lack of devotion to a career, the potential employer could check references to see how well she performed in her previous jobs.

B. One-Way Video Interviews

1. The Technological Underpinnings of One-Way Video Interviews, Their Current Uses, and Their Gendered Impacts

One-way video interviews differ from standard interviews because they happen without a human interviewer.224 The job applicant logs in online and records herself or himself responding to prompts in the absence of a human representative of the employer.225 As with resume scanning algorithms, one-way video interviews are marketed as a more efficient way for employers to evaluate large numbers of candidates226 and to remove bias and subjectivity from the hiring process.227

One-way interviewing purportedly uses AI to analyze whether an applicant is creative,228 strategic,229 disciplined,230 driven,231 friendly,232 outgoing,233 assertive,234 persuasive,235 stress tolerant,236 and optimistic.237 This technology has been used for positions including customer operations clerks,238 warehouse workers,239 fast food crew members,240 retail supervisors,241 and by entities such as Six Flags,242 Facebook,243 Chick-fil-A,244 CA.gov,245 and McDonald’s.246

After the interviews are recorded, an algorithm can analyze the video components, the audio components, or a written transcript of the interview.247 One-way interviewing AI can assess how the applicant’s face moved when responding to each question to determine, for example, how excited the applicant seemed about a certain task or how they would deal with an angry customer.248 For one company’s algorithm, these facial analyses counted for 29% of the applicant’s score.249 The Chief Technology Officer of the company told Business Insider about its video interview analysis.250 She explained that the artificial intelligence algorithm analyzed different features important for different jobs251: if a job required client work, the algorithm gave different weight to certain characteristics it read: “[T]hings like eye contact, enthusiasm . . . . Do they smile or are they down cast? Are they looking away from the camera?”252
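As an illustration of how such sub-scores might be rolled into the single number recruiters see, the hypothetical Python sketch below combines three weighted components. Only the 29% facial-analysis figure comes from the report above; the other components, weights, and scores are invented for illustration.

```python
# Hypothetical composite scoring for a one-way video interview.
component_weights = {
    "facial_analysis": 0.29,    # reported weight for one vendor's algorithm
    "audio_features": 0.36,     # assumed: tone, pacing, energy
    "transcript_content": 0.35, # assumed: word choice, keywords
}

applicant_scores = {            # sub-scores on a 0-100 scale (invented)
    "facial_analysis": 62,
    "audio_features": 78,
    "transcript_content": 85,
}

composite = sum(weight * applicant_scores[name] for name, weight in component_weights.items())
print(round(composite, 1))  # the single ranking number shown to recruiters
```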

When an employer decides to use a one-way video interview, the developer can create a tailored algorithm by recording existing employees and then favoring applicants whose traits match those of the current successful employees.253 HireVue asked employers to use the one-way video interviews on all existing employees, “from high to low achievers,” and then used their scores to create a “benchmark of success.”254 After new applicants sat for their assessments, HireVue would generate a “report card,” which showed how well the applicant’s score matched up with the existing high-performing workers in the job for which they applied.255

Hilton International used HireVue’s one-way video interviewing for “thousands of applicants for reservation-booking, revenue management and call center positions.”256 Although job recruiters at companies like Hilton have access to recordings of all the applicants, they generally will let the algorithm filter out the lower-ranked candidates to save time. According to Sarah Smart, Hilton’s Vice-President of Global Recruitment, “[i]t’s rare for a recruiter to need to go out of [the top-ranked] range.”257

The risk of creating ideal candidate profiles based on the characteristics of existing employees is that the AI will discount the candidates who look, speak, express, dress, or present themselves differently from the current employees for reasons that have nothing to do with their qualifications for the job. If the technology is trained on a mostly male sample, the algorithm can erroneously presume that male traits, such as being tall, wearing a tie, or having a deep voice, are correlated with success on the job. Speech patterns, whether assessed via audio or transcripts, are also gendered.258 Comparing speech patterns of a mostly male workforce to that of female applicants can work to the disadvantage of female applicants (as it did with Amazon’s failed resume scanning attempts, which privileged the use of words more commonly used by men).259

A person’s linguistic style (i.e., their “characteristic speaking pattern”) will come through even when the content is transcribed into text.260 Linguistic style involves features such as “directness or indirectness, pacing and pausing, word choice, and the use of such elements as jokes, figures of speech, stories, questions, and apologies.”261 Essentially, “linguistic style is a set of culturally learned signals by which we not only communicate what we mean but also interpret others’ meaning and evaluate one another as people.”262 And, because different linguistic styles reflect different cultural norms, the patterns often differ for men and women.263 For example, girls and boys are socialized to communicate differently from a young age.264 Deborah Tannen, a professor of linguistics at Georgetown University, dubbed the way women learn to communicate as “rapport-talk” and the way men learn to communicate as “report-talk.”265 Girls tend to learn and engage in conversational styles that focus on building relationships with their peers, speaking modestly, and downplaying their own achievements, whereas boys engage in conversational styles that focus on status, self-promotion, and one-upmanship.266

Even small differences in communication styles, like the choice of which pronouns a person uses, can affect who gets credit for an idea in the workplace, or even who gets a job.267 Professor Tannen found that men say “I” in situations where women say “we.”268 These linguistic cues were so ingrained that she even recorded instances of women saying “we” when referring to the work they performed alone.269

Given the difference in communication styles between men and women, it is possible that a female applicant who applies for a position will be rejected because she makes “we” statements that highlight team- and relationship-building. Linguistic style differences were part of the reason that gender discrimination occurred in Amazon’s attempt to create a resume scanning algorithm. Trained on a dataset of mostly males, the algorithms learned to favor candidates who described themselves using verbs more commonly found on male engineers’ resumes, such as “executed” and “captured.”270 The use of one-way video interviews thus raises serious questions of discrimination based on an applicant’s gender, race, and age,271 leading critics to call it “a license to discriminate.”272

Nor will the one-way video interview necessarily identify competent potential employees because the technology looks for commonalities between existing employees without in-depth assessments of their performance and skills. While the AI systems may be able to tell the difference between a smile and a frown, they are less able to interpret the intent behind those physical expressions.273 A neuroscientist who studies emotion described the system as “worryingly imprecise in understanding what those movements actually mean and woefully unprepared for the vast cultural and social distinctions in how people show emotion or personality.”274 Even a former provider of video analysis in hiring, HireVue, has stepped away from analyzing the video images themselves after finding that “visual analysis has far less correlation to job performance than other elements of [their] algorithmic assessment.”275

2. The Potential Role of Existing Law in Response to Gender Discrimination in One-Way Video Interviews

a. Disparate Treatment

One-way video interviews present some of the same barriers to the hiring of women as does resume scanning, leading to similar potentials for disparate treatment claims. If the AI was trained on existing employees who are mainly men, it may erroneously assume that all sorts of male traits are prerequisites for performing well in the job—such as having shorter hair, a louder voice, a particular type of clothes, the use of “I” instead of “we,” or the use of more active verbs. Women who would have excelled in the actual job might never even get an in-person interview because they have been downgraded by the algorithm on frivolous grounds that have to do with maleness, not ability.

A disparate treatment claim would be appropriate when gender-based questions are posed in the video interview, such as asking women about how many children they have, if they plan to have children, if they are married,276 or about their salary history.277 As the EEOC makes clear,

Questions about an applicant’s sex . . . , marital status, medical history of pregnancy, future child bearing plans, number and/or ages of children or dependents, provisions for child care, abortions, birth control, ability to reproduce, and name or address of spouse or children are generally viewed as not job-related and problematic under Title VII.278

Similarly problematic issues might arise if an example is given in the question, such as asking whether the applicant participated in leadership programs like the Eagle Scouts or the Reserve Officers’ Training Corps (ROTC). Only 22% of ROTC cadets in the Class of 2020 were women,279 and most female job applicants never had an opportunity to participate in the Boy Scouts of America, since the organization graduated its first class of female Eagle Scouts only in 2021.280

Even when an employer does not ask gender-based questions, it is possible that AI can be harnessed to capture physical responses that carry an explicit connection to gender. For example, studies have shown that an estimated 60–70% of women experience shortness of breath during pregnancy.281 This symptom is linked to a variety of factors, including the development and movement of the fetus and the associated compression of a woman’s diaphragm.282 If an employer uses facial analysis, or even tracks and transcribes an applicant’s speaking patterns during a one-way video interview, the results may show that the applicant is pregnant based on the pauses or pacing to accommodate extra breaths. And, if the employer uses these findings to decide whether the applicant gets the job, it could likely be seen as an explicit and impermissible classification or differentiation based on gender and childbearing capacity.

If the AI awards a greater number of points to candidates who resemble or speak like men, this would seem analogous to the sexist treatment of Judith Eldred who was criticized as not being aggressive enough to be promoted—a justification that was found by the court to be impermissibly linked to a gender stereotype.283 And if an employer continues to use the algorithm after it disproportionately favors men, the employer could be found liable for disparate treatment, akin to what happened when an employer continued to use a discriminatory practice in EEOC v. Joe’s Stone Crab, Inc.284

b. Disparate Impact

Employers can be liable under the theory of disparate impact when a seemingly neutral policy or practice disadvantages individuals based on their protected class.285 Video interview analysis might, for example, downgrade female candidates because they use a different style of language than male candidates. As with resume scanning, women may be less likely to use aggressive words like “executed.”286 If the algorithm favored responses of the applicants who used those words or those who used “I” statements, the applicant could demonstrate that the process disadvantaged female applicants.287 As shown in the distinction between “report-talk” and “rapport-talk,”288 women tend to be more generous about giving credit to others,289 but that does not mean they are worse employees. And, to the extent that one of the justifications for hiring men was that they participated in team sports and would be better team players, women who speak in “we” statements may actually be better suited to contribute to team projects by allocating both responsibility and credit to others.

Joy Buolamwini, a researcher with the MIT Media Lab, has analyzed the risk of training AI with the inputs from an employer’s existing workforce—a risk magnified when using AI that performs voice and facial recognition.290 As she pointedly asks, “how do we know a qualified candidate whose verbal and nonverbal cues tied to age, gender, sexual orientation or race depart from those of the high performers used to train the algorithm will not be scored lower than a similar candidate who more closely resembles the in-group?”291 Thus, if the verbal cues and facial expressions of a largely homogenous workforce are used to train the patterns identified by an AI platform measuring enthusiasm for the job, the risk remains that people who do not use the same expressions or verbal cues will be discounted by an algorithm that is trained to search for similarities.

In a variety of cases challenging the hiring tests administered to fire department applicants292 and police department applicants,293 women were able to successfully bring disparate impact claims when the selection criteria (such as written tests and strength requirements)294 disproportionately led to the exclusion of female candidates and could not be shown to be job-related.295 A woman may be able to succeed with a disparate impact challenge to one-way video interviews because, like the questions used for police officers in Harless v. Duck,296 they lack a reasonable “degree of correctness”: they were developed using biased training data (i.e., the substantive responses and speaking patterns of men), and no relationship to success on the job has been shown.297 Even where enthusiasm and linguistic analyses are presented as facially neutral selection methods, much like the facially neutral and “blissful[ly] ignoran[t]”298 design of the boilermaker apprenticeship application in Bailey, “good intent or absence of discriminatory intent”299 will not suffice as a defense in the face of a disparate impact.

In the case of one-way video interviews, the hype that companies have used to market the technology to employers may come back to haunt them when employers are challenged to show empirically that the technology identifies traits that are actually job-related. If, as one company claimed, the AI can assess 15,000 data points that have to do with appearance, speech, eye contact, facial expressions, and more,300 it would take a study of tens of thousands or even hundreds of thousands of people to statistically correlate that number of traits with job performance. No studies of that magnitude have been performed. Employers cannot prove that the one-way video interviews have been validated empirically.
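A back-of-the-envelope simulation illustrates the validation problem. In the hypothetical Python sketch below (the sample size and feature count are illustrative, not drawn from any vendor’s actual data), thousands of purely random “data points” are measured for a small group of employees, and hundreds of them correlate with job performance by chance alone.

```python
# Why many "data points" plus few employees invites spurious correlations.
import numpy as np

rng = np.random.default_rng(0)
n_employees, n_features = 50, 15_000            # few outcomes, many data points
X = rng.normal(size=(n_employees, n_features))  # random, meaningless features
performance = rng.normal(size=n_employees)      # random "job performance"

# Correlate each feature with performance; none is truly related.
X_centered = X - X.mean(axis=0)
y_centered = performance - performance.mean()
corrs = (X_centered.T @ y_centered) / (
    np.linalg.norm(X_centered, axis=0) * np.linalg.norm(y_centered)
)

# Hundreds of features nonetheless look "predictive" (|r| > 0.3) by chance.
print((np.abs(corrs) > 0.3).sum())
```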

C. The Use of Video Games for Pre-Employment Testing

1. The Technological Underpinnings of Video Games in Pre-Employment Testing, Their Current Uses, and Their Gendered Impacts

Developers are marketing301 and companies are employing302 video games in lieu of traditional hiring tests to determine a job applicant’s traits and abilities. The developers claim that employers can “replac[e] archaic resumes with behavioral data”303 and that, by “captur[ing] thousands of behavioral data points,”304 their game assessments “build[] a profile of what makes a person and job unique.”305 Companies also claim they save about $3,000 per applicant if they can reject someone before the interview stage.306

General success in video gaming might be viewed by the employer as useful for certain jobs. It might measure the small motor skills needed by a surgeon307 or a drone pilot.308 But pre-employment video game screening has been used for positions that are not linked to gaming skills, including investment bankers,309 entry-level engineers,310 and project managers,311 and by companies such as JP Morgan,312 PwC,313 Daimler Trucks North America,314 Royal Bank of Canada,315 and Kraft Heinz.316 The video games are created by companies such as Knack317 and pymetrics318 to assess applicants’ traits. These video game assessments purportedly collect “thousands of behavioral data points”319 to analyze thousands of traits at one time, including attention,320 assertiveness,321 decision making,322 effort,323 emotion,324 fairness,325 focus,326 generosity,327 learning,328 and risk tolerance.329

Video game assessment companies ask current employees of an organization to play the game with the goal of ranking applicants in terms of the skills currently valued by that employer.330 The goal is to use machine learning on the video games’ data “to evaluate the cognitive and behavioral characteristics that differentiate a role’s high-performing incumbents to make predictions about job seekers applying to that role.”331

When an applicant plays a game, data is collected every millisecond to provide a list of qualities exhibited by the player.332 This data includes how long a player hesitates to make a decision,333 where on the screen a player touches,334 and the moves the player makes.335 The games vary336—one involves shooting water balloons at fast-approaching fire emojis,337 while another asks the applicant to select which side of the screen shows a larger or smaller proportion of colored dots.338
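For readers unfamiliar with this kind of telemetry, the following is a hypothetical Python sketch of the sort of millisecond-level event record described above; the vendors’ actual data schemas are proprietary, so every field here is an assumption for illustration only.

```python
# A hypothetical sketch of a game-play telemetry record; not any vendor's schema.
from dataclasses import dataclass

@dataclass
class GameplayEvent:
    """One hypothetical telemetry record captured during game play."""
    timestamp_ms: int     # when the event occurred, in milliseconds since game start
    hesitation_ms: int    # how long the player paused before acting
    screen_x: float       # where on the screen the player touched or clicked
    screen_y: float
    action: str           # the move the player made, e.g. "serve" or "ignore"
```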

The company Knack offers three primary games—Meta Maze, Dashi Dash (also known as Wasabi Waiter), and Bomba Blitz. Meta Maze has the player arrange shapes from Point A to Point B. Dashi Dash has the player serve food to avatars representing people based on the avatar’s facial expressions. Bomba Blitz has a player save flowers by throwing water balloons at fireballs coming from a volcano. Knack’s founder claims that these games can assess “how you deal with stress, how you collaborate with people, [and] how much you listen.”339 The company also offers to analyze its data for specific sets of traits. For example, a Knack assessment for “High Potential Leadership Talent”340 claims to assess the following skills based on game play: self-discipline,341 solution thinking,342 relationship building,343 composure,344 reading people,345 critical thinking,346 striving,347 and agile leadership.348 After an applicant completes the series of games, the data collected is analyzed by the developer’s proprietary algorithms,349 and a profile of the applicant is created. This profile is then used by the company to determine whom to hire.

Game play technology—even if the results are shown to employers without the name or gender of the player listed—does not guarantee a gender-blind process. Men and women play games differently and value different aspects of game play.350 Any gender differences in game play may reduce a woman’s chance of having her traits match those of current model employees, leading to her being rejected without an interview.

Like the Amazon algorithms, the use of video games not only can discriminate against women, but also may fail to identify the best employees. Correlation with the traits of previously successful employees is not causation,351 creating a disconnect between what the video game measures and what is important for the job. Nor is it apparent how an applicant’s game play affects the way the system’s algorithm scores the applicant. For instance, when we asked law students and their friends to play the Knack games, people who had no relevant skills or interest in certain fields were nonetheless told they would make a good investment banker or doctor.

The games are often simplistic and seemingly unrelated to the actual job task, such as the use of Wasabi Waiter (now called Dashi Dash), a video game in which the player is a waiter, to analyze how good a surgeon someone will be.352 In that game,353 the player’s ability to assess risk is presumably analyzed based on whether the player focuses on serving the restaurant-customer emojis at risk of becoming dissatisfied or cuts his or her losses by ignoring the emoji with the lowest level of satisfaction.354 But there is no empirical basis for believing that those actions measure emotional intelligence or other personality traits, or that they predict job performance in an array of jobs from surgeon to investment banker to McDonald’s worker.

Employers use video games to assess applicants without proof that these technologies provide an adequate assessment of an individual’s capabilities and value. No truly independent research exists to judge the validity of these games; the researchers who studied the efficacy of the approach had conflicts of interest because they either owned stock in Knack,355 received fees from Knack to do the research,356 or, in the case of pymetrics, were asked by the company to perform an analysis and paid $104,465 to do so.357 Even these studies are deficient because they did not follow up to determine how people chosen by the algorithm actually performed on the job.

2. The Potential Role of Existing Law in Response to Gender Discrimination in Video Games in Pre-Employment Testing

a. Disparate Treatment

Under Title VII, employers are permitted to use pre-employment tests to screen candidates and to assist in making hiring decisions.358 In the past, employers have used such tests to measure a candidate’s cognitive abilities,359 physical abilities,360 personality,361 or other desired characteristics.362 However, as the Court explained in Griggs v. Duke Power Co., pre-employment tests, while “obviously . . . useful,”363 must be evaluated in light of the employment testing procedures developed by the EEOC.364 The Uniform Guidelines describe the standards such tests should meet. First, there needs to be an assessment of what characteristics are related to success on the job and how to test for those characteristics.365 Then, there must be a determination that there is “empirical data demonstrating that the selection procedure is predictive of or significantly correlated with important elements of job performance.”366

Even if a video game does not ask for information about the sex of the player, a certain style of play may be more associated with being a woman and thus allow the AI (and the employer) to distinguish between women and men. Women typically score higher than men on such tests in the following areas: “agreeableness, openness, extraversion, and warmth.”367 “[I]f an employer were to manipulate the requirements of the job or otherwise unfairly categorize female applicants based on their [personality test] scores,” then it would be engaging in a disparate treatment violation.368

Under the EEOC v. Joe’s Stone Crab, Inc. precedent, a disparate treatment claim might also be brought where an employer, knowing that the video game discriminated against women, continued to use it. Ultimately, employers are “unlikely to escape disparate treatment liability if they deploy algorithms that make facially discriminatory classifications.”369

b. Disparate Impact

If job applicants are required to play a video game, a disparate impact claim could be brought if significantly fewer women are selected to be interviewed or hired after playing the game, either according to the four-fifths rule enumerated in the Uniform Guidelines370 or a standard deviation analysis.371 A disparate impact claim against the use of video games in pre-employment testing does not require proof of intentional discrimination. Statistical bias can be present in an algorithm due to the way that certain variables can be omitted or downgraded.372 Or, the algorithms may even be “built using biased, error-ridden, or unrepresentative data” which could also lead to statistical bias.373 As Professor Pauline T. Kim notes, “data miners implicitly assume that the dataset used to train the model is complete enough and accurate enough to identify meaningful patterns among applicants or employees.”374 But by using data from a male-skewed workforce, the algorithm will likely privilege male traits.375
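To make the two statistical screens concrete, the following Python sketch computes the four-fifths impact ratio and a pooled two-proportion standard deviation analysis. The applicant-flow numbers are hypothetical and chosen only for illustration.

```python
# A minimal sketch of the four-fifths rule and a standard deviation (z) analysis
# applied to hypothetical applicant-flow data; not any agency's or court's code.
from math import sqrt

men_applied, men_selected = 200, 60        # hypothetical pass-through numbers
women_applied, women_selected = 200, 36

rate_m = men_selected / men_applied        # 0.30
rate_w = women_selected / women_applied    # 0.18

# Four-fifths rule: women's selection rate should be at least 80% of the
# highest group's rate.
impact_ratio = rate_w / rate_m
print(f"impact ratio: {impact_ratio:.2f}  (adverse impact indicated if < 0.80)")

# Standard deviation analysis: how many standard deviations separate the two
# selection rates under a pooled two-proportion z-test.
pooled = (men_selected + women_selected) / (men_applied + women_applied)
se = sqrt(pooled * (1 - pooled) * (1 / men_applied + 1 / women_applied))
z = (rate_m - rate_w) / se
print(f"difference of {z:.1f} standard deviations (courts often flag 2 to 3)")
```

On these hypothetical numbers, the impact ratio of 0.60 falls below the four-fifths threshold and the gap exceeds two standard deviations, the kind of showing that would support a prima facie disparate impact claim.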

A disparate impact analysis of the video gaming algorithms in hiring will likely rely on precedents about testing for mental and physical abilities.376 Video game AI analyzes data about the applicants’ video game-playing style such as the order in which tasks are undertaken, where a person clicks on the page, and how the person reads the emotions of an avatar.377 If these analyses lead to significantly more men than women being hired, it is unlikely that an employer could prove these “were necessary for effective, efficient, or safe job performance.”378 While success at video games might be related to the skills needed to be a drone pilot, it would be hard to prove it is related to other jobs, such as being a store manager. An audit performed by one of the enterprises that markets video games for hiring conceded that there is no independent research to suggest that the company’s tests actually measure the skills correlated with job performance.379 Even if an employer could prove a relationship between a video game involving water balloons and a particular job, such as being a manager, the plaintiffs could still prevail by identifying an alternative screening practice that does not result in a disparate impact and is as effective in meeting the employer’s business needs.

D. Revising the Algorithm

If a developer realizes that its AI hiring algorithm is disfavoring women because it was trained on a mainly male workforce or because women behave differently in the eyes of the algorithm, the developer or an employer using the algorithm might attempt to “correct” the bias after the fact. For example, if women use “we” and men use “I” on a resume or in a one-way video interview, additional points could be added, after the fact, to individuals who use “we.” But tweaking the results after the fact to favor women can itself run afoul of Title VII.

Various entities have attempted to undo the gender bias in their hiring and recruitment algorithms. LinkedIn’s algorithm recommended different jobs based on a person’s gender (even when gender was not specified on a resume) because it analyzed the behavior of each applicant.380 Women, LinkedIn found, were less likely than men to apply for jobs that required work experience beyond their qualifications.381 Because of this gender difference, the job recommendations tended to disadvantage women.382 LinkedIn added a correction,383 explaining that “before referring the matches curated by the original [i.e. the one that can discern gender through behavior] engine, the recommendation system includes a representative distribution of users across gender.”384 Using an alternative approach, ZipRecruiter attempted to correct for gender bias in the algorithm on its platform by eliminating or changing words on a resume, such as waitress, that are associated with women.385
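The following is a deliberately simplified Python sketch of the kind of representative re-ranking LinkedIn describes. It is our illustration, not LinkedIn’s code, and the candidate records (dictionaries with “gender” and “score” fields) are invented for the example.

```python
# A simplified sketch of post-hoc "representative distribution" re-ranking;
# the data structure and target share are hypothetical.
def rerank_representative(ranked, target_share_women=0.5, page_size=10):
    """Fill one page so its gender mix approximates the target share, then
    order the selected candidates by the engine's original relevance score."""
    women = [c for c in ranked if c["gender"] == "F"]
    men = [c for c in ranked if c["gender"] == "M"]
    n_women = min(len(women), round(target_share_women * page_size))
    n_men = min(len(men), page_size - n_women)
    page = women[:n_women] + men[:n_men]
    return sorted(page, key=lambda c: c["score"], reverse=True)

# Example usage with invented candidates ranked by the original engine's score.
candidates = [
    {"name": "A", "gender": "M", "score": 0.94},
    {"name": "B", "gender": "M", "score": 0.91},
    {"name": "C", "gender": "F", "score": 0.88},
    {"name": "D", "gender": "M", "score": 0.85},
    {"name": "E", "gender": "F", "score": 0.80},
]
print(rerank_representative(candidates, target_share_women=0.4, page_size=5))
```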

These after-the-fact attempts to balance gender are analogous to the situation in Ricci v. DeStefano, where the city of New Haven, Connecticut decided not to certify the results of an examination administered for promotions within the City’s fire department because the test disadvantaged minority candidates.386 The examination results showed that white candidates outperformed minority candidates,387 and, concerned about the possibility of a disparate impact lawsuit, the City threw out the results of the examination.388

Subsequently, white and Hispanic firefighters—who likely would have been promoted based on their test performance—sued the City.389 The plaintiffs alleged that the City’s refusal to certify the test results constituted disparate treatment discrimination in violation of Title VII.390 The Supreme Court held that despite the City’s “well intentioned” and “benevolent” objective, “the City made its employment decision because of race” which amounted to disparate treatment.391 Similarly, a well-intentioned effort to correct for an inherent gender bias in a hiring algorithm might also be vulnerable to a challenge from men alleging disparate treatment under Title VII.

IV. Policy Approaches to Combatting AI Gender Discrimination in Employment

When information collected in the hiring process poses discrimination or privacy risks, or when any type of technology creates a potential risk to individuals or groups, there are three possible legislative approaches to regulating the practice or technology: requiring the employer to disclose its use, prohibiting discrimination based on the information gleaned through it, or banning it altogether. All three approaches are present in current employment law and in lawmakers’ attempts to regulate the use of technology in the employment sphere.

A. A Disclosure Policy Approach

A person who submits a resume or who undergoes a one-way video interview may have no idea that these items will be screened by AI rather than by a human. As a result, if a woman is not offered a job after applying, she may think the chosen candidate had better credentials and not think to inquire about whether she was a victim of biased AI. Under a policy of disclosure, an employer is permitted to use a technology or collect certain information but must disclose to candidates what technology the employer is using.

In the first AI interviewing legislation in the nation,392 Illinois in 2019 enacted the Artificial Intelligence Video Interview Act.393 The Act requires an employer to obtain the applicant’s consent before conducting AI analysis of a video interview.394 Additionally, any employer using AI in that situation must “[p]rovide each applicant with information before the interview explaining how the artificial intelligence works and what general types of characteristics it uses to evaluate applicants,”395 as well as maintain the confidentiality of any information shared by the applicant, and agree to destroy all copies of the interview within thirty days of the applicant requesting such action.396

A disclosure approach to the other hiring technologies described in this Article would similarly require advance disclosure of, and consent to, a hiring process that uses AI assistance and machine learning. If employers disclose how the process works, job applicants will become aware that the technology was developed through machine learning trained mostly on male employees. That awareness could create pressure on employers not to use these biased tools.

B. An Anti-Discrimination Policy Approach

Disclosure to job applicants about a practice or technology may be of limited use unless the legislation also prohibits using the information collected in a discriminatory way. Disclosure alone means little if the only option for the applicant on learning that AI is being used is to seek a different job. At the very least, the disclosure approach should be coupled with a ban on the use of the information collected by the AI in a discriminatory way.

Prohibitions on discrimination are at the heart of Title VII, which prohibits employers from “fail[ing] or refus[ing] to hire . . . any individual . . . because of such individual’s race, color, religion, sex, or national origin.”397 EEOC guidelines and opinions drill down into what behaviors are prohibited. For example, the EEOC has issued agency guidance explaining that an applicant’s salary history, by itself, cannot “justify a compensation disparity”398 between men and women—an important provision to attempt to stop the practice of underpaying women relative to men. “Women job applicants, especially women of color, are likely to have lower prior salaries than their male counterparts.”399 “In 2020, women earned 84% of what men earned, according to a Pew Research Center analysis of median hourly earnings of both full- and part-time workers.”400 And because of the pervasiveness of the gender pay gap, “employers who rely on salary history to select job applicants and to set new hires’ pay will tend to perpetuate gender- and race-based disparities in their workforce.”401

To mitigate the perpetuation of this gender disparity, the EEOC’s guidance makes clear that salary history, standing alone, cannot “justify a compensation disparity”402 between men and women. Rather, “permitting prior salary alone as a justification for a compensation disparity ‘would swallow up the rule and inequality in [compensation] among genders would be perpetuated.’”403

An anti-discrimination approach to AI-assisted hiring technologies would allow their use only if the employer could prove in advance that the technologies would not create built-in headwinds for women by institutionalizing male norms (for example, of speech, education, looks, or experiences).

C. Banning a Practice or Technology

In some cases, however, nothing short of a ban may work to achieve gender parity. This is especially true in the case of algorithms created through machine learning, where an employer may not even realize the machine has modified the algorithm to include discriminatory variables. Bans are not uncommon in employment law. Bans on certain hiring practices or hiring-related technologies are used to avoid discrimination, to protect privacy, and to avoid the use of technologies that do not function properly.

Employers are banned, for example, from using lie detector tests in hiring.404 The reasons for the ban are similar to the reasons we might consider banning certain uses of AI in hiring. Lie detector tests are prohibited because they do not adequately predict a potential employee’s future behavior on the job.405 In fact, the Senate Committee on Labor and Human Resources found that “many employers and polygraph examiners abuse and manipulate the [polygraph] examination process, and frequently use inaccurate or unfounded results to justify employment decisions which otherwise would be suspect.”406 Approximately 400,000 “honest workers” had been inaccurately labeled as deceptive by polygraphs and thus faced adverse employment consequences.407

Employment laws also commonly ban the collection of certain information or the use of a particular technology to collect certain information. The logic behind such laws is that a ban on discriminatory uses of the information is not sufficient because it is difficult for a person denied a job to prove whether she was rejected (or offered a lower salary) because of that information or for some other reason. An employer might indeed be discriminating, but the job applicant may have no way of knowing or proving it if the employer is allowed to collect the information in the first place. In contrast to the federal guidance telling employers not to discriminate based on a woman’s past salary, many state and local laws prohibit the employer from collecting that information at all. The City of Philadelphia, after learning about the gender wage gap in the city, issued an ordinance408 that makes it unlawful for an employer “[t]o inquire about a prospective employee’s wage history, require disclosure of wage history, or condition employment or consideration for an interview or employment on disclosure of wage history, or retaliate against a prospective employee for failing to comply with any wage history inquiry.”409 In 2020, the Third Circuit determined that the ordinance did not violate employers’ First Amendment right to free speech.410 Twenty-three other states and municipalities have similarly enacted bans on employers asking applicants for salary history information.411

There are other prominent bans on employers obtaining certain information because it might facilitate discrimination or invade privacy. Some states ban employers from asking for job applicants’ social media passwords to get at private information about the employee.412 And courts have prohibited the use of certain screening tests once in common use (such as the Minnesota Multiphasic Personality Inventory (MMPI)) because they can generate information about a person’s health condition in violation of the federal Americans with Disabilities Act.413 Sometimes, the bans focus on technologies that elicit certain information that can lead to employment discrimination. The federal Genetic Information Nondiscrimination Act of 2008, for example, prohibits employers from requiring job applicants to undergo predictive genetic tests that indicate that they have a predisposition to later develop a genetic disease.414

D. Developing a Policy Response to AI-Assisted Hiring Technologies

AI-assisted hiring raises many of the problems that have led to bans in the past. Like the use of polygraphs, there is no proof that AI-assisted hiring correctly measures the traits that make a good employee. Like the MMPI, one-way video interviewing and video games can identify medical415 and psychiatric416 conditions.

Because AI hiring technologies discriminate and may not even identify qualified applicants, there is a sufficient rationale for a ban on their use. In a complaint filed with the Federal Trade Commission (FTC), the Electronic Privacy Information Center (EPIC) provided the policy rationale for a ban. EPIC argued that HireVue, a one-way video interview platform, “lack[ed] a ‘reasonable basis’ to support the claims”417 that HireVue’s “video-based algorithmic assessments ‘provide[] excellent insight into attributes like social intelligence (interpersonal skills), communication skills, personality traits, and overall job aptitude.’”418 Specifically, EPIC argued that the use of such technology was “unfair” and “deceptive” within the meaning of the FTC Act,419 and, moreover, that the use of AI can result in gender,420 racial,421 and neurological bias.422 As an unfair trade practice, EPIC noted, the tool “causes or is likely to cause substantial injury to consumers which is not reasonably avoidable by consumers themselves and not outweighed by countervailing benefits to consumers or to competition.”423 Before the FTC could act on EPIC’s complaint, however, HireVue issued a statement that it would discontinue the use of facial analysis in its screening technology.424

Given the limits of AI-assisted hiring technologies, a ban is an appropriate approach and would avoid the need to challenge the practices one by one in front of the FTC. Even short of a total ban, it would be useful to limit the situations in which AI-assisted hiring practices are permissible. If a ban cannot be achieved, we should adopt guidelines to help ensure appropriate gender representation—such as prohibiting an employer from creating or refining the algorithm on its current employees if the representation of women among the company’s leaders does not meet a four-fifths standard. This would serve to prohibit the use of AI-assisted hiring in many well-known tech companies425 and Fortune 500 corporations426 that are led primarily by men.

We could also require that, for any AI-assisted hiring, the algorithm be shown to be valid for the type of job at issue before it is applied. Along those lines, Congress or state legislatures could codify, with stiff penalties, the Uniform Guidelines approach: before using a selection tool for hiring, an employer should perform a job analysis to determine which measures of work behaviors or performance are relevant to the job or group of jobs in question.427 Then the employer must assess whether there is “empirical data demonstrating that the selection procedure is predictive of or significantly correlated with important elements of job performance.”428
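A criterion-related validation of that sort might look, in skeletal form, like the following Python sketch. The data here are synthetic placeholders standing in for an employer’s real hiring scores and later performance records.

```python
# A minimal sketch of criterion-related validity evidence of the kind the
# Uniform Guidelines contemplate: correlate the selection tool's scores with a
# later measure of job performance for people actually hired.  All data are
# synthetic placeholders, not real employment records.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
tool_scores = rng.normal(50, 10, size=120)                      # hypothetical algorithm scores at hire
performance = 0.3 * tool_scores + rng.normal(0, 10, size=120)   # hypothetical ratings a year later

r, p_value = pearsonr(tool_scores, performance)
print(f"validity coefficient r = {r:.2f}, p = {p_value:.4f}")
# Under the Guidelines, the employer would need the correlation to be both
# statistically significant and practically meaningful for the job in question.
```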

Conclusion

The quest for fairness in hiring practices is not just about preventing discrimination. Gender diversity is also a driver of innovation and a stronger economy. A host of studies show that diverse teams make better decisions. Men working with other men tend to agree with each other. Adding women to such groups makes men prepare better and anticipate alternative arguments.429 As a result, mixed groups create more innovative solutions.430 Gender diversity can also help the bottom line. When business school professors assessed the companies that make up the Standard & Poor’s 1500, they found that female representation in top management correlated with a $42 million increase in firm value.431

The use of resume scanning, one-way video interviews, and video games to screen applicants stifles diversity and creates for female applicants the sort of “headwinds”432 that the U.S. Supreme Court has viewed as impermissible under Title VII of the Civil Rights Act of 1964. The developer of a social bookmarking site called Pinboard, Maciej Cegłowski, referred to the phenomenon more bluntly, “call[ing] machine learning ‘money laundering for bias.’ . . . ‘[A] clean, mathematical apparatus that gives the status quo the aura of logical inevitability.’”433

As with Title VII itself, our policy recommendations are not designed to give women an unfair advantage. They are instead an attempt to level the playing field so that women are not discriminated against by AI in ways that perpetuate existing bias. In that sense, we are asking no more than Ruth Bader Ginsburg asked of the Supreme Court at oral argument in Frontiero v. Richardson434 when she quoted the words of 19th century abolitionist and feminist Sarah Grimké: “I ask no favor for my sex. All I ask of our brethren is that they take their feet off our necks.”435 And their biased AI out of our job prospects.

 


* Lori Andrews, J.D., Professor of Law, Chicago-Kent College of Law; Director, Institute for Science, Law, and Technology, Illinois Institute of Technology and Hannah Bucher, J.D., Chicago-Kent College of Law. The authors wish to thank Adrienne Finucane, Bora Ndregjoni, Kelby Roth, and Andrew White for their research, editorial insights, creativity, and inspiration in connection with this Article. They are also grateful for the insights of Anita Bernstein, Richard Gonzalez, Ruth Kaufman, Ellen Mitchell, Clements Ripley, and Jim Stark.