### The Bias of AI in Résumé Screening: A Comprehensive Analysis of Language Models Favoring White and Male Candidates
In recent years, artificial intelligence (AI) has become a central component of many sectors, particularly human resources (HR). AI-powered tools, especially large language models (LLMs), are increasingly used to streamline processes such as résumé evaluation and candidate assessment. Yet a growing body of research indicates that these AI systems may not be as unbiased as we would like. A recent study found that language models frequently favor white and male candidates over others, even when qualifications are identical.
#### The Study: Unearthing AI Bias in Résumé Screening
A new paper, presented at the AAAI/ACM Conference on AI, Ethics, and Society, underscores how prevalent bias is in AI-based résumé screening. Researchers from the University of Washington ran a study using three different Massive Text Embedding (MTE) models, all derived from the Mistral-7B large language model (LLM). These models were fine-tuned on different datasets to improve their performance on tasks such as document retrieval, classification, and clustering.
The researchers processed hundreds of publicly available résumés and job postings through these MTE models. Rather than relying on simple keyword matching, the models produced “embedded relevance scores” for each résumé–job description pair, allowing the researchers to evaluate how closely each résumé aligned with a given job description based on the model’s understanding of the text.
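To make the idea of an embedding-based relevance score concrete, here is a minimal sketch of how such scoring typically works, using the open-source sentence-transformers library. The model name, the example texts, and the `relevance_score` helper are illustrative assumptions, not the study’s actual pipeline.

```python
# Illustrative sketch of embedding-based relevance scoring (not the study's code).
# Assumes the open-source sentence-transformers package and a generic embedding model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for an MTE-style embedding model

def relevance_score(resume_text: str, job_text: str) -> float:
    """Embed both texts and return their cosine similarity as a relevance score."""
    resume_vec = model.encode(resume_text, convert_to_tensor=True)
    job_vec = model.encode(job_text, convert_to_tensor=True)
    return util.cos_sim(resume_vec, job_vec).item()

job = "Seeking a data analyst with SQL, Python, and dashboarding experience."
resume = "Five years of experience building SQL pipelines and Python dashboards."
print(f"relevance: {relevance_score(resume, job):.3f}")
```

The key point is that the score reflects semantic similarity between the two texts as the model represents them, which is exactly where name-related signals can leak in.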
To assess bias, the researchers first ran the résumés through the models without any names attached. They then repeated the experiment using names that score highly for racial and gender distinctiveness, meaning names strongly associated with specific racial or gender groups. The goal was to see whether the models favored certain names over others when evaluating otherwise identical résumés.
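A simplified version of this name-substitution audit might look like the sketch below. It builds on the hypothetical `relevance_score` helper from the previous snippet; the example names, texts, and the choice to prepend the name to the résumé body are assumptions for illustration, not the researchers’ exact protocol.

```python
# Illustrative name-substitution audit (not the study's actual protocol).
# Builds on the hypothetical relevance_score() helper from the previous sketch.
from itertools import combinations

job_text = "Seeking a data analyst with SQL, Python, and dashboarding experience."
resume_body = "Five years of experience building SQL pipelines and Python dashboards."

# Names chosen purely as examples of race/gender-distinctive names.
names = ["Emily Walsh", "Greg Baker", "Lakisha Robinson", "Jamal Washington"]

# Score the identical résumé body with each name prepended to it.
scores = {name: relevance_score(f"{name}\n{resume_body}", job_text) for name in names}

# Pairwise comparisons: which name does the model rank higher for the same résumé?
for a, b in combinations(names, 2):
    print(f"{a} vs {b}: model favors {a if scores[a] > scores[b] else b}")
```

Because the résumé body and job posting never change, any systematic gap in the scores can only come from the name itself.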
#### The Results: A Distinct Pattern of Bias
The results of the study were alarming. Across more than three million résumé and job description evaluations, the MTE models consistently showed a preference for white and male names. Specifically:
– **White names were selected in 85.1% of the evaluations**, compared to only 8.6% for Black names. The remaining evaluations displayed negligible differences.
– **Male names were preferred in 51.9% of the evaluations**, while female names were preferred in just 11.1%.
– In intersectional comparisons (considering both race and gender), **Black male names were never favored over white male names** across any of the bias evaluations.
These patterns were apparent across a variety of job descriptions, regardless of the typical racial or gender demographics of the roles in reality. This implies that the bias does not mirror societal trends but is instead a product of the models’ intrinsic preferences.
#### Why Does This Occur? The “Default” Dilemma
The researchers suspect that the bias arises from the models’ treatment of certain identities as the “default.” In this situation, the models appear to consider “masculine and White concepts” as the standard, with other identities regarded as anomalies. This presents a serious concern, as it indicates that the models are not evaluating candidates solely based on qualifications, but rather are affected by the names linked to those candidates.
The tilt toward white and male names was often small in any individual test, with the “percentage difference in screening advantage” around 5% or lower in most cases. But a bias that persistent, repeated across millions of evaluations, compounds into potentially substantial disparities in hiring outcomes.
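To see why even a small per-comparison tilt matters at scale, consider a rough back-of-the-envelope calculation. The numbers below are hypothetical, chosen only to echo the ~5% figure cited above, and are not results from the study.

```python
# Back-of-the-envelope illustration of how a small per-comparison advantage compounds.
# All numbers are hypothetical; only the ~5% figure echoes the text above.
advantage = 0.05          # favored group wins ~5 percentage points more head-to-head comparisons
comparisons = 1_000_000   # a scale comparable to the study's millions of evaluations
extra_wins = advantage * comparisons
print(f"Extra selections for the favored group: {extra_wins:,.0f}")  # 50,000
```

In other words, a gap that looks negligible in any single screening decision can translate into tens of thousands of skewed outcomes once the system is applied at scale.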
#### Real-World Consequences: AI Bias in Recruitment
The findings of this study are troubling, particularly given the growing dependence on AI in recruitment. While the biases displayed by the MTE models in this controlled study may not precisely reflect how AI tools function in practice, they spotlight a significant issue: AI systems are susceptible to bias. In fact, they frequently mirror the biases present in the data utilized for training.
This isn’t the first instance of AI bias in recruitment coming to the forefront. In 2018, Amazon was compelled to abandon an internal AI recruiting tool after it was discovered to favor male candidates. The tool had been trained on résumés sent to the company over a decade, most of which were submitted by men. Consequently, the AI learned to prefer male candidates, even penalizing résumés that contained the term “women’s” (e.g., “women’s chess club”).
#### The Urgency for Bias Mitigation in AI
As AI continues to assume a more prominent role in hiring and other decision-making processes, it is essential to tackle these biases. Some firms, such as Salesforce, have enacted thorough testing for bias and toxicity in their AI models prior to their deployment in production. These organizations also incorporate safeguards and controls to avert harmful outcomes.
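One lightweight way to operationalize such pre-deployment checks is to fold an audit like the one sketched earlier into an automated test that fails when the measured disparity exceeds a chosen tolerance. The threshold, the input selection rates, and the test structure below are assumptions for illustration, not any particular vendor’s process.

```python
# Illustrative pre-deployment fairness gate (not any specific company's process).
# Assumes an upstream audit produced per-group selection rates from name-swap tests.

MAX_DISPARITY = 0.05  # hypothetical tolerance for the gap in selection rates

def check_selection_parity(selection_rates: dict[str, float]) -> None:
    """Raise if the gap between the most- and least-selected groups exceeds the tolerance."""
    disparity = max(selection_rates.values()) - min(selection_rates.values())
    if disparity > MAX_DISPARITY:
        raise AssertionError(
            f"Selection-rate disparity {disparity:.2%} exceeds the {MAX_DISPARITY:.0%} limit"
        )

# Example with hypothetical audit results: the first call passes, the second would raise.
check_selection_parity({"group_a": 0.52, "group_b": 0.48})
# check_selection_parity({"group_a": 0.60, "group_b": 0.40})
```

Gates like this do not remove bias from a model, but they can stop the most clearly skewed systems from reaching production unexamined.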
Nevertheless, further efforts are required to ensure that AI systems are just and equitable. This encompasses not