Facial recognition has a bias problem. A study in 2012 showed that face-recognition algorithms from vendor Cognitec performed 5 to 10 percent worse on African Americans than on Caucasians, and researchers in 2011 found that facial recognition models developed in China, Japan, and South Korea had difficulty distinguishing between Caucasian faces and those of East Asians. In February, researchers at the MIT Media Lab found that facial recognition systems from Microsoft, IBM, and Chinese company Megvii misidentified gender in up to 7 percent of lighter-skinned females, up to 12 percent of darker-skinned males, and up to 35 percent of darker-skinned females.
Fortunately, IBM is making progress in addressing the problem. The company today published a research paper and an accompanying dataset — Diversity in Faces (DiF), comprising annotations of 1 million human facial images drawn from the publicly available YFCC-100M Creative Commons dataset — in an effort to reduce prejudicial predictions in AI face-detecting algorithms.
The dataset is available upon request and comes just a week after MIT researchers claimed that Amazon’s facial analysis software — Rekognition — distinguishes gender among certain ethnicities less accurately than do competing services.
“Face recognition is a long-standing challenge in the field of artificial intelligence (AI),” the authors of the paper wrote. “The goal is to create systems that detect, recognize, verify, and understand characteristics of human faces. [F]ace recognition has achieved unprecedented accuracy … [but] while this is encouraging, a critical aspect limiting face recognition performance in practice is intrinsic facial diversity … We expect face recognition to work accurately for each of us. Performance should not vary for different individuals or different populations.”
To that end, IBM researchers annotated faces using 10 “well-established” and “independent” coding schemes from the scientific literature, spanning “objective” measures of human faces — craniofacial features such as head length, nose length, and forehead height — as well as more “subjective” annotations, like age and gender. Also labeled were facial ratios (symmetry), pose, and resolution, among other attributes.
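To make the idea concrete, here is a minimal sketch of how craniofacial measures and the ratios derived from them might be computed from 2D facial landmarks. This is not IBM's code; the landmark names and the specific ratios are illustrative assumptions, not the DiF paper's exact coding schemes.

```python
import math

def dist(p, q):
    """Euclidean distance between two 2D landmark points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def craniofacial_measures(landmarks):
    """Derive illustrative craniofacial measures and ratios from a dict of
    (x, y) landmarks. Landmark names and ratios here are hypothetical,
    chosen only to mirror the kinds of features the article describes."""
    head_length = dist(landmarks["crown"], landmarks["chin"])
    nose_length = dist(landmarks["nose_bridge"], landmarks["nose_tip"])
    forehead_height = dist(landmarks["hairline"], landmarks["brow"])
    return {
        "head_length": head_length,
        "nose_length": nose_length,
        "forehead_height": forehead_height,
        # Ratios normalize absolute size away, so faces photographed at
        # different scales or distances remain comparable.
        "nose_to_head_ratio": nose_length / head_length,
        "forehead_to_head_ratio": forehead_height / head_length,
    }

# Toy landmark set (pixel coordinates in an aligned face crop).
face = {
    "crown": (50, 0), "chin": (50, 100),
    "nose_bridge": (50, 40), "nose_tip": (50, 60),
    "hairline": (50, 5), "brow": (50, 30),
}
m = craniofacial_measures(face)
print(round(m["nose_to_head_ratio"], 2))  # 0.2
```

The design point is the ratios: absolute measurements depend on image scale, while ratios of distances are scale-invariant, which is one reason facial-ratio schemes appear in the anthropometric literature the paper draws on.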
It’s a relatively novel approach. Prior research has focused largely on how faces vary by age, gender, and skin tone. By contrast, the IBM team homed in on more than 47 feature dimensions and attributes, which they say contribute equally to strong algorithmic performance.
“[The 10] schemes [in the DiF dataset] are some of the strongest identified by the scientific literature, building a solid foundation to our collective knowledge,” said John R. Smith, IBM fellow and a lead author of the paper. “[They] offer a jumping off point for researchers around the globe studying the facial recognition technology.”
In preliminary tests, the scientists found that the DiF dataset provided a “more balanced” distribution and “broader coverage” of facial images compared to previous datasets. They believe that the insights obtained from statistical analysis of the coding schemes have the potential to “further the community’s understanding” of what’s important when it comes to characterizing human faces and will enable IBM to find new methods for improving its facial recognition technology.
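One simple way to quantify whether a dataset's coverage of an attribute is "balanced" is to compare the Shannon entropy of that attribute's distribution against its maximum possible value. This is a hypothetical illustration of the general idea, not the paper's exact methodology:

```python
import math
from collections import Counter

def normalized_entropy(labels):
    """Shannon entropy of a label distribution, normalized to [0, 1].
    1.0 means perfectly uniform coverage of the observed categories;
    values near 0 mean the dataset is dominated by one category."""
    counts = Counter(labels)
    n = len(labels)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    max_h = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return h / max_h

# A skewed distribution of a hypothetical skin-tone attribute...
skewed = ["type_I"] * 80 + ["type_IV"] * 15 + ["type_VI"] * 5
# ...versus a nearly uniform one.
balanced = ["type_I"] * 34 + ["type_IV"] * 33 + ["type_VI"] * 33

print(round(normalized_entropy(skewed), 2))    # well below 1.0
print(round(normalized_entropy(balanced), 2))  # close to 1.0
```

A dataset whose attributes all score near 1.0 under a measure like this would match the article's informal description of a "more balanced" distribution, though the DiF paper's own diversity statistics may differ.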
“We believe by extracting and releasing these facial coding scheme annotations on a large dataset of 1 million images of faces, we will accelerate the study of diversity and coverage of data for AI facial recognition systems to ensure more fair and accurate AI systems,” Smith said. “Today’s release is simply the first step.”
Progress toward less biased AI
IBM is not the only one attempting to tackle bias in facial recognition systems.
In June, working with experts in artificial intelligence (AI) fairness, Microsoft revised and expanded the datasets it uses to train Face API, a Microsoft Azure API that provides algorithms for detecting, recognizing, and analyzing human faces in images. With new data spanning skin tones, genders, and ages, it was able to reduce error rates by as much as 20 times for men and women with darker skin, and by 9 times for all women.
Meanwhile, Gfycat, a user-generated short video-hosting startup based in San Francisco, said this year that it managed to improve its facial recognition algorithms’ accuracy for people of Asian descent by applying stricter detection thresholds.
And Amazon says it’s continually working to improve the accuracy of Rekognition — most recently through a “significant update” in November 2018 — and is making funding available for research projects and staff through the AWS Machine Learning Research Awards. The company has also expressed “interest” in establishing standardized tests for facial analysis and facial recognition, and in working with regulators on guidance for the technology’s use.
An emerging class of algorithmic bias mitigation tools, meanwhile, promises to accelerate progress toward more impartial AI.
In May, Facebook announced Fairness Flow, which automatically sends a warning if an algorithm is making an unfair judgment about a person based on their race, gender, or age. Accenture released a toolkit that automatically detects bias in AI algorithms and helps data scientists mitigate that bias. Microsoft launched a solution of its own that same month, and in September Google debuted the What-If Tool, a bias-detecting feature of the TensorBoard web dashboard for its TensorFlow machine learning framework.
Not to be outdone, IBM last fall released AI Fairness 360, a cloud-based, fully automated suite that “continually provides [insights]” into how AI systems are making their decisions and recommends adjustments — such as algorithmic tweaks or counterbalancing data — that might lessen the impact of prejudice. And recent research from IBM’s Watson and Cloud Platforms group has focused on mitigating bias in AI models, specifically as it relates to facial recognition.