Researchers Reduce Bias in AI Models While Maintaining or Improving Accuracy


Machine-learning models can fail when they try to make predictions for individuals who were underrepresented in the datasets they were trained on.

For instance, a model that predicts the best treatment option for someone with a chronic disease may be trained using a dataset that contains mostly male patients. That model may make incorrect predictions for female patients when deployed in a hospital.

To improve outcomes, engineers can try balancing the training dataset by removing data points until all subgroups are represented equally. While dataset balancing is promising, it often requires removing a large amount of data, hurting the model's overall performance.
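As a concrete, hedged illustration of this balancing strategy, the sketch below downsamples every subgroup to the size of the smallest one. The DataFrame, the `sex` column, and the helper name are assumptions made for the example, not details from the article.

```python
import pandas as pd

def balance_by_subgroup(df: pd.DataFrame, group_col: str, seed: int = 0) -> pd.DataFrame:
    """Downsample every subgroup to the size of the smallest one.

    This is the naive balancing strategy described above: it equalizes
    representation across subgroups, but can throw away a large
    fraction of the data when groups are highly imbalanced.
    """
    min_size = df[group_col].value_counts().min()
    return (
        df.groupby(group_col, group_keys=False)
          .apply(lambda g: g.sample(n=min_size, random_state=seed))
          .reset_index(drop=True)
    )

# Hypothetical usage: a clinical dataset where 'sex' marks the subgroup.
# If 90% of patients are male, balancing here discards most male rows.
# balanced = balance_by_subgroup(patients_df, group_col="sex")
```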

MIT researchers developed a new technique that identifies and removes the specific points in a training dataset that contribute most to a model's failures on minority subgroups. By removing far fewer data points than other approaches, this technique maintains the overall accuracy of the model while improving its performance on underrepresented groups.
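The article doesn't describe the method in code, but the targeted-removal idea can be sketched in a simplified form. This is a minimal illustration, not the researchers' actual algorithm: it assumes per-example influence scores on the minority subgroup have already been computed (producing those scores, for example via a data-attribution method, is the substance of the research), and the function name and signature are hypothetical.

```python
import numpy as np

def remove_most_harmful(X: np.ndarray, y: np.ndarray,
                        minority_influence: np.ndarray, k: int):
    """Drop the k training points whose estimated influence most
    degrades performance on the underrepresented subgroup.

    minority_influence: per-example scores where a larger value means
    the example contributes more to errors on the minority subgroup.
    How these scores are obtained is assumed here; this sketch only
    shows the targeted-removal step.
    """
    harmful = np.argsort(minority_influence)[-k:]    # k highest-scoring points
    keep = np.setdiff1d(np.arange(len(y)), harmful)  # keep everything else
    return X[keep], y[keep]
```

Unlike full dataset balancing, only k points are dropped, which is why this style of approach can preserve overall accuracy while still helping the underrepresented group.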

In addition, the technique can identify hidden sources of bias in a training dataset that lacks labels. Unlabeled data are far more common than labeled data in many applications.

This technique could also be combined with other approaches to improve the fairness of machine-learning models deployed in high-stakes situations. For example, it might someday help ensure underrepresented patients aren't misdiagnosed due to a biased AI model.

"Many other algorithms that attempt to address this issue assume each datapoint matters as much as every other datapoint. In this paper, we are revealing that assumption is not true. There are specific points in our dataset that are adding to this predisposition, and we can find those data points, remove them, and get much better performance," states Kimia Hamidieh, koha-community.cz an electrical engineering and computer system science (EECS) graduate trainee at MIT and co-lead author of a paper on this strategy.

She wrote the paper with co-lead authors Saachi Jain PhD '24 and fellow EECS graduate student Kristian Georgiev.