Researchers Reduce Bias in AI Models While Maintaining or Improving Accuracy


Machine-learning models can fail when they attempt to make predictions for individuals who were underrepresented in the datasets they were trained on.

For instance, a model that predicts the best treatment option for someone with a chronic disease may be trained using a dataset that contains mostly male patients. That model might make incorrect predictions for female patients when deployed in a hospital.

To improve outcomes, engineers can try balancing the training dataset by removing data points until all subgroups are represented equally. While dataset balancing is promising, it often requires removing a large amount of data, hurting the model's overall performance.
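To make the trade-off concrete, here is a minimal sketch of dataset balancing by subsampling, assuming NumPy arrays and a hypothetical `balance_dataset` helper (neither appears in the article). It shows why balancing discards so much data: every subgroup is cut down to the size of the smallest one.

```python
import numpy as np

def balance_dataset(X, y, groups, seed=0):
    """Subsample every subgroup down to the size of the smallest one.

    X: feature matrix, y: labels, groups: subgroup id per example.
    Hypothetical illustration of naive balancing, not the MIT method.
    """
    rng = np.random.default_rng(seed)
    unique_groups, counts = np.unique(groups, return_counts=True)
    target = counts.min()  # size of the smallest subgroup
    keep = []
    for g in unique_groups:
        idx = np.flatnonzero(groups == g)
        keep.extend(rng.choice(idx, size=target, replace=False))
    keep = np.sort(np.array(keep))
    return X[keep], y[keep], groups[keep]
```

If one subgroup contributes only 5 percent of the examples, this approach throws away most of every other subgroup, which is exactly the performance cost the researchers' technique aims to avoid.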

MIT researchers developed a new technique that identifies and removes the specific points in a training dataset that contribute most to a model's failures on minority subgroups. By removing far fewer data points than other methods, this technique maintains the overall accuracy of the model while improving its performance on underrepresented groups.
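The article does not spell out the algorithm, so the sketch below only illustrates the general idea using a simple first-order gradient-alignment score on a logistic-regression model. The function names (`loss_gradients`, `prune_and_retrain`), the scoring rule, and the use of scikit-learn are all assumptions for illustration; the researchers' actual data-attribution method is more sophisticated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def loss_gradients(model, X, y):
    # Per-example gradient of the logistic loss w.r.t. the weights:
    # (p - y) * x, where p is the predicted probability of class 1.
    p = model.predict_proba(X)[:, 1]
    return (p - y)[:, None] * X

def prune_and_retrain(X_tr, y_tr, X_val, y_val, val_groups,
                      minority_group, n_remove):
    # Illustrative sketch, not the paper's algorithm.
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

    # Aggregate loss gradient over the underperforming subgroup's
    # validation examples.
    mask = val_groups == minority_group
    g_sub = loss_gradients(model, X_val[mask], y_val[mask]).sum(axis=0)

    # A training step on point i moves the weights along -g_i, changing
    # the subgroup's loss by roughly -(g_sub @ g_i). Points with the
    # most negative alignment scores therefore push that loss up the
    # most; drop them and retrain on the rest.
    scores = loss_gradients(model, X_tr, y_tr) @ g_sub
    keep = np.argsort(scores)[n_remove:]  # remove n_remove most harmful

    return LogisticRegression(max_iter=1000).fit(X_tr[keep], y_tr[keep])
```

The key design point this captures is that only a small, targeted set of training examples is removed, rather than rebalancing whole subgroups, which is why overall accuracy is largely preserved.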

In addition, the technique can identify hidden sources of bias in a training dataset that lacks labels. Unlabeled data are far more prevalent than labeled data for many applications.

This approach could also be combined with other techniques to improve the fairness of machine-learning models deployed in high-stakes situations. For example, it might someday help ensure underrepresented patients aren't misdiagnosed due to a biased AI model.

"Many other algorithms that try to address this issue assume each datapoint matters as much as every other datapoint. In this paper, we are revealing that assumption is not real. There are particular points in our dataset that are contributing to this bias, and we can find those data points, remove them, and get much better efficiency," says Kimia Hamidieh, an electrical engineering and computer science (EECS) graduate trainee at MIT and co-lead author ai-db.science of a paper on this method.

She wrote the paper with co-lead authors Saachi Jain PhD '24 and fellow EECS graduate student Kristian Georgiev