From the course: Data Science Foundations: Fundamentals

Bias

- [Instructor] Sometimes, despite your best efforts and intentions, your data science projects just go haywire and you get garbage out of them. Sometimes the problem is obvious, like a glitchy screen, but sometimes it's less obvious, such as when an algorithm is biased in ways that may not be immediately visible.

I can give you a few classic examples. Several years ago, Microsoft created the Tay Twitter bot, which took only 12 hours to become a misogynistic, anti-Semitic conspiracy theorist and had to be taken down. Or there's the COMPAS sentencing software (the name stands for Correctional Offender Management Profiling for Alternative Sanctions), which gave inaccurate, racially biased recidivism predictions for black and white defendants. A report by ProPublica found that black defendants who in reality did not re-offend within a two-year timeframe were nearly twice as likely to be misclassified as higher risk compared to similar white defendants: 45% versus 23%. There's also PredPol (short for predictive policing), which predicted crime at much higher than actual rates in primarily minority neighborhoods, leading to increased policing. And another familiar one: Google's online job ad system showed ads for much higher-paying jobs to men than it did to women. By this point in time these are well-known errors, and many of them have been responsibly addressed, but I want to point out that there are a couple of different sources for these errors.

Some of them are just technical glitches. For instance, maybe your training dataset has limited variability, so the model can't extrapolate beyond it very well. Sometimes you get statistical artifacts from small samples. If your algorithm uses confidence intervals but one of your groups is much smaller than the other, then that group's confidence interval is going to be much wider. And if you apply a cutoff criterion, for instance, your predicted probability of repaying a loan must be at least this high, a smaller group just isn't going to clear that bar as often as a larger group, simply because of the way confidence intervals are calculated (see the first sketch below). Also, maybe you're focusing on the overall accuracy of your classification model and ignoring what's happening within subgroups. If a disease is very rare, you can simply ignore its existence and still be highly accurate overall, but everybody recognizes that would be a very serious problem (see the second sketch below). Each of these things can happen just as a matter of implementing the algorithm, or the math behind it, without necessarily involving the bigger bias problems.

On the other hand, you may also commit some failures, things you should have known better when conducting the research. For instance, maybe there was a failure to gather diverse training datasets. It's incumbent on somebody who's creating a system to make an effort to get a wider range of data to use with that system. Second, maybe there was a failure to capture a diversity of data labels. Not everybody interprets the same thing the same way. I don't know whether you see these robots as cute or as scary, and the answer differs from one person to another. So when you're labeling your data, deciding is this cute, is this scary, is it big, is it small, is it good, is it bad, you need a very wide range of people to provide those labels.
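To make the small-sample point concrete, here is a minimal Python sketch, not from the course, comparing two groups with the same true repayment rate. The 80% rate, the 75% cutoff, and the group sizes are hypothetical numbers chosen only for illustration; the point is that the smaller group's confidence interval is wider, so a rule based on the interval's lower bound rejects it even though the underlying rate is identical.

```python
# A minimal sketch (not from the course) of how sample size alone widens
# confidence intervals: two groups with the SAME true repayment rate, but
# the smaller group's interval is wider, so a lower-bound cutoff rejects it.
import math

def wald_interval(successes, n, z=1.96):
    """95% Wald confidence interval for a proportion."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

true_rate = 0.80   # same underlying repayment probability for both groups
cutoff = 0.75      # hypothetical rule: lower bound of the CI must reach this

for name, n in [("large group", 10_000), ("small group", 100)]:
    successes = round(true_rate * n)        # identical observed rate
    low, high = wald_interval(successes, n)
    clears = low >= cutoff
    print(f"{name:12s} n={n:6d}  CI=({low:.3f}, {high:.3f})  clears cutoff: {clears}")
```

Running this, the large group's interval is roughly (0.79, 0.81) and clears the cutoff, while the small group's interval is roughly (0.72, 0.88) and does not, despite the identical observed rate.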
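And here is a second minimal sketch, also not from the course, of the overall-accuracy trap: with a condition affecting 1% of people, a model that predicts "no disease" for everyone scores about 99% accuracy overall while catching none of the affected subgroup. The prevalence and sample size are made-up numbers for illustration.

```python
# A minimal sketch (not from the course): a classifier that simply predicts
# "no disease" for everyone is ~99% accurate overall on a rare condition,
# yet has 0% recall for the people who actually have it.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
has_disease = rng.random(n) < 0.01       # 1% prevalence
predictions = np.zeros(n, dtype=bool)    # "ignore its existence"

overall_accuracy = (predictions == has_disease).mean()
recall_for_sick = predictions[has_disease].mean()   # hit rate within the sick subgroup

print(f"overall accuracy: {overall_accuracy:.3f}")                     # ~0.990
print(f"recall for the rare-disease subgroup: {recall_for_sick:.3f}")  # 0.000
```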
And then there's also the failure to use more flexible algorithms, something that can capture the nuance that's in the data, the exceptions, the outliers that can matter, especially when you're looking at relatively small groups and relatively rare outcomes.

There's also the risk of what are called self-fulfilling prophecies. Let's take the job ads as an example. If a woman is shown ads for lower-paying jobs, she'll probably apply for one of those lower-paying jobs and, by extension, get one of those lower-paying jobs. The fact that she now has that lower-paying job becomes a data point that goes into the next iteration of the algorithm, which says, "Aha, one more woman with a lower-paying job. We will show more of them." So algorithms actually have the possibility of creating the reality that they're trying to predict. Now, mind you, this is not blaming the victim, but it does tell you that when you're creating an algorithm, you have to find ways to get past these self-fulfilling prophecies, or even vicious cycles.

There are a few things you can do. Number one, you can deliberately check for biased output. Compare it to some gold standard. Are you in fact showing men and women jobs that pay the same amount of money? Are you in fact getting output that works not just for, say, English-speaking, upper-middle-class people in the United States, but for a much broader group? You can check that with your data (see the sketch below). Second, when you're developing something that can have implications for lots of different groups, consult with all the parties. Do some focus work, talk to people, and see how they view the results of your algorithm. And finally, include diversity. Again, this means making a deliberate effort to include a broad range of people and of circumstances in your training data, in the labels that you have, and in the way you develop the algorithms. Diversity can make such a difference: demographic diversity, worldview diversity, technical diversity. Any number of these can make your algorithm more robust and applicable to what is really a very broad world. Finally, if you'd like more information on this, you can consult the course "AI Accountability Essential Training," which addresses issues of bias in algorithms specifically.
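As one illustration of deliberately checking output for bias, here is a minimal Python sketch, not from the course, that compares how often a model's output occurs in each group and flags gaps beyond a chosen tolerance. The column names, the toy data, and the 10-percentage-point tolerance are all hypothetical; in practice you would compare against whatever gold standard fits your application.

```python
# A minimal sketch (not from the course) of a bias audit: compare how often a
# model's output (e.g., "shown a high-paying job ad") occurs per group, and
# flag gaps beyond a chosen tolerance. The column names and the 10-point
# tolerance are hypothetical.
import pandas as pd

def audit_by_group(df, group_col, outcome_col, max_gap=0.10):
    rates = df.groupby(group_col)[outcome_col].mean()
    gap = rates.max() - rates.min()
    print(rates.to_string())
    print(f"largest gap between groups: {gap:.2%}  "
          f"({'FLAG for review' if gap > max_gap else 'within tolerance'})")

# toy data standing in for real model output
ads = pd.DataFrame({
    "gender": ["man"] * 500 + ["woman"] * 500,
    "shown_high_paying_ad": [1] * 300 + [0] * 200 + [1] * 150 + [0] * 350,
})
audit_by_group(ads, "gender", "shown_high_paying_ad")
```

With this toy data, 60% of men versus 30% of women see the high-paying ad, so the 30-point gap gets flagged for review.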
