Exactly this. The Washington Post had an article some time ago about software used in sentencing/parole decisions that consistently judged black defendants more harshly: it recommended longer sentences for black defendants than for white defendants convicted of the exact same crime, and recommended parole for black prisoners at a lower rate. The software was proprietary and there was no access to the code, but even if it were perfectly written, it learned from the data fed to it.
The software came to the conclusion that race (black vs white) was a predictor of crime and recidivism. How did it come to this conclusion? Because of the data:
- Black people get arrested at a higher rate than white people
- Black people are also more likely to re-offend than white people
So the conclusion the program came to makes sense. But it totally ignores the external factors that lead to the two statistics above:
- Black neighborhoods are more heavily and more aggressively policed, which means more crimes get uncovered there.
- Black people are targeted by police (black and white people consume drugs at the same rate, but black people are more likely to get stopped, more likely to get searched, and more likely to get arrested).
- Over their lifetime, black people are more likely to have contact with police (even if they live in an all-white neighborhood). All it takes is one of those encounters coinciding with a minor infraction (having weed on them, etc.). That conviction then becomes a justification for elevating their 'risk' level.
- Black ex-convicts have a harder time getting jobs due to inherent bias (all ex-convicts have a hard time, but black ex-convicts have a much harder one). This closes off avenues to gainful employment, making a return to crime one of the few options left. And once again, black ex-convicts have more contact with police than white ex-convicts.
Sans context, the input data can be an incredibly effective way of propagating bias into the model.
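That dynamic is easy to reproduce. Here's a toy simulation (every number below is made up purely for illustration) where both groups offend at exactly the same underlying rate, but one group is policed more heavily, so its offenses are more likely to end up as recorded arrests. A naive risk score built from the arrest records then "discovers" that group membership predicts crime, even though behavior is identical:

```python
import random

random.seed(0)

# Hypothetical numbers: both groups offend at the SAME underlying rate,
# but group "A" is policed more heavily, so its offenses are more
# likely to be detected and recorded as arrests.
OFFENSE_RATE = 0.10                   # true rate, identical for both groups
DETECTION = {"A": 0.60, "B": 0.20}    # policing intensity differs
N = 100_000

records = []
for _ in range(N):
    group = random.choice(["A", "B"])
    offended = random.random() < OFFENSE_RATE
    arrested = offended and random.random() < DETECTION[group]
    records.append((group, arrested))

# A naive "risk model" that just computes arrest rates per group from
# the records concludes group A is ~3x riskier. The bias lives in how
# the data was collected, not in the behavior being measured.
arrest_rate = {}
for g in ("A", "B"):
    outcomes = [arrested for grp, arrested in records if grp == g]
    arrest_rate[g] = sum(outcomes) / len(outcomes)
    print(g, round(arrest_rate[g], 3))
```

Any model trained on those records, no matter how "perfectly written", would reproduce the same gap, which is the point being made above.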
Even strong AI will not solve this without corrections/mitigating strategies. The most effective strategy would be solving the policing problem itself: the bias, the over-policing, and the way we close off every avenue of rehabilitation to people convicted of victimless crimes.