Machine Learning and Discrimination: A Hidden Crime

-- JosephHan - 12 May 2024

The mass collection of personal data by tech companies is not a new phenomenon. As technology continues to improve, however, algorithms that analyze this data have become more common. Corporations have a strong incentive to adopt such tools: they can decrease labor costs and thereby increase profits. Examples include job application screening, loan and financing approvals, and rental applications. But human relationships are complex, and data can reveal far more than it first appears to. As more businesses incorporate these tools, it is important to assess the full effects of those decisions. AI models trained through deep learning to assess human candidates carry a high risk of unlawful discrimination in violation of the Equal Protection Clause of the 14th Amendment.

Race is Easily Inferred

Although unlawful discrimination could occur with respect to any characteristic protected under the 14th Amendment, we will focus on race because it is easily inferred and because protecting people from racial discrimination is especially important.

Zip codes can be used as a proxy for race: they are an effective stand-in for race and ethnicity information, particularly for white, black, and Latinx groups.

Similarly, names are effective at predicting ethnicity. Studies have shown that models analyzing the sequence of letters in a name can achieve very high accuracy. Census data on names can also be an accurate predictor.
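To make "the sequence of letters" concrete, the following is a minimal sketch of how a name can be turned into letter-sequence features. It is an illustration only, not the method of any particular study; the function name char_ngrams, the boundary markers, and the sample name are all assumptions.

```python
from collections import Counter


def char_ngrams(name: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams in a name.

    '^' and '$' are hypothetical markers for the start and end of the name,
    so that the first and last letters are also captured as features.
    """
    padded = "^" + name.lower() + "$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


# Illustrative input only; a real model would see many names.
features = char_ngrams("Anna")
print(sorted(features))
```

A classifier would then learn which of these letter pairs are more common in which groups. Nothing in the feature step mentions ethnicity, which is why a name can pass as a neutral data point.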

Data may appear neutral, yet it can easily serve as a proxy for race. Race and social statistics are highly correlated, and that correlation can produce a disparate, discriminatory impact even when actual racial data is left out.
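The proxy effect can be sketched with a few lines of code. The records below are synthetic and exist only for illustration: the zip labels, the group labels "A" and "B", the 80/20 split, and the approve rule are all assumptions, not real data. The rule never sees the group, only the zip code.

```python
from collections import defaultdict

# Synthetic applicants as (zip_code, group). The group is used only to
# measure the outcome afterward; the decision rule never receives it.
applicants = (
    [("zip_1", "A")] * 8 + [("zip_1", "B")] * 2
    + [("zip_2", "A")] * 2 + [("zip_2", "B")] * 8
)


def approve(zip_code: str) -> bool:
    """A facially neutral rule: approve applicants from one zip code only."""
    return zip_code == "zip_1"


def selection_rates(records):
    """Return the share of approved applicants within each group."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for zip_code, group in records:
        total[group] += 1
        approved[group] += approve(zip_code)
    return {group: approved[group] / total[group] for group in total}


rates = selection_rates(applicants)
print(rates)  # {'A': 0.8, 'B': 0.2}
```

Because group membership and zip code are correlated in this toy data, a rule keyed to zip code alone approves the two groups at very different rates.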

Neural Networks and Machine Learning

A growing issue in the realm of algorithmic human selection is the increasing use of machine learning. In order to understand the harm that is occurring, it is essential to understand the technology being implemented.

A human user has two points of contact with any model: the input and the output. When an AI model is fed inputs, such as human candidates, it uses its previous “experience” with similar problems to evaluate the input data and produce an output, which in this example would be the candidates selected.

Computations are completed through a neural network, which is loosely modeled on the functioning of neurons in a human brain. Neural networks extract “features” from the input data, such as an applicant’s credit history, previous employment, income, name, gender, and race. Although some of these (such as gender and race) can be filtered out before the data is fed to the model, data points that are neutral on their face (such as zip codes) are often included. The model then assigns various weights to the extracted features in a hidden layer. The points at which these intermediate calculations are done are called nodes. Using these intermediate calculations, the AI arrives at an output: in this example, a rejection or an approval. This process is visualized in the attached figure, nn-ar.jpg.

The weights are set through a process called “training”. During training, the model analyzes the data it is given and adjusts its weights to fit that data. A training set would consist of applicants and their data, each paired with a correct answer, such as whether the applicant should be approved or denied.
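The input, hidden layer, output, and training steps described above can be sketched as a very small network. This is a toy under stated assumptions, not how any production system is built: the class TinyNet, its sizes, the learning rate, and the four synthetic training records are all hypothetical. The training labels are deliberately constructed so that past decisions track the zip code rather than income.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """One hidden layer and one output node; all sizes are illustrative."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        # Weights start as small random numbers; training adjusts them.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return (hidden node values, output score between 0 and 1)."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One gradient-descent update on cross-entropy loss for one example."""
        h, y = self.forward(x)
        d_out = y - target  # error signal at the output node
        for j, hj in enumerate(h):
            # Error signal at hidden node j, computed before w2[j] changes.
            d_hid = d_out * self.w2[j] * hj * (1.0 - hj)
            self.w2[j] -= lr * d_out * hj
            self.b1[j] -= lr * d_hid
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hid * xi
        self.b2 -= lr * d_out


# Synthetic training set: [income_high, lives_in_zip_b] -> past decision.
# Assumption for illustration: past approvals track the zip code, not income.
training_set = [([1.0, 0.0], 1), ([0.0, 0.0], 1),
                ([1.0, 1.0], 0), ([0.0, 1.0], 0)]

net = TinyNet(n_in=2, n_hidden=3)
for _ in range(500):
    for x, target in training_set:
        net.train_step(x, target)

# Two applicants with the same income who differ only in zip code.
_, score_zip_a = net.forward([1.0, 0.0])
_, score_zip_b = net.forward([1.0, 1.0])
print(f"zip A: {score_zip_a:.2f}, zip B: {score_zip_b:.2f}")
```

No one tells the network to use the zip code. It learns to do so because the "correct answers" in the training set reward it, which is the sense in which a model inherits whatever pattern its training data contains.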

Deep Learning Hides the Operation of Neural Networks

Deep learning has become the new trend in AI development. ChatGPT, for example, is built on a large language model trained with deep learning. Deep learning is a subset of machine learning in which there are at least two layers between the input and output layers.

Two key distinctions separate deep learning from other machine learning: the model often learns “on its own” with little human intervention, and it tends to use many more intermediate layers in the neural network. Both of these features make it more difficult to deduce exactly which input features the model is considering and to what extent.

Intermediate layers complicate the calculations, and because deep learning involves many more nodes, any individual node has a smaller effect on the final output. It is possible for the zip code data point to be a small factor in many nodes within the neural network. This means that a detailed look into the intricacies of the neural network may not reveal any effect from zip codes, even though zip codes have a large impact on the final output.
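The arithmetic behind this point can be shown with a deliberately simplified sketch. The numbers are hypothetical, and the hidden nodes are treated as simple pass-throughs (no activation function) so that each node's contribution can be read off directly; a real deep network is harder to take apart than this.

```python
import math

N_NODES = 1000  # hypothetical hidden-layer width

# Hypothetical weights: zip feature -> each hidden node, each node -> output.
w_in = [0.05] * N_NODES
w_out = [-0.05] * N_NODES

# Contribution of the zip feature that flows through each individual node.
per_node = [a * b for a, b in zip(w_in, w_out)]
total = sum(per_node)

largest_single = max(abs(c) for c in per_node)
score_without = 1.0 / (1.0 + math.exp(-0.0))   # output when the zip feature is 0
score_with = 1.0 / (1.0 + math.exp(-total))    # output when the zip feature is 1

print(largest_single, total, score_without, score_with)
```

Each node carries a contribution of about 0.0025, which would look negligible to anyone inspecting nodes one at a time. Summed over the thousand nodes, the contributions come to about -2.5, enough to move the output score from 0.5 to below 0.1.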

It is difficult to assess the biases of deep learning models because there is no requirement for a human to give feedback to the model during training. Human oversight is necessary to correct biases, yet a “feature” of this new technology is that this step is not required. The technological improvements of deep learning over earlier machine learning increase the potential for discriminatory effects by impeding the ability to diagnose models and by removing the necessity for human oversight.

Conclusion

The increasing use of deep learning models raises a serious concern about their unintended effects. Although a model may use data that seems unbiased, the interconnected nature of our existing data is undeniable. Deep learning models are deeply complicated, and their users have little need or desire to assess the intricacies of how the models produce their outputs. In models that evaluate human applicants, this can result in discrimination based on protected classes, namely race. Strict regulation of these models is necessary to prevent this technology from infringing on the Constitutional rights of citizens.
