A Starting Point for Algorithm & Data Privacy Regulation

-- By BenWeissler - 09 Oct 2020 // 2nd draft 27 Dec 2020

How should our law deal with a world in which decisions — about what we read, what interest rates we pay, what job opportunities we find, and so forth — are increasingly made by algorithms rather than human deliberation? Crucially, how do we ensure that the mass commercial use of algorithms develops in ways that support human freedom, rather than spiraling into discrimination, exclusion, and division?

The Two ‘Business Ends’ of an Algorithm

The algorithms powering Facebook, Google, and Amazon sit at the nexus of two industrial paradigms: extraction and consumption. These algorithms depend on the scaled-up collection (extraction) of data, and they also feed humans a diet of media and information which can be more or less (but usually less) nutritious.

Extraction and consumption — while mindful of the inherent limitations of analogy, this essay looks to those two domains for guidance on how algorithms can be regulated in the most effective and healthful ways.

Extraction, or: There Will Be Data

The cliché that data is “the new oil” is mainly geared towards peddling expensive CRM software to CTOs, but it inadvertently points to the extractive roots of data (and thus algorithms). When oil extraction goes wrong, it can devastate entire ecological systems (think of the mass animal die-offs after the Deepwater Horizon and Exxon Valdez spills). Even without outright catastrophe, removing oil and gas from the ground is fraught with environmental risk (e.g., fracking and drinking water).

Environmental law necessarily concerns itself with externalities and harm-prevention, rather than consent or privity. Contractual consent (whether bilateral or multilateral) is meaningless when a single actor can pollute the air or water that countless others breathe and drink. Instead of private contracting, environmental law looks toward administrative bodies (like the EPA) to prescribe technical requirements ex ante, and toward courts to apply flexible negligence standards ex post. Statutes like CERCLA, with its strict, joint-and-several liability for cleanup, keep responsibility from dissipating among diffuse polluters.

Data extraction is similarly “ecological” in nature: the behaviors of one have consequences for the many. If I use an email service which spies on its users, it is not only my words that are being read, but those of my interlocutor as well. If I upload my photos to a social media site, it is not only my face that is surveilled and analyzed, but also the faces of my friends. This extractive process touches participants and non-participants alike: as more and more data is gathered in one place, it becomes possible to make strong inferences about the “blind spots” (i.e., people not part of the dataset).

The comparison to environmental law puts a spotlight on the folly of consent in today’s digital economy. The model of consent embodied in Facebook’s Data Policy, or in Article 6 of the GDPR, is not adequate to meet the dispersed hazard at hand. (And even if consent were an applicable framework, it is hard to see how there could be meaningfully informed consent, given that a full 74% of Facebook users do not understand that Facebook collects their data to begin with.)

Consumption: The Jungle of Unsanitary Algorithms

If data collection is the ouroboros mouth, “consumption” is its tail — the point at which the algorithmic output is delivered to a user (generating more input data to enhance the algorithm).

The analogy to literal food consumption is apt in two ways. First, the tendency to gorge ourselves on sugary, addictive morsels in our Newsfeeds has led to various maladies in the American body politic (polarization, rampant conspiracy theories, etc.). Second, like food appearing in a restaurant or supermarket, the algorithmic output we consume online is only the “last mile” step in a long, often-fragile supply chain.

The first problem of consumption (unhealthy diets) has typically been addressed in America through education. Except for a few unsuccessful “soda bans”, the primary strategy has been to target school-aged children with healthy cafeteria food. This suggests, in parallel, that children in particular should be exposed to healthy modes of algorithmic/online consumption. (Nutrition labels play an important educational role as well, and one could imagine an “algorithm label” listing various data ingredients which together compose the functional algorithm. But it’s hard to see something this chintzy actually changing any behavior, which might explain why the ad tech trade association has proposed something similar.)

The second problem of consumption (ensuring the integrity of the supply chain) is not about consumption per se. The phenomenology of food consumption nowadays (effortless, convenient) obscures a complex and highly mechanized system of food production and distribution. Data cannot spoil like food can, though perhaps it can become adulterated when mixed together in unwanted or injurious ways. But a robust system for data privacy protection could draw inspiration from existing food safety programs, including a muscular inspection regime. Poor data handling and data breaches should trigger a government response at least as forceful as the response to a major E. coli outbreak.


The takeaway from these analogies is not that we should construct a national privacy law as a Frankenstein mash-up of environmental and food safety law. National privacy regulation will succeed only if it is designed as a cohesive and comprehensive system — as opposed to today’s fragmented approach, which treats video rental records (the Video Privacy Protection Act) separately from DMV records (the Driver’s Privacy Protection Act).

The point of these analogies is to give us a concrete language for describing the failures of our current approach to algorithms and data privacy. Just as fiction or art can spur us forward by providing a glimpse of a better (or alternate) world, drawing connections to existing bodies of law can help illuminate the path toward a better system of algorithmic and data privacy regulation.