A Starting Point for Algorithm & Data Privacy Regulation

 
-- By BenWeissler - 09 Oct 2020 // 2nd draft 27 Dec 2020
 
How should our law deal with a world in which decisions — about what we read, what interest rates we pay, what job opportunities we find, and so forth — are increasingly made by algorithms rather than by human deliberation? Crucially, how do we ensure that the mass commercial use of algorithms develops in ways that support human freedom, rather than spiraling into discrimination, exclusion, and division?
 
The Two ‘Business Ends’ of an Algorithm

The algorithms powering Facebook, Google, and Amazon sit at the nexus of two industrial paradigms: extraction and consumption. These algorithms depend on the scaled-up collection (extraction) of data, and they also feed humans a diet of media and information which can be more or less (but usually less) nutritious.
 
Extraction and consumption: mindful of the inherent limitations of analogy, this essay looks to the bodies of law governing those two domains (environmental protection and food safety) for guidance on how algorithms can be regulated in effective and healthful ways.
 

Extraction, or: There Will Be Data

The cliché that data is “the new oil” is mainly geared toward peddling expensive CRM software to CTOs, but it inadvertently points to the extractive roots of data (and thus of algorithms). When oil extraction goes wrong, it can devastate entire ecological systems (think of the mass animal die-offs after the Deepwater Horizon and Exxon Valdez spills). Even without outright catastrophe, removing oil and gas from the ground is fraught with environmental risk (e.g., the risks fracking poses to drinking water).
 
Environmental law necessarily concerns itself with externalities and harm-prevention, rather than consent or privity. Contractual consent (whether bilateral or multilateral) is meaningless when a single actor can pollute the air or water that countless others breathe and drink. Instead of private contracting, environmental law looks toward administrative bodies (like the EPA) to prescribe technical requirements ex ante, and toward courts to apply flexible negligence standards ex post. Statutes like CERCLA prevent liability from dissipating, holding polluters responsible for cleanup costs even decades after the harm was done.
 
Data extraction is similarly “ecological” in nature: the behaviors of one have consequences for the many. If I use an email service which spies on its users, it is not only my words that are being read, but those of my interlocutor as well. If I upload my photos to a social media site, it is not only my face that is surveilled and analyzed, but also the faces of my friends. This extractive process touches participants and non-participants alike: as more and more data is gathered in one place, it becomes possible to make strong inferences about the “blind spots” (i.e., people not part of the dataset).
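To make the “blind spot” effect concrete, here is a minimal sketch in Python (all names and data are invented for illustration, not drawn from any real service) of how inferences about non-participants can fall out of participants’ disclosures:

```python
from collections import Counter

# Hypothetical toy data: each enrolled user disclosed their own city and
# uploaded a contact list that names people who never joined the service.
users = {
    "alice": {"city": "Berlin", "contacts": ["nonuser_1", "bob"]},
    "bob":   {"city": "Berlin", "contacts": ["nonuser_1", "alice"]},
    "carol": {"city": "Paris",  "contacts": ["nonuser_1"]},
}

def infer_city(non_user: str) -> str:
    """Guess a non-user's city: the most common city among the enrolled
    users who listed that person as a contact."""
    cities = [u["city"] for u in users.values() if non_user in u["contacts"]]
    return Counter(cities).most_common(1)[0][0]

print(infer_city("nonuser_1"))  # "Berlin": an inference about someone who consented to nothing
```

Real systems draw on far richer signals, but the structure is the same: the more data gathered about participants, the sharper the inferences about everyone else.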
 
The comparison to environmental law puts a spotlight on the folly of consent in today’s digital economy. The model of consent embodied in Facebook’s Data Policy, or in Article 6 of the GDPR, is not adequate to meet the dispersed hazard at hand. (And even if consent were an applicable framework, it is hard to see how there could be meaningfully informed consent, given that a full 74% of Facebook users do not understand that Facebook collects their data to begin with.)
 

Consumption: The Jungle of Unsanitary Algorithms

If data collection is the ouroboros’s mouth, “consumption” is its tail — the point at which algorithmic output is delivered to a user (generating, in turn, more input data to enhance the algorithm).
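As a minimal sketch of that mouth-to-tail loop (a toy recommender with invented items and numbers, not any real platform’s code), consider:

```python
# Toy engagement-maximizing loop; every item and number is invented.
scores = {"cat_video": 0.5, "news_story": 0.5, "conspiracy_post": 0.5}

def observed_engagement(item: str) -> float:
    # Stand-in for real user behavior: suppose lurid content draws more clicks.
    return 0.9 if item == "conspiracy_post" else 0.3

for _ in range(20):
    served = max(scores, key=scores.get)                 # the tail: output delivered to the user
    reaction = observed_engagement(served)               # the mouth: fresh data extracted
    scores[served] += 0.1 * (reaction - scores[served])  # model updated with that data

print(max(scores, key=scores.get))  # prints "conspiracy_post": the loop settles on what engages most
```

Nothing in the loop asks whether the winning item is good for the user; engagement is the only nutrient the algorithm can taste.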
 
The analogy to literal food consumption is apt in two ways. First, the tendency to gorge ourselves on sugary, addictive morsels in our Newsfeeds has led to various maladies in the American body politic (polarization, rampant conspiracy theories, etc.). Second, like food appearing in a restaurant or supermarket, the algorithmic output we consume online is only the “last mile” step in a long, often-fragile supply chain.
 
The first problem of consumption (unhealthy diets) has typically been addressed in America through education. Except for a few unsuccessful “soda bans”, the primary strategy has been to target school-aged children with healthy cafeteria food. This suggests, in parallel, that children in particular should be exposed to healthy modes of algorithmic/online consumption. (Nutrition labels play an important educational role as well, and one could imagine an “algorithm label” listing various data ingredients which together compose the functional algorithm. But it’s hard to see something this chintzy actually changing any behavior, which might explain why the ad tech trade association has proposed something similar.)
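For concreteness, a hypothetical “algorithm label” might be little more than a structured list of ingredients. Every field name and value below is invented for illustration:

```python
# A hypothetical "algorithm label," loosely modeled on a nutrition label.
# All fields and values are invented; no real service publishes this.
algorithm_label = {
    "purpose": "rank posts to maximize predicted engagement",
    "data_ingredients": [
        "clicks and watch time",
        "contact lists",
        "inferred interests",
        "location history",
    ],
    "retention": "indefinite",
    "shared_with": ["advertisers", "analytics partners"],
}
```

The ease of producing such a label is part of the problem: it informs without obligating anyone to change anything.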
 
The second problem of consumption (ensuring the integrity of the supply chain) is not about consumption per se. The phenomenology of food consumption nowadays (effortless, convenient) obscures a complex and highly mechanized system of food production and distribution. Data cannot spoil like food can, though perhaps it can become adulterated when mixed together in unwanted or injurious ways. But a robust system of data privacy protection could draw inspiration from existing food safety programs, including a muscular inspection regime. Poor data handling and data breaches should trigger a government response at least as severe as the response to a big E. coli outbreak.
 

Conclusion

The takeaway from these analogies is not that we should construct a national privacy law as a Frankenstein mash-up of environmental and food safety law. National privacy regulation will succeed only if it is designed as a cohesive and comprehensive system, unlike today’s fragmented approach, which treats video rental records separately from DMV records.
 
The point of these analogies is to give us a concrete language for describing the failures of our current approach to algorithms and data privacy. Just as fiction or art can spur us forward by offering a glimpse of a better (or alternate) world, drawing connections to existing bodies of law can help illuminate the path toward a better system of algorithmic and data privacy regulation.
 


