Law in the Internet Society

Chaffing in practice

-- By SamuelRoth - 15 Jan 2015

Rivest's other algorithm

In his 1998 paper Chaffing and Winnowing, Ronald Rivest proposes a new method that, in theory, achieves strong information confidentiality without traditional encryption. His proposal involves splitting up the plaintext into packets and computing a "message authentication code" (MAC) for each one by combining the data in the packet with a secret key. The sender then adulterates the plaintext packets with a similar number of false packets, each with its own fabricated MAC, so that an eavesdropper cannot tell the "wheat" from the "chaff." Only the recipient, possessed of the secret key, can determine which MACs are valid, and thereby distinguish the true plaintext from false.
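The mechanics described above can be sketched in a few lines of Python. This is a minimal illustration, not Rivest's reference construction: the choice of HMAC-SHA256, the 8-byte serial number, and the function names `mac`, `chaff`, and `winnow` are all assumptions made for the example. Each wheat packet is paired with a decoy carrying the same serial number and a random tag.

```python
import hashlib
import hmac
import random
import secrets

MAC_LEN = 32  # length in bytes of an HMAC-SHA256 tag


def mac(key: bytes, serial: int, data: bytes) -> bytes:
    """Authenticate a packet's serial number together with its data."""
    msg = serial.to_bytes(8, "big") + data
    return hmac.new(key, msg, hashlib.sha256).digest()


def chaff(key: bytes, packets: list, decoys: list) -> list:
    """Return shuffled (serial, data, tag) triples.

    Wheat packets carry a valid MAC; each decoy reuses the serial number
    of one wheat packet but carries a random tag of the same length.
    Assumes len(decoys) == len(packets).
    """
    stream = []
    for serial, (real, fake) in enumerate(zip(packets, decoys)):
        stream.append((serial, real, mac(key, serial, real)))
        stream.append((serial, fake, secrets.token_bytes(MAC_LEN)))
    random.shuffle(stream)
    return stream


def winnow(key: bytes, stream: list) -> bytes:
    """Keep only packets whose tag verifies, then reassemble in serial order."""
    wheat = [
        (serial, data)
        for serial, data, tag in stream
        if hmac.compare_digest(tag, mac(key, serial, data))
    ]
    wheat.sort(key=lambda item: item[0])
    return b"".join(data for _, data in wheat)
```

A recipient holding the key recovers the message with `winnow(key, chaff(key, packets, decoys))`; without the key, every tag looks equally random, so the eavesdropper's task reduces to guessing which of the two packets at each serial number is wheat. In this sketch the decoys must be plausible on their own, since the packet data travels in the clear.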

At the time of Rivest's writing, encryption techniques were at the center of a fierce public debate. Federal law restricted the exportation of encryption software, which meant that it could not be made publicly available on the internet without fear of criminal prosecution. Rivest proposed chaffing and winnowing, in part, to show that sending a plaintext message and subjecting that bitstream to encryption were two distinct operations that could be carried out by two different actors who had in no way coordinated their efforts. Rivest concluded that the theoretical possibility of such a scheme demonstrated "the difficulty (or impossibility) of drafting any kind of reasonable law restricting encryption or confidentiality technology."

Chaffing and winnowing has not been directly put into use, as it would be cumbersome as a cryptographic method, and the anti-surveillance movement has since won the debate over restriction of encryption techniques. Nevertheless, I propose that chaffing and winnowing does have potentially useful applications: first, as a technical strategy of resistance to corporate surveillance, and second, as a means of preserving anonymity in big data.

Supplementing encryption

As Prof. Moglen pointed out in his comments, some software already puts chaffing-like strategies into use. TrackMeNot, a browser extension for Firefox and Chrome, conceals genuine user requests to search engines in a cloud of automated ones; over time, it learns the user's search habits and designs fake queries that should be hard to distinguish from bona fide searches. And adding bad plaintext to good helps to protect against birthday attacks, a class of cryptographic exploits that rely on the attacker having found two blocks of plaintext that produce the same cryptographic hash.

But the idea's technical potential has not been exhausted. As Prof. Moglen pointed out in class, even encrypted email is not entirely free from surveillance when conducted on an unfree platform such as Gmail, because corporate surveillance can still account for the frequency of communications between users. Moreover, business intelligence can mine valuable data merely by determining who corresponds with whom. Encryption conceals the content of communications, but not the fact of their occurrence.

Chaffing, in that context, offers one solution: users could chaff the provider's database in a reversible way by means of a program that would interface between the user and the database, along the lines of the Lucent Personalized Web Assistant of the late 1990s. Whenever two users of the chaffing program first communicate with each other, their instances of the program would agree on a shared secret key by means of a secure handshake; this key would then be used to generate valid MACs for genuine emails between those two individuals. Meanwhile, those users' chaffing programs would start up a non-stop exchange of fake communications, so that future valid communications would be indistinguishable from so much noise.
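One way to picture such a program is as a client that sends exactly one message per time slot, whether or not its user has anything to say. The sketch below is hypothetical throughout: the class name `ChaffingClient`, the fixed one-message-per-tick schedule, and the use of HMAC-SHA256 are assumptions for illustration, and the key is assumed to have been agreed in a handshake that is not shown. It also ignores message length and replay, both of which a real design would have to address.

```python
import hashlib
import hmac
import secrets
from collections import deque


class ChaffingClient:
    """Hypothetical client that emits exactly one message per tick to one peer."""

    def __init__(self, shared_key: bytes):
        self.key = shared_key    # assumed to come from the (unshown) handshake
        self.outbox = deque()    # genuine messages waiting for a slot

    def queue(self, body: bytes) -> None:
        """Schedule a genuine message for the next free slot."""
        self.outbox.append(body)

    def tick(self) -> tuple:
        """Return (body, tag): a queued genuine message if any, else chaff."""
        if self.outbox:
            body = self.outbox.popleft()
            tag = hmac.new(self.key, body, hashlib.sha256).digest()
        else:
            body = secrets.token_bytes(64)  # stand-in for a plausible fake email
            tag = secrets.token_bytes(32)   # random tag, so verification fails
        return body, tag

    def is_genuine(self, body: bytes, tag: bytes) -> bool:
        """Winnow an incoming message using the shared key."""
        expected = hmac.new(self.key, body, hashlib.sha256).digest()
        return hmac.compare_digest(tag, expected)
```

Because a message leaves on every tick, the provider observes the same rate of traffic on a busy day as on a silent one; only the peer's `is_genuine` check separates the one real email from the surrounding noise.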

That strategy, used in conjunction with encryption, would complicate a corporate data miner's ability to draw meaningful conclusions from the frequency of communications. As for the second deficiency of plain encryption identified above (the ability of business intelligence to build a network identifying who communicates with whom), the chaffing program also offers a potential solution: its users could consent to receive fake emails from all the users of the program with whom they have not yet executed a secure handshake, i.e., from strangers. Thus, from the perspective of the email provider, the network of people in communication with one another and the network of people using the chaffing program are identical.
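The claim about the provider's view can be checked with a toy model. In the sketch below, the function name `observed_edges` and the user names are hypothetical; the model simply assumes that chaff flows between every ordered pair of program users, as the paragraph above proposes.

```python
from itertools import permutations


def observed_edges(users: list, real_pairs: set) -> set:
    """Sender/recipient pairs the provider sees once chaff is flowing."""
    edges = set()
    for a, b in real_pairs:
        # Genuine correspondence, visible in both directions.
        edges.add((a, b))
        edges.add((b, a))
    # Chaff between every ordered pair of program users, strangers included.
    edges |= set(permutations(users, 2))
    return edges


users = ["alice", "bob", "carol", "dave"]
with_friends = observed_edges(users, {("alice", "bob")})
all_strangers = observed_edges(users, set())
```

Here `with_friends == all_strangers`: the observed graph is complete either way, so it reveals nothing about who actually corresponds. The price is traffic that grows with the square of the number of users, which suggests a real deployment would need to limit chaff to some subset of strangers.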

Supplementing anonymization

Big data marks a second area in which chaffing could prove useful as a technical method. Free distribution of large quantities of data promises to radically change the way we understand society and the individual. Even putatively anonymized data, however, can often be linked to personal identities through the use of clever analytical methods. Consider the security researcher who combined voter registration records and supposedly "anonymized" medical data to uncover the health records of the governor of Massachusetts. Similarly, Prof. Moglen discussed in class the release of New York City transit records, which revealed, as he said, every adulterer in City Hall.

Often, it seems, such unmasking depends on a careful process of elimination: The Massachusetts researcher, for instance, succeeded in part because only six people in the city of Cambridge shared the governor's birthdate. Presumably, the City Hall adultery was revealed through similar means: knowing that only one employee left a particular building at a given time, for instance.

But if every good row in such a database were supplemented by a bad row, or five, the process of elimination would be much harder. The chaff could be crafted to statistically mirror the wheat, so that large-scale conclusions and trends drawn from the data would remain essentially valid, but the individual adulterers would be harder to pick out. (This application of chaffing bears some similarities to the idea of differential privacy, a technique that focuses on adding noise to the output from a dataset, rather than adding the noise to the dataset directly.) By adding a MAC to each row, the individual or organization responsible for the data would retain the ability to verify or disclaim individual pieces of data that proved of interest to trustworthy researchers.
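A rough sketch of such a release follows. Everything in it is an assumption made for illustration: the function names `tag_row`, `publish`, and `is_genuine`, the HMAC-SHA256 tag, and above all the chaff generator, which resamples each column independently from the real data. That generator preserves each column's overall distribution but not correlations between columns, so it is only the crudest version of chaff that "statistically mirrors the wheat."

```python
import hashlib
import hmac
import random
import secrets


def tag_row(key: bytes, row: tuple) -> str:
    """MAC over a row's fields (toy encoding: fields joined with '|')."""
    msg = "|".join(str(value) for value in row).encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def publish(key: bytes, rows: list, chaff_per_row: int = 5, seed=None) -> list:
    """Return shuffled (row, tag) pairs: real rows plus resampled chaff rows."""
    rng = random.Random(seed)
    columns = list(zip(*rows))
    released = [(row, tag_row(key, row)) for row in rows]
    for _ in range(len(rows) * chaff_per_row):
        # Draw each field independently from the real values in its column.
        fake = tuple(rng.choice(column) for column in columns)
        # Random tag of the same length as a real one; it will not verify.
        released.append((fake, secrets.token_hex(32)))
    rng.shuffle(released)
    return released


def is_genuine(key: bytes, row: tuple, tag: str) -> bool:
    """Let the key holder verify or disclaim a single published row."""
    return hmac.compare_digest(tag, tag_row(key, row))
```

With `chaff_per_row=5`, a query that once matched a single person now matches that person plus whatever chaff happens to share the same attributes, while the data's custodian can still call `is_genuine` on any row a trusted researcher asks about.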

Chaffing and winnowing will never enter wide cryptographic use. Nevertheless, in certain limited contexts, it offers a valuable supplement to encryption and to anonymization.

r3 - 15 Jan 2015 - 08:20:41 - SamuelRoth
All material on this collaboration platform is the property of the contributing authors.
All material marked as authored by Eben Moglen is available under the license terms CC-BY-SA version 4.