Computers, Privacy & the Constitution

Do users have a right to control data about themselves?

-- By LeonHuang - 05 Mar 2017

Service providers typically keep anonymized records of how users use their services, and the user agreements, to which the users have consented, typically allow service providers to use these records. Users who care about both convenience and privacy may be motivated to argue that they should have a say over data about themselves. Is this position well-justified?

A Review of the Argument

The position appeals to our intuition in two ways. First, the content of the records is produced by the users. Users may regard themselves as the authors of such data and, in turn, as its owners. Second, users expect to have control over their privacy. The records of users’ behavior contain private information that users do not expect others to know, and users would therefore think that they should have some say in how the records are used.

There are counterarguments too. The keeping of the records is performed by the service providers. Although the content of the records depends largely on users’ behavior, the service providers are nevertheless the authors of the records, in the same way that biographers, rather than their subjects, are the authors of biographies. In addition, anonymization alleviates the privacy concerns. After identity-sensitive information is stripped from the dataset, the record of each individual’s behavior becomes less likely to violate that individual’s privacy. The data is no longer about each individual user but rather about a set of users sharing some demographic characteristics. Users have little justification to argue for more when the data about them is perfectly anonymized.

But perfect anonymization is a high standard. Stripping away identifying information is not always enough. For example, the TLC Trip Record Data provides publicly available information on the dates, times and locations of all taxi pick-ups and drop-offs in New York City in a given year. Although the data does not include the identity of the passengers, it nevertheless increases the risk of privacy violations when it is used in conjunction with other publicly available information. Celebrity cab rides become easier to identify. Someone may simply find a photo of a celebrity getting into a cab and use the date, time, and location to find out where the celebrity was going, and in turn where the celebrity lives. The average Joe faces the same risks: an acquaintance may easily find out where you live after seeing you leave in a cab after work. This tension between privacy and highly accurate geolocation data has led to a proposal that the TLC should reveal only census tracts instead of exact coordinates.1
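The idea behind releasing census tracts rather than exact coordinates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a fixed latitude/longitude grid stands in for census tracts (a real tract lookup would need tract boundary data), and the function name, cell size, and sample points are all hypothetical.

```python
from math import floor

def coarsen(point, cell_degrees=0.01):
    """Map an exact (lat, lon) pair to the index of a coarse grid cell.

    The grid is a hypothetical stand-in for census tracts: every point
    inside the same cell becomes indistinguishable after coarsening.
    """
    lat, lon = point
    return (floor(lat / cell_degrees), floor(lon / cell_degrees))

# Made-up pick-up points: the first two are a few blocks apart,
# the third is in a different neighborhood.
near_a = (40.7580, -73.9855)
near_b = (40.7512, -73.9899)
far = (40.7306, -73.9866)

same_cell = coarsen(near_a) == coarsen(near_b)
different_cell = coarsen(near_a) != coarsen(far)
```

The coarser the cell, the more trips share it and the harder it is to single out one ride, at the cost of less precise data for legitimate analysis.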

Anonymized data is still capable of revealing key information about users’ identities when it is used in combination with data from other sources. A link can be established between datasets when they overlap significantly, and significant overlap is not as difficult to achieve as one would expect. In the case of the TLC Trip Record Data, date, time and location provide enough overlap. After the link is established, the profile is complete as long as one of the datasets contains the user’s identity.
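The linking step described above can be sketched as a simple match on time and place. The following Python is a minimal illustration, not a description of any real tool: the record layout, the `link_sighting` function, the tolerances, and all sample values are hypothetical and do not reflect the actual TLC schema.

```python
from datetime import datetime, timedelta
from math import hypot

# Hypothetical "anonymized" trip records: no passenger identity is stored.
trips = [
    {"pickup_time": datetime(2016, 5, 1, 18, 2),
     "pickup": (40.7580, -73.9855), "dropoff": (40.7769, -73.9813)},
    {"pickup_time": datetime(2016, 5, 1, 18, 3),
     "pickup": (40.7306, -73.9866), "dropoff": (40.7061, -74.0087)},
    {"pickup_time": datetime(2016, 5, 1, 9, 15),
     "pickup": (40.7580, -73.9855), "dropoff": (40.6892, -74.0445)},
]

def link_sighting(trips, seen_at, seen_where, max_minutes=5, max_degrees=0.001):
    """Return the trips whose pick-up is close to a sighting in time and space."""
    window = timedelta(minutes=max_minutes)
    matches = []
    for trip in trips:
        close_in_time = abs(trip["pickup_time"] - seen_at) <= window
        dlat = trip["pickup"][0] - seen_where[0]
        dlon = trip["pickup"][1] - seen_where[1]
        close_in_space = hypot(dlat, dlon) <= max_degrees
        if close_in_time and close_in_space:
            matches.append(trip)
    return matches

# A second source (say, a timestamped photo) that does carry an identity.
sighting_time = datetime(2016, 5, 1, 18, 0)
sighting_place = (40.7581, -73.9856)

matches = link_sighting(trips, sighting_time, sighting_place)
```

When exactly one trip matches, its drop-off coordinates can be attached to the person identified in the second source, even though the trip data itself never named anyone.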

A real privacy concern arises when multiple service providers give their purportedly anonymized records to one datacenter, where complete profiles are constructed. Datacenters like this already exist, although the profiles they construct still lack accuracy.2 The combination of datasets dilutes the responsibility of each service provider from whom the datasets are obtained. Despite their shared complicity, the service providers can easily point fingers at one another, making it hard for users to find a particular target to blame. And if a user inadvertently consented to revealing his or her identity to one particular service provider in the first place, the blame can be shifted to that user.

A Way Out

Upon review, I realized that it may not be productive to argue for the right to control data about ourselves in order to seek changes in the practices of service providers. I cannot avoid making a choice between convenience and privacy. The calculus partly depends on exactly how much convenience I would have to sacrifice, so for the past week I set out to install a personal cloud in order to get a rough frame of reference.

After two hours of preliminary research, I learned what kind of hardware I needed to purchase: a single-board computer such as a Raspberry Pi, plus accessories such as heat sinks and a USB keyboard, for a total of $86.92. Once I had obtained a new Raspberry Pi, I spent another two hours learning how to install the operating system image and how to communicate with the device from my Windows laptop. At that point I was finally in a position to follow an online guide to installing a personal cloud on a Raspberry Pi, which took me another four hours to complete. Finally, in order to make my cloud accessible from outside my local network, I spent another two hours setting up port forwarding and dynamic DNS. In sum, I spent 10 hours of my time and $86.92 of my dime to set up a workable personal cloud with 32GB of storage. In return, I gained the consolation that I can now store personal files without privacy worries, provided that I set everything up properly and left no loopholes that hackers could easily exploit. I have most definitely left some loopholes in one way or another, but at least the hackers or the government will have to work retail. You can access my personal cloud here.


r1 - 05 Mar 2017 - 20:40:10 - LeonHuang
All material on this collaboration platform is the property of the contributing authors.
All material marked as authored by Eben Moglen is available under the license terms CC-BY-SA version 4.