LeonHuangFirstPaper 6 - 30 Sep 2017 - Main.EbenMoglen
|
|
META TOPICPARENT | name="FirstPaper" |
| | -- By LeonHuang - 29 September 2017 | |
< < | Protecting our privacy relies on finding someone we can trust. Our quest for privacy began with a loss of trust in online service providers like Google and Facebook to protect their users’ privacy. The quest continued with the revelation that (1) we have become addicted to the cornucopia of convenience provided by Google, Facebook, and the like and (2) we do not have the technical know-how to reinvent the wheel. The quest would reach its end when we can entrust someone else to do that for us. | > > | Protecting our privacy relies on finding someone we can trust. Our quest for privacy began with a loss of trust in online service providers like Google and Facebook to protect their users’ privacy. The quest continued with the revelation that (1) we have become addicted to the cornucopia of convenience provided by Google, Facebook, and the like and (2) we do not have the technical know-how to reinvent the wheel. The quest would reach its end when we can entrust someone else to do that for us. | |
I was naïve to trust you, Google. | | According to Professor Moglen, I would have been better off had I consulted with experts. I would have spent $150 instead of $90 for a single-board computer much faster than a Raspberry Pi, and I would have installed FreedomBox software, which would have given me a much more powerful personal cloud. Taking his point further, I now believe my laughable attempt to reinvent the wheel in 10 hours is an affront to the highly specialized division of labor that we associate with modern civilization. | |
> > |
I don't see how this follows. Your initial implementation of your
idea was not very effective. You learned that you needed faster
hardware. You implicitly discovered that you wanted something
different than you initially thought you wanted. (In any kind of
construction, including software architecture, the change orders
that result from learning what is wrong with the design while
building the design are both expensive and important.)
I suggested one way of solving the problem: with slightly more
expensive hardware and a shift to already-available software that
solves your problem, which I trust (1) because it is all free
software so everybody can see what it does by reading it, which you
can't do with the code running on the other side of someone's
service, as you remark; and (2) because the project making it is run
by me and my comrades, and we trust one another to follow the rules
of transparency and respecting users' rights, subject always to
complete ongoing inspection.
But we could also solve your problem at no hardware cost and using
only free software. We could use the storage that Columbia gives
you for free on your cunix account, and we could use tools like
"tomb" to create an encrypted container on that account and tools
like sshfs to access that container over a secure connection from
your laptop anywhere. The effect would be to give you a folder
containing all your files, organized however you want them, that would
look like Google Drive does, but which would be secure at rest and
secure in flight and operated by you without any hardware or any
cost. The space is limited by what Columbia gives you, but we could
do the same thing using an Amazon S3 bucket---which you wouldn't
have to trust Amazon about because it is encrypted with keys they
don't have---and s3fs or the equivalent.
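A minimal sketch of that tomb-plus-sshfs arrangement, assuming the tomb and sshfs packages are installed, with an invented UNI, filenames, and container size (none of these values come from the text), might look like:

```shell
# On the cunix account: create, key, and format a 1 GB encrypted container.
tomb dig -s 1024 files.tomb
tomb forge files.tomb.key
tomb lock files.tomb -k files.tomb.key

# From the laptop: mount the remote home directory over SSH, then open
# the tomb so its contents appear as an ordinary local folder.
mkdir -p ~/cunix ~/drive
sshfs uni1234@cunix.columbia.edu: ~/cunix
tomb open ~/cunix/files.tomb -k files.tomb.key ~/drive
# ... work with ~/drive as you would with Google Drive ...
tomb close
fusermount -u ~/cunix
```

The same pattern would apply to the S3 variant, with s3fs mounting the bucket in place of sshfs; the keys never leave the laptop, so the storage host sees only ciphertext.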
In other words, it's good to have some amount of knowledge, and
fairly illogical to say that you wanted to find out whether
something was possible to learn, but because you weren't successful
in learning it the first time you tried, learning is impossible and
everyone should know that your experiment proves they should give
up.
| | Can I trust you?
If we cannot trust the established service providers or ourselves, the only way out is to seek help from other experts in the field. In my case, I could have asked Professor Moglen. And in a more generalized case, we would need to find someone who (A) has the necessary expertise, (B) has no or limited conflict of interest, and (C) cares enough about privacy to conduct due diligence. While someone who meets all three criteria can be hard to come by within the reach of one’s social circle, he or she is likely within reach over the internet. But how can I be certain when someone claims to meet all three criteria over the internet?
Building trust among strangers over the internet is as difficult as it is in the real world. When the apparent stakes are high, people are willing to go to extreme lengths in proving their genuine intentions. For example, the initiation of Zcash, a cryptocurrency, involved a lengthy ceremony simultaneously conducted by several participants across the globe while being video-recorded live from all angles.3 In the case of privacy protection, the stakes are less apparent and the consequences less direct. How can we make sure that a website purporting to provide secure cloud services is genuine? Privacy protection in the end revolves around this trust question.
| |
> > |
Why does the answer to this question, which I taught and which we
discussed, not make an appearance here? When one uses free software
one does not have to "trust" people who tell you what software does:
one can read it. And if you don't or can't read it, you can listen
to the conversation among the millions of people who do make, use,
improve, and distribute free software, from companies like Oracle
and Red Hat in the S&P 500 to the technical workers in your social
circle. They teach people how to understand how it works, and are
constantly checking on its reliability. What you are posing as a
recursive difficulty in knowing whom to trust is actually a full
technical ecology of trust management that answers your objection so
basically and so strongly that most of the world's corporate IT
already depends on it.
| |
|
LeonHuangFirstPaper 5 - 30 Sep 2017 - Main.LeonHuang
|
|
META TOPICPARENT | name="FirstPaper" |
| |
< < | Do users have a right to control data about themselves? | > > | Privacy and Trust | | | |
< < | -- By LeonHuang - 05 Mar 2017 (Revised - 29 September 2017) | > > | -- By LeonHuang - 29 September 2017 | | Protecting our privacy relies on finding someone we can trust. Our quest for privacy began with a loss of trust in online service providers like Google and Facebook to protect their users’ privacy. The quest continued with the revelation that (1) we have become addicted to the cornucopia of convenience provided by Google, Facebook, and the like and (2) we do not have the technical know-how to reinvent the wheel. The quest would reach its end when we can entrust someone else to do that for us. |
|
LeonHuangFirstPaper 4 - 29 Sep 2017 - Main.LeonHuang
|
|
META TOPICPARENT | name="FirstPaper" |
Do users have a right to control data about themselves? | |
< < | -- By LeonHuang - 05 Mar 2017 (Revised - 17 May 2017) | > > | -- By LeonHuang - 05 Mar 2017 (Revised - 29 September 2017) | | | |
< < | Service providers typically keep anonymized records of how users are using their services, and the users’ agreements typically require the users to agree to such practice. | > > | Protecting our privacy relies on finding someone we can trust. Our quest for privacy began with a loss of trust in online service providers like Google and Facebook to protect their users’ privacy. The quest continued with the revelation that (1) we have become addicted to the cornucopia of convenience provided by Google, Facebook, and the like and (2) we do not have the technical know-how to reinvent the wheel. The quest would reach its end when we can entrust someone else to do that for us. | |
< < | Why do you say these
records are "anonymized"? That's the one thing they are pretty sure
not to be. | | | |
> > | I was naïve to trust you, Google. | | | |
< < | Users who care about both convenience and privacy may be motivated to argue that they should have a say over data about themselves. Is this position well-justified? | > > | Service providers online, like Google and Facebook, keep records of how users are using their services. When our initial ecstasy over free sign-ups subsides, we become worried about the threat to our privacy. Although the service providers, however conscientious, promise to strip away identity-sensitive information when they collect the data, we understand that data can never be perfectly anonymized, and our worries remain. | |
< < | What does
"well-justified" mean? Legally, ethically, logically? | > > | Data devoid of identifiable information may still threaten our privacy. For example, the TLC Trip Record Data provides publicly available information on the dates, times and locations of all taxi pick-ups/drop-offs in the New York City in a given year. Although the data does not include the identity of the passengers, it nevertheless increases the risks of privacy violations when it is used in conjunction with other publicly available information. Celebrity cab rides become easier to identify. People may simply find a photo of a celebrity getting into a cab and use the date, time, and location to find out where the celebrity was going. And in turn people may easily find out where the celebrity lives. The average Joe faces the same risks. An acquaintance may easily find out where you live after seeing you leaving in a cab after work. This tension between privacy and highly-accurate geolocation data has led to a proposal that the TLC should reveal only census tracts instead of the exact coordinates.1 | | | |
< < | Generally
speaking, you would want to begin an essay with an idea, not an
open-ended question. The reader needs to know why she should read
the essay. Obviously, if the reader is a philosopher who doesn't
care about time, whether or not to have an idea may be precisely the
stage at which she likes to begin reading. But for the remaining
fraction of humanity, you haven't provided a reason to read the
essay, and that means it will most likely not be read.
| > > | Anonymized data can reveal key information about users’ identity when such data is combined with data from other sources. A link can be established between datasets when there are significant overlaps, which are not as difficult to achieve as one would expect. In the case of the TLC Trip Record Data, date, time, and location provide enough overlaps to link a particular cab ride to the cab rider being photographed. In the case of purportedly anonymized records collected by service providers, there have already been efforts to combine such data from multiple sources to construct complete profiles.2 | |
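The linking described above can be reproduced with a toy example in a few lines of shell (every row is invented; the point is only that a shared date-time-location field joins the two files):

```shell
# Two toy "anonymized" datasets. trips.csv has no names; photos.csv has
# no trip details. The shared first field (date+time@location) is the
# overlap that links them.
cat > trips.csv <<'EOF'
2014-07-08T23:45@W44th-St,t1,tip=0.00
2014-07-09T01:10@E63rd-St,t2,tip=1.50
EOF

cat > photos.csv <<'EOF'
2014-07-09T01:10@E63rd-St,celebrity_photo_17
EOF

# join(1) pairs rows whose first comma-separated field matches,
# re-identifying trip t2 as the photographed ride.
join -t, trips.csv photos.csv
# -> 2014-07-09T01:10@E63rd-St,t2,tip=1.50,celebrity_photo_17
```

Neither file alone is identifying; the join is what does the damage, which is why "significant overlaps" rather than explicit identifiers are the thing to watch.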
< < | Review of the Argument | > > | We cannot entrust our privacy to the service providers, so long as they keep on collecting our data. And the service providers will keep on collecting our data. It is key to the business model which has made them so valuable to their investors, both private and public. | | | |
< < | The position appeals to our intuition in two ways. First, the content of the records is produced by the users. Users may deem themselves as the author of such data and hence the owner of the data. Second, users expect to have control over their privacy. The records on users’ behaviors contain private information that users do not expect others to know, and therefore the users would think that they should have some say in how the records are being used. | | | |
< < | Once again the
unclarity about what sort of argument is being rehearsed makes
reading difficult. One sentence seems to be discussing a legal
concept, ownership, while the next appears to be stating a political
principle, though whether the principle is that things other people
don't know we should have some say in controlling, or whether we
have some stake in controlling things others know too is left vague.
| > > | You know nothing, John Doe. | | | |
< < | There are counterarguments too. The keeping of the records is performed by the service providers. Although the content of the records is largely dependent on users’ behaviors, the service providers are nevertheless the author of the records in the same way that the people writing biographies are the authors of those biographies rather than the subjects of those biographies. In addition, anonymization alleviates the privacy concerns. After identity-sensitive information is stripped from the dataset, the record of each individual’s behavior becomes less likely to violate that individual’s privacy. The data is no longer about each of the individual users but rather about a set of users sharing some demographic characteristics. Users cannot be justified to argue for more when the data about them is perfectly anonymized.
Cannot, why? What is
"perfectly anonymized" data? Why is data always either about one
person or about nobody? If someone is keeping track of all Jews,
all Muslims, or all Tutsi, that's not about privacy because privacy
is only about one person at a time? Perhaps the definition of
"privacy" is at fault. But we don't know what that definition is
because you haven't given one.
But perfect anonymization is a high standard. Stripping away identifiable information is not always enough. For example, the TLC Trip Record Data provides publicly available information on the dates, times, and locations of all taxi pick-ups/drop-offs in New York City in a given year. Although the data does not include the identity of the passengers, it nevertheless increases the risks of privacy violations when it is used in conjunction with other publicly available information. Celebrity cab rides become easier to identify. People may simply find a photo of a celebrity getting into a cab and use the date, time, and location to find out where the celebrity was going. And in turn people may easily find out where the celebrity lives. The average Joe faces the same risks. An acquaintance may easily find out where you live after seeing you leaving in a cab after work. This tension between privacy and highly-accurate geolocation data has led to a proposal that the TLC should reveal only census tracts instead of the exact coordinates.1
Anonymized data is still capable of revealing key information about users’ identity when such data is used in combination with data from other sources. A link can be established between datasets when there are significant overlaps, which are not as difficult to achieve as one would expect. In the case of the TLC Trip Record Data, date, time, and location provide enough overlaps to link a particular cab ride to the cab rider being photographed. In the case of purportedly anonymized records collected by cloud service providers, there have already been efforts to combine such data from multiple sources to construct complete profiles.2 The combination of datasets dilutes the responsibility of each service provider from whom the datasets are obtained. The more service providers are complicit in the guilt, the more diluted their individual responsibility is.
I don't understand the idea of diluted responsibility. Is that like when three polluters each put one poison into a river?
A Way Out?
Upon review, I realized that it may not be productive to argue for the right to control data about ourselves in order to seek changes in the practices of service providers. I cannot avoid making a choice between convenience and privacy. The calculus partly depends on exactly how much convenience I would have to sacrifice. In March I set out to install a personal cloud in order to obtain a rough frame of reference. | > > | We, as the regular John Does and Jane Does, do not have the technical know-how to provide for ourselves. A personal story can illustrate this point: I am addicted to the easy access to all my files on the go enabled by cloud services like Google Drive. In March, I set out to install a personal cloud that I hoped would do the same thing with anonymity. | | After two hours of preliminary research, I learned the kind of hardware I needed to purchase: a single-board computer such as a Raspberry Pi plus accessories such as an SD card for storage, at a total of $86.92. Once I obtained the equipment, I spent another two hours in research to learn how to install the operating system image and how to communicate with the equipment from my laptop. Then I spent four hours installing personal cloud software on the equipment, following a guide that I found online. Finally, in order to make my cloud accessible outside of my local network, I spent another two hours setting up port forwarding and dynamic DNS. In sum, I spent 10 hours of my time and $86.92 of my dime to set up a workable personal cloud with 32GB storage. In return, I gained the freedom to use cloud storage services without being forced to have my data collected by someone else. | | I cannot think of any files to which I need remote access from time to time, for which I am willing to tolerate the quirkiness of my personal cloud in order to keep service providers from harvesting data on me, and that are not sensitive enough to make me worry about targeted hacking.
In the end, I do not know what to do with my personal cloud. I pulled the plug by the end of April. | |
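Of the steps in that build, port forwarding lives in the router's admin interface, but the dynamic-DNS step is plain configuration. A hypothetical sketch for a Pi running ddclient (the provider, hostname, and credentials below are placeholders, not details from the essay):

```shell
# Hypothetical /etc/ddclient.conf keeping a DNS name pointed at a home
# connection's changing IP address; every value is a placeholder.
sudo tee /etc/ddclient.conf >/dev/null <<'EOF'
# re-check the public IP every 5 minutes
daemon=300
# discover the current public address via an external web service
use=web, web=checkip.dyndns.org
# update protocol spoken by many dynamic-DNS providers
protocol=dyndns2
server=members.dyndns.org
login=example-user
password='example-password'
# hostname that should follow the home IP
mycloud.example.net
EOF
# pick up the new configuration
sudo systemctl restart ddclient
```

With this in place, the personal cloud stays reachable at a stable name even when the home ISP reassigns the address.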
> > | According to Professor Moglen, I would have been better off had I consulted with experts. I would have spent $150 instead of $90 for a single-board computer much faster than a Raspberry Pi, and I would have installed FreedomBox software, which would have given me a much more powerful personal cloud. Taking his point further, I now believe my laughable attempt to reinvent the wheel in 10 hours is an affront to the highly specialized division of labor that we associate with modern civilization. | |
< < | | > > | Can I trust you? | | | |
< < | This anecdote doesn't seem to me related in any way to the previous
analysis. I do think that it would probably have been a good idea
to consult somebody else rather than trying to do your engineering
by the light of nature. By spending $150 instead of $90, you could
have had a single-board computer much faster than a Raspberry Pi, and
if you had installed FreedomBox software your personal cloud would
have done a great deal more for you than it appears you could figure
out yourself how to do, which isn't surprising considering that
dozens of experts have already been working on it for seven years
and you took a couple of hours.
But how did this help us to determine whether something or other was
"justified"? And what was the idea the essay wanted to communicate
to the reader? I think it was "privacy, whatever that is, isn't
very important even though we intuit at first that it might be. But
mostly it's just a lot of unnecessary trouble." Did I get that
right? | > > | If we cannot trust the established service providers or ourselves, the only way out is to seek help from other experts in the field. In my case, I could have asked Professor Moglen. And in a more generalized case, we would need to find someone who (A) has the necessary expertise, (B) has no or limited conflict of interest, and (C) cares enough about privacy to conduct due diligence. While someone who meets all three criteria can be hard to come by within the reach of one’s social circle, he or she is likely within reach over the internet. But how can I be certain when someone claims to meet all three criteria over the internet? | | | |
< < | | > > | Building trust among strangers over the internet is as difficult as it is in the real world. When the apparent stakes are high, people are willing to go to extreme lengths in proving their genuine intentions. For example, the initiation of Zcash, a cryptocurrency, involved a lengthy ceremony simultaneously conducted by several participants across the globe while being video-recorded live from all angles.3 In the case of privacy protection, the stakes are less apparent and the consequences less direct. How can we make sure that a website purporting to provide secure cloud services is genuine? Privacy protection in the end revolves around this trust question. | |
|
|
LeonHuangFirstPaper 3 - 27 Sep 2017 - Main.EbenMoglen
|
|
META TOPICPARENT | name="FirstPaper" |
| | -- By LeonHuang - 05 Mar 2017 (Revised - 17 May 2017) | |
< < | Service providers typically keep anonymized records of how users are using their services, and the users’ agreements typically require the users to agree to such practice. Users who care about both convenience and privacy may be motivated to argue that they should have a say over data about themselves. Is this position well-justified? | > > | Service providers typically keep anonymized records of how users are using their services, and the users’ agreements typically require the users to agree to such practice.
Why do you say these
records are "anonymized"? That's the one thing they are pretty sure
not to be.
Users who care about both convenience and privacy may be motivated to argue that they should have a say over data about themselves. Is this position well-justified?
What does
"well-justified" mean? Legally, ethically, logically?
Generally
speaking, you would want to begin an essay with an idea, not an
open-ended question. The reader needs to know why she should read
the essay. Obviously, if the reader is a philosopher who doesn't
care about time, whether or not to have an idea may be precisely the
stage at which she likes to begin reading. But for the remaining
fraction of humanity, you haven't provided a reason to read the
essay, and that means it will most likely not be read.
| | Review of the Argument
The position appeals to our intuition in two ways. First, the content of the records is produced by the users. Users may deem themselves as the author of such data and hence the owner of the data. Second, users expect to have control over their privacy. The records on users’ behaviors contain private information that users do not expect others to know, and therefore the users would think that they should have some say in how the records are being used. | |
> > | Once again the
unclarity about what sort of argument is being rehearsed makes
reading difficult. One sentence seems to be discussing a legal
concept, ownership, while the next appears to be stating a political
principle, though whether the principle is that things other people
don't know we should have some say in controlling, or whether we
have some stake in controlling things others know too is left vague.
| | There are counterarguments too. The keeping of the records is performed by the service providers. Although the content of the records is largely dependent on users’ behaviors, the service providers are nevertheless the author of the records in the same way that the people writing biographies are the authors of those biographies rather than the subjects of those biographies. In addition, anonymization alleviates the privacy concerns. After identity-sensitive information is stripped from the dataset, the record of each individual’s behavior becomes less likely to violate that individual’s privacy. The data is no longer about each of the individual users but rather about a set of users sharing some demographic characteristics. Users cannot be justified to argue for more when the data about them is perfectly anonymized. | |
> > | Cannot, why? What is
"perfectly anonymized" data? Why is data always either about one
person or about nobody? If someone is keeping track of all Jews,
all Muslims, or all Tutsi, that's not about privacy because privacy
is only about one person at a time? Perhaps the definition of
"privacy" is at fault. But we don't know what that definition is
because you haven't given one.
| | But perfect anonymization is a high standard. Stripping away identifiable information is not always enough. For example, the TLC Trip Record Data provides publicly available information on the dates, times, and locations of all taxi pick-ups/drop-offs in New York City in a given year. Although the data does not include the identity of the passengers, it nevertheless increases the risks of privacy violations when it is used in conjunction with other publicly available information. Celebrity cab rides become easier to identify. People may simply find a photo of a celebrity getting into a cab and use the date, time, and location to find out where the celebrity was going. And in turn people may easily find out where the celebrity lives. The average Joe faces the same risks. An acquaintance may easily find out where you live after seeing you leaving in a cab after work. This tension between privacy and highly-accurate geolocation data has led to a proposal that the TLC should reveal only census tracts instead of the exact coordinates.1
Anonymized data is still capable of revealing key information about users’ identity when such data is used in combination with data from other sources. A link can be established between datasets when there are significant overlaps, which are not as difficult to achieve as one would expect. In the case of the TLC Trip Record Data, date, time, and location provide enough overlaps to link a particular cab ride to the cab rider being photographed. In the case of purportedly anonymized records collected by cloud service providers, there have already been efforts to combine such data from multiple sources to construct complete profiles.2 The combination of datasets dilutes the responsibility of each service provider from whom the datasets are obtained. The more service providers are complicit in the guilt, the more diluted their individual responsibility is. | |
> > |
I don't understand the idea of diluted responsibility. Is that like when three polluters each put one poison into a river?
| | A Way Out?
Upon review, I realized that it may not be productive to argue for the right to control data about ourselves in order to seek changes in the practices of service providers. I cannot avoid making a choice between convenience and privacy. The calculus partly depends on exactly how much convenience I would have to sacrifice. In March I set out to install a personal cloud in order to obtain a rough frame of reference. | | I cannot think of any files to which I need remote access from time to time, for which I am willing to tolerate the quirkiness of my personal cloud in order to keep service providers from harvesting data on me, and that are not sensitive enough to make me worry about targeted hacking. In the end, I do not know what to do with my personal cloud. I pulled the plug by the end of April. | |
| |
> > |
This anecdote doesn't seem to me related in any way to the previous
analysis. I do think that it would probably have been a good idea
to consult somebody else rather than trying to do your engineering
by the light of nature. By spending $150 instead of $90, you could
have had a single-board computer much faster than a Raspberry Pi, and
if you had installed FreedomBox software your personal cloud would
have done a great deal more for you than it appears you could figure
out yourself how to do, which isn't surprising considering that
dozens of experts have already been working on it for seven years
and you took a couple of hours.
But how did this help us to determine whether something or other was
"justified"? And what was the idea the essay wanted to communicate
to the reader? I think it was "privacy, whatever that is, isn't
very important even though we intuit at first that it might be. But
mostly it's just a lot of unnecessary trouble." Did I get that
right?
| |
|
LeonHuangFirstPaper 2 - 17 May 2017 - Main.LeonHuang
|
|
META TOPICPARENT | name="FirstPaper" |
Do users have a right to control data about themselves? | |
< < | -- By LeonHuang - 05 Mar 2017 | > > | -- By LeonHuang - 05 Mar 2017 (Revised - 17 May 2017) | | | |
< < | Service providers typically keep anonymized records of how users are using their services, and the users’ agreements, which the users have consented to, typically allow service providers to use these records. Users who care about both convenience and privacy may be motivated to argue that they should have a say over data about themselves. Is this position well-justified? | > > | Service providers typically keep anonymized records of how users are using their services, and the users’ agreements typically require the users to agree to such practice. Users who care about both convenience and privacy may be motivated to argue that they should have a say over data about themselves. Is this position well-justified? | | | |
< < | An Review of the Argument | > > | Review of the Argument | | | |
< < | The position appeals to our intuition in two ways. First, the content of the records is produced by the users. Users may deem themselves as the author of such data and in turn the owner of the data. Second, users expect to have control over their privacy. The records on users’ behaviors contain private information that users do not expect others to know, and therefore the users would think that they should have some say in how the records are being used. | > > | The position appeals to our intuition in two ways. First, the content of the records is produced by the users. Users may deem themselves as the author of such data and hence the owner of the data. Second, users expect to have control over their privacy. The records on users’ behaviors contain private information that users do not expect others to know, and therefore the users would think that they should have some say in how the records are being used. | | There are counterarguments too. The keeping of the records is performed by the service providers. Although the content of the records is largely dependent on users’ behaviors, the service providers are nevertheless the author of the records in the same way that the people writing biographies are the authors of those biographies rather than the subjects of those biographies. In addition, anonymization alleviates the privacy concerns. After identity-sensitive information is stripped from the dataset, the record of each individual’s behavior becomes less likely to violate that individual’s privacy. The data is no longer about each of the individual users but rather about a set of users sharing some demographic characteristics. Users cannot be justified to argue for more when the data about them is perfectly anonymized.
But perfect anonymization is a high standard. Stripping away identifiable information is not always enough. For example, the TLC Trip Record Data provides publicly available information on the dates, times, and locations of all taxi pick-ups/drop-offs in New York City in a given year. Although the data does not include the identity of the passengers, it nevertheless increases the risks of privacy violations when it is used in conjunction with other publicly available information. Celebrity cab rides become easier to identify. People may simply find a photo of a celebrity getting into a cab and use the date, time, and location to find out where the celebrity was going. And in turn people may easily find out where the celebrity lives. The average Joe faces the same risks. An acquaintance may easily find out where you live after seeing you leaving in a cab after work. This tension between privacy and highly-accurate geolocation data has led to a proposal that the TLC should reveal only census tracts instead of the exact coordinates.1 | |
< < | Anonymized data is still capable of revealing key information about users’ identity when such data is used in combination with data from other sources. A link can be established between datasets when there are significant overlaps. Significant overlaps are not as difficult to achieve as one would expect. In the case of the TLC Trip Record Data, date, time, and location provide enough overlaps. After the link is established, the profile would be complete as long as one of the datasets contains the user’s identity. | > > | Anonymized data is still capable of revealing key information about users’ identity when such data is used in combination with data from other sources. A link can be established between datasets when there are significant overlaps, which are not as difficult to achieve as one would expect. In the case of the TLC Trip Record Data, date, time, and location provide enough overlaps to link a particular cab ride to the cab rider being photographed. In the case of purportedly anonymized records collected by cloud service providers, there have already been efforts to combine such data from multiple sources to construct complete profiles.2 The combination of datasets dilutes the responsibility of each service provider from whom the datasets are obtained. The more service providers are complicit in the guilt, the more diluted their individual responsibility is. | |
< < | A real privacy concern arises when multiple service providers give their purportedly anonymized records to one datacenter, where the complete profiles are being constructed. Datacenters like this already exist, although the profiles being constructed still lack in accuracy.2 The combination of datasets dilutes the responsibility of each service provider from whom the datasets are obtained. Despite their complicity in guilt, the service providers can easily point fingers to one another, making it hard for the users to find a particular target to blame. And if a user inadvertently consented to revealing his or her identity regarding one particular service provider in the first place. Then the blame can be shifted to that user. | > > | A Way Out? | | | |
< < | A Way Out | > > | Upon review, I realized that it may not be productive to argue for the right to control data about ourselves in order to seek changes in the practices of service providers. I cannot avoid making a choice between convenience and privacy. The calculus partly depends on exactly how much convenience I would have to sacrifice. In March I set out to install a personal cloud in order to obtain a rough frame of reference. | | | |
< < | Upon review, I realized that it may not be productive to argue for the right to control data about ourselves in order to seek changes in the practices of service providers. I cannot avoid making a choice between convenience and privacy. The calculus partly depends on exactly how much convenience I would have to sacrifice, so for the past week I set out to install a personal cloud in order to get a rough frame of reference. | > > | After two hours of preliminary research, I learned what hardware I needed to purchase: a single-board computer such as a Raspberry Pi, plus accessories such as an SD card for storage, at a total of $86.92. Once I obtained the equipment, I spent another two hours of research learning how to install the operating system image and how to communicate with the equipment from my laptop. Then I spent four hours installing personal cloud software on the equipment, following a guide that I found online. Finally, in order to make my cloud accessible outside of my local network, I spent another two hours setting up port forwarding and dynamic DNS. In sum, I spent 10 hours of my time and $86.92 of my dime to set up a workable personal cloud with 32GB of storage. In return, I gained the freedom to use cloud storage services without being forced to have my data collected by someone else. | |
< < | After two hours of preliminary research, I learned the kind of hardware I need to purchase: a single-board computer such as Raspberry Pi plus accessories such as heat sinks and a USB keyboard at a total of $86.92. Once I obtained a new Raspberry Pi, I spent another two hours in research to learn how to install the images of the operating system and how to communicate with the equipment using my Windows laptop. At that point I was finally in the position to follow a guide online on installing personal cloud on Raspberry Pi, which took me another four hours to complete. Finally, in order to make my cloud accessible outside of my local network, I spent another two hours setting up port forwarding and dynamic DNS. In sum, I spent 10 hours of my time and $86.92 of my dime to set up a workable personal cloud with 32GB of storage. In return, I gained the consolation that I can now store personal files without any privacy worries, conditional on my setting up everything properly without leaving any loopholes that hackers could easily take advantage of. I most definitely have left some loopholes in one way or another, but at least the hackers or the government will have to work retail. You can access my personal cloud here. | > > | It turns out the tradeoff is not limited to the initial setup cost. My personal cloud is excruciatingly slow compared to the established cloud services. It sometimes stops responding until I reboot the equipment, making it rather unreliable as a service intended for remote access. And I still cannot trust its security, because I know it was set up and is maintained by an amateur with little knowledge of network security and little time even to keep its operating software up-to-date.
I cannot think of any files that I need to access remotely from time to time, that would justify tolerating the quirkiness of my personal cloud in order to keep service providers from harvesting data on me, and that are not sensitive enough for me to worry about targeted hacking. In the end, I do not know what to do with my personal cloud. I pulled the plug by the end of April. | |
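The dynamic-DNS step in the setup described above can be sketched as follows. The endpoint, parameter names, and token are all hypothetical, since real providers each define their own update API, but the common pattern is the same: a small script, run periodically from the Pi, tells the provider the home network's current public IP so the hostname keeps resolving.

```python
from urllib.parse import urlencode
# from urllib.request import urlopen  # uncomment to actually send the update

def ddns_update_url(hostname, ip, token,
                    endpoint="https://ddns.example.net/update"):
    """Build the update request for a hypothetical dynamic-DNS provider.

    The endpoint and parameter names are invented for illustration; they
    mirror the usual shape of hostname + new IP + an auth token.
    """
    return endpoint + "?" + urlencode(
        {"hostname": hostname, "myip": ip, "token": token})

# Run from cron on the Raspberry Pi so the home IP stays resolvable:
url = ddns_update_url("mycloud.example.net", "203.0.113.7", "s3cret")
# urlopen(url)  # would notify the provider of the current address
```

Combined with a port-forwarding rule on the home router (external port mapped to the Pi's local address), this is what makes a personal cloud reachable from outside the local network.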
Set ALLOWTOPICCHANGE = TWikiAdminGroup? |
|
LeonHuangFirstPaper 1 - 05 Mar 2017 - Main.LeonHuang
|
|
> > |
META TOPICPARENT | name="FirstPaper" |
Do users have a right to control data about themselves?
-- By LeonHuang - 05 Mar 2017
Service providers typically keep anonymized records of how users are using their services, and the users’ agreements, which the users have consented to, typically allow service providers to use these records. Users who care about both convenience and privacy may be motivated to argue that they should have a say over data about themselves. Is this position well-justified?
A Review of the Argument
The position appeals to our intuition in two ways. First, the content of the records is produced by the users. Users may deem themselves the authors of such data and, in turn, its owners. Second, users expect to have control over their privacy. The records of users’ behaviors contain private information that users do not expect others to know, and therefore users would think that they should have some say in how the records are being used.
There are counterarguments too. The keeping of the records is performed by the service providers. Although the content of the records is largely dependent on users’ behaviors, the service providers are nevertheless the authors of the records, in the same way that biographers, rather than their subjects, are the authors of biographies. In addition, anonymization alleviates the privacy concerns. After identity-sensitive information is stripped from the dataset, the record of each individual’s behavior becomes less likely to violate that individual’s privacy. The data is no longer about each of the individual users but rather about a set of users sharing some demographic characteristics. Users can hardly be justified in arguing for more when the data about them is perfectly anonymized.
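The kind of anonymization described above can be sketched in a few lines. The field names and the three-digit ZIP generalization are illustrative choices, not any provider's actual practice, but they show the two standard moves: dropping direct identifiers and coarsening the rest so a row describes a group rather than a person.

```python
# Fields that identify a person directly (illustrative list).
DIRECT_IDENTIFIERS = {"name", "email", "phone"}

def anonymize(record):
    """Drop direct identifiers and coarsen a record to demographics only.

    Generalizing the ZIP code to its 3-digit prefix is one common step;
    it turns a single user's row into one shared by a whole neighborhood.
    """
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "zip" in out:
        out["zip"] = out["zip"][:3] + "**"
    return out

# A fabricated user record:
user = {"name": "J. Doe", "email": "jdoe@example.com",
        "zip": "10027", "age": 27, "logins_per_day": 14}
print(anonymize(user))
# {'zip': '100**', 'age': 27, 'logins_per_day': 14}
```

Note that the surviving fields (coarse ZIP, age, usage counts) are exactly the quasi-identifiers that the next paragraphs show can be linked back to an identity, which is why this kind of stripping falls short of "perfect" anonymization.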
But perfect anonymization is a high standard. Stripping away identifiable information is not always enough. For example, the TLC Trip Record Data provides publicly available information on the dates, times, and locations of all taxi pick-ups/drop-offs in New York City in a given year. Although the data does not include the identity of the passengers, it nevertheless increases the risks of privacy violations when it is used in conjunction with other publicly available information. Celebrity cab rides become easier to identify. People may simply find a photo of a celebrity getting into a cab and use the date, time, and location to find out where the celebrity was going. And in turn people may easily find out where the celebrity lives. The average Joe faces the same risks. An acquaintance may easily find out where you live after seeing you leave in a cab after work. This tension between privacy and highly-accurate geolocation data has led to a proposal that the TLC should reveal only census tracts instead of the exact coordinates.1
Anonymized data is still capable of revealing key information about users’ identity when it is used in combination with data from other sources. A link can be established between datasets when there are significant overlaps, which are not as difficult to achieve as one would expect. In the case of the TLC Trip Record Data, date, time, and location provide enough overlap. After the link is established, the profile is complete as long as one of the datasets contains the user’s identity.
A real privacy concern arises when multiple service providers give their purportedly anonymized records to one datacenter, where complete profiles are being constructed. Datacenters like this already exist, although the profiles being constructed still lack accuracy.2 The combination of datasets dilutes the responsibility of each service provider from whom the datasets are obtained. Despite their complicity in guilt, the service providers can easily point fingers at one another, making it hard for the users to find a particular target to blame. And if a user inadvertently consented to revealing his or her identity to one particular service provider in the first place, the blame can be shifted to that user.
A Way Out
Upon review, I realized that it may not be productive to argue for the right to control data about ourselves in order to seek changes in the practices of service providers. I cannot avoid making a choice between convenience and privacy. The calculus partly depends on exactly how much convenience I would have to sacrifice, so for the past week I set out to install a personal cloud in order to get a rough frame of reference.
After two hours of preliminary research, I learned what hardware I needed to purchase: a single-board computer such as a Raspberry Pi, plus accessories such as heat sinks and a USB keyboard, at a total of $86.92. Once I obtained a new Raspberry Pi, I spent another two hours of research learning how to install the operating system image and how to communicate with the equipment using my Windows laptop. At that point I was finally in a position to follow an online guide on installing a personal cloud on a Raspberry Pi, which took me another four hours to complete. Finally, in order to make my cloud accessible outside of my local network, I spent another two hours setting up port forwarding and dynamic DNS. In sum, I spent 10 hours of my time and $86.92 of my dime to set up a workable personal cloud with 32GB of storage. In return, I gained the consolation that I can now store personal files without any privacy worries, conditional on my setting up everything properly without leaving any loopholes that hackers could easily take advantage of. I most definitely have left some loopholes in one way or another, but at least the hackers or the government will have to work retail. You can access my personal cloud here.
|
|