Law in the Internet Society

Artificial Intelligence Is Grabbing Your Data Secretly

-- By ChengyuanZhou - 24 Oct 2024

Introduction

Internet giants are now using your data, legally, to train AI. Training models requires enormous amounts of data, and as the new wave of AI competition exhausts the ordinary public content of the Internet, the vast store of personal content that users have posted on platforms becomes very attractive to the companies that host it. Some tech giants have begun slipping AI training into their data use policies, giving themselves the right to use people's data. Major companies such as Google, Meta, Adobe, Zoom, and X have updated their terms of service or privacy policies to allow them to use user data to train generative AI models.

Modifying User Agreements

In July 2023, Google changed its privacy policy to add that public information can be used to train its AI chatbots and other services. A Google spokesperson also said that, with the permission of a small group of users, Google may use their personal emails to train its AI in some ways. Meta's updated privacy policy now explicitly states that user information such as posts, photos, and even interactions with its AI tools on Facebook, Instagram, and Messenger can be used to train its AI models. A Meta spokesperson said in a statement that using publicly available information to train AI models is common practice across the industry and is not unique to Meta's services. Adobe, Zoom, and Snap have added similar clauses to their user agreements in recent years. To avoid a backlash over privacy, companies sometimes make these changes quietly. In many cases, users click "agree" without reading a word, accepting the new terms without ever being alerted to what has changed.

AI Training Is Using Your Sensitive Personal Information

AI training data comes mainly from publicly available material on the Internet, including news reports, social media text, and articles published by platform users. In the process, technology companies collect a large amount of user information without the consent of the people who published it. AI systems also collect data that users generate while using the model, such as conversation content, search history, and browsing information, in order to improve their performance. Collecting and processing this information lets the AI continuously learn users' preferences and interests. Along the way, users may unknowingly disclose private information such as their real name, email address, or place of residence. The disclosure and abuse of this private information can cause users significant harm, because it makes their behavior and decisions more predictable to tech companies or the government.

The Defense Offered by Supporters of AI Training

A common defense is that personal information captured for AI will be used only for training and not for other commercial purposes. Under many data privacy laws, a business that does not collect personal information directly from consumers and does not control the collection process is typically not required to provide collection notices, especially if it does not sell or share any personal information it receives from other sources. Companies may also argue that they only scrape data from the web to train AI, such as information users have posted on social media, and that this data does not count as personal information. However, personal data remains protected by data privacy laws even when it is publicly available. When users share information publicly on social media or the Web, they do not expect others to take, process, and exploit it without their explicit consent.

Scraping data to train AI could also undermine consumer trust, which would harm the digital economy if fewer and fewer people are willing to share their experiences or opinions on the Internet. Yet these concerns are unlikely to influence tech companies' decisions. The economic rewards and the size of the AI market are so large that no technology giant is willing to give up training AI tools, even at the cost of serious moral and legal risk. Technological competition and market isolation among countries, China among them, have also created enormous demand for training AI without regard to the invasion of personal privacy.

User Consent: A Way to Deter Data Scraping for Training AI

In August 2024, the X platform was accused of taking data from EU users without their consent to train AI models. Such a practice would violate the General Data Protection Regulation (GDPR), which requires that any processing of personal data rest on a lawful basis. Meta, for its part, argues that it is using content that people have chosen to make public to build its foundational models. However, any company that handles user data directly should clearly inform users and give them a choice before using it; this is an integral part of protecting user privacy.

Requiring explicit user consent is a desirable solution, but it is clearly not enough. Even with consent, AI training should be prohibited from taking sensitive private information such as emails and call records. Nor should AI training be allowed to reach back and use personal information created before the user consented. Once a company has obtained consent, it should state clearly which information will be used for AI training, so that users know what they have agreed to.

Admittedly, rules like these are unlikely to stop companies from collecting personal information altogether. In the emerging AI market, the boundary between what the technology needs and what privacy protection demands is far from settled, and this battle will continue for a long time.



r3 - 25 Dec 2024 - 11:45:08 - ChengyuanZhou
All material on this collaboration platform is the property of the contributing authors.
All material marked as authored by Eben Moglen is available under the license terms CC-BY-SA version 4.