"Next to Impossible": OpenAI's ChatGPT Faces GDPR Compliance Woes
Amid a temporary ban in Italy, OpenAI's ChatGPT confronts the difficult task of achieving GDPR compliance, with European legal experts deeming the prospect of adhering to the regulations "next to impossible."
OpenAI faces numerous challenges in complying with the EU's GDPR guidelines. Photo illustration: Artisana
- Italy’s temporary ban on OpenAI's ChatGPT leaves OpenAI with a tight deadline to comply with the Italian regulator's requests.
- However, European legal experts predict OpenAI's compliance with GDPR regulations could be "next to impossible."
- The ChatGPT ban highlights broader concerns about data collection practices and the need for AI companies to prioritize data privacy, as regulations rapidly develop around AI technology.
April 20, 2023
OpenAI's ChatGPT, after its temporary ban by Italy, now has less than two weeks to implement corrective measures. However, European legal experts predict it may be "next to impossible" for OpenAI to comply with Italy's regulations and the broader GDPR requirements. Failure to comply may result in severe consequences, from financial penalties to an outright ban of ChatGPT.
AI Model Construction Under Scrutiny
At the heart of the matter is OpenAI's methodology for building their AI models. AI models require vast quantities of data, much of which is publicly scraped and collected without user consent. OpenAI's GPT-2 model utilized 40 GB of text, while GPT-3 used 570 GB. OpenAI has refused to disclose the data used for GPT-4, frustrating researchers.
Italy's data regulator banned ChatGPT on the grounds that it breached GDPR regulations, stating there "appears to be no legal basis underpinning the massive collection and processing of personal data" used to train the algorithms. Italy's decision sparked similar investigations in France, Germany, Ireland, and Canada, prompting the EU's Data Protection Board to establish a task force for coordination and enforcement regarding ChatGPT.
Corrective Measures Demanded by Italian Data Regulator
OpenAI has been asked to implement several corrective measures, including:
- Obtaining consent from individuals to scrape their data, or proving "legitimate interest" in data collection
- Explaining to users how ChatGPT utilizes their data
- Allowing users to correct inaccuracies about them produced by the chatbot
- Enabling users to request data erasure
- Offering users the option to revoke consent for ChatGPT to use their data
Experts consider OpenAI's data scraping the most contentious compliance issue. OpenAI is unlikely to be able to prove consent for the data used to train its AI models, and the "legitimate interest" test poses its own challenge, requiring companies to offer rigorous reasons to justify using or retaining data without consent. The EU data regulator cites scenarios such as fraud prevention, network security, and crime prevention as valid grounds.
Margaret Mitchell, an AI researcher and ethics lead at Hugging Face, asserts that "OpenAI is going to find it near-impossible to identify individuals' data and remove it from its models." Mitchell previously served as Google's AI ethics co-lead.
Messy Data Collection is an AI Industry-Wide Problem
Historically, AI companies have viewed data collection as a means to an end, often neglecting accuracy and labeling. To gather the massive amounts of data needed to train their models, AI companies purchase bulk data from providers, use indiscriminate scrapers, and depend on contractors for basic filtering and error checking.
The Washington Post reported that many technology companies remain unaware of the contents of their training datasets. Even Google's heavily filtered Colossal Clean Crawled Corpus (C4) dataset, used for training various AI models, was found to contain content from white supremacist site Stormfront and unregulated online forum 4chan.
Google researcher Nithya Sambasivan concluded in a study that data practices are "messy, protracted, and opaque." In the end, Sambasivan noted these challenges arise because "everyone wants to do the model work, not the data work."