Reconciling AI ‘hallucinations’ with GDPR compliance
Posted: April 29, 2025
Generative AI is promoted as world-changing by many of its advocates and decried as harmful by its critics. But while the technology is undeniably useful in many contexts, it’s also hard to reconcile certain aspects of generative AI with data protection and privacy law compliance.
Here’s a look at one of the key friction points where generative AI meets compliance with the EU and UK General Data Protection Regulation (GDPR)—the software’s tendency to “hallucinate”.
We’ll consider how generative AI hallucinations present a challenge for GDPR compliance and how organizations can mitigate the associated risks.
Understanding generative AI hallucinations
A “hallucination” occurs when a generative AI model produces confident but false, fabricated, or nonsensical information that is not grounded in its training data.
Hallucinations result from a mix of factors, including:
- The probabilistic nature of Large Language Models (LLMs)
- Limitations in training data
- A lack of real-world grounding
- Objective conflicts during training
Generative AI outputs are becoming more consistent and accurate, but hallucinations remain a problem for companies integrating these models into their workflows.
AI firm Hugging Face maintains a leaderboard of hallucination rates among generative AI LLMs. As of April 2025:
- Google’s Gemini 2.0 Flash model exhibits the highest “factual consistency” rate according to the company’s testing, with 0.7% of responses deemed hallucinations.
- French firm Mistral’s 7B Instruct v0.3 model ranks lowest, with a reported hallucination rate of 9.5%.
- The best effort of industry giant OpenAI appears to be its o3 mini “high reasoning” model, which ranks third with a hallucination rate of 0.8%.
But even relatively low hallucination rates can cause significant issues for businesses and individuals, including in the context of GDPR compliance.
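The percentages above are straightforward to reproduce for your own deployment. As a minimal sketch, assuming you have a set of model responses that human reviewers have labelled as hallucinations or not (the labelling method is an assumption here, not the leaderboard’s actual methodology):

```python
# Illustrative sketch: computing a hallucination rate from manually
# labelled model outputs. Assumes each response has already been judged
# by a reviewer; True means the response was deemed a hallucination.

def hallucination_rate(labels):
    """Return the share of hallucinated responses as a percentage."""
    if not labels:
        raise ValueError("no labelled responses")
    return 100.0 * sum(labels) / len(labels)

# Example: 2 hallucinations found in 250 reviewed responses.
rate = hallucination_rate([True] * 2 + [False] * 248)
print(f"{rate:.1f}%")  # 0.8%
```

Running a check like this on prompts drawn from your own use case matters, because published leaderboard figures reflect a specific benchmark, not your data.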
How AI hallucinations can violate the GDPR
Here are just a few of the many ways in which the use of generative AI could conflict with parts of the GDPR.
- The accuracy principle, Article 5(1)(d): The GDPR mandates that personal data must be accurate and kept up to date. If an AI system hallucinates information about an identifiable individual, the output arguably constitutes processing of inaccurate personal data, even though the information is fabricated.
- The “fairness” principle, Article 5(1)(a): Generating false information about individuals can lead to unfair outcomes, and in some contexts is inherently unfair.
- The right to rectification, Article 16: Facilitating people’s right to correct inaccurate or outdated personal data is particularly challenging when using generative AI.
- The right to erasure, Article 17: While people have the right to have their personal data erased in certain circumstances, there are technical challenges to facilitating this process in the context of LLMs.
- The right of access, Article 15: When asked what personal data it processes about a person, an LLM may produce some inaccurate data. However, it is unclear whether such information, having been generated on request, is within the scope of the right of access.
- Data protection by design and by default, Article 25: The GDPR requires controllers and processors to implement high standards of data protection at every stage of data processing. But companies using an “off the shelf” generative AI model may struggle to meet this requirement if the model hallucinates.
There are ongoing legal debates about the scope of the GDPR and the extent of organizations’ obligations in some of these areas—but there are steps businesses can take to minimize the risk of using generative AI.
Generative AI hallucination GDPR risk mitigation strategies
Along with emerging technical solutions, such as Retrieval-Augmented Generation (RAG) and machine unlearning, here are some steps all companies should consider to mitigate the risks from AI hallucinations.
- Human-in-the-loop: Include manual checks on all AI-generated outputs; this is crucial in sensitive contexts such as HR, legal, and health.
- Training: Ensure all staff using generative AI have a good level of AI literacy and are aware of the limitations of the technology.
- Clear policies: Put policies in place to explain where and how employees may use generative AI.
- Transparency: Where appropriate, include disclaimers alongside outputs explaining that they may contain inaccuracies.
- Data minimization: The GDPR only applies to the processing of personal data, so apply the principle of data minimization and limit the personal data entering generative AI systems in the first place.
- Testing and validation: Test AI models for their rate of hallucination in relevant scenarios. Obtain assurances from any third-party AI providers and processors.
- Data Protection Impact Assessments (DPIAs): Generative AI is still considered a novel technology by data protection regulators, so conduct a DPIA before using it to process personal data.
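The data minimization step above can be partly automated by stripping obvious personal identifiers from prompts before they reach a third-party model. Below is a minimal sketch under stated assumptions: the regex patterns are illustrative and cover only email addresses and simple phone numbers; a real deployment would need a purpose-built PII-detection tool and legal review.

```python
import re

# Illustrative-only patterns; NOT a complete PII detector.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def minimize(text: str) -> str:
    """Replace likely personal identifiers with placeholder tokens
    before the text is sent to an external generative AI API."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text

prompt = "Summarize the complaint from jane.doe@example.com (tel. +44 20 7946 0958)."
print(minimize(prompt))
# Summarize the complaint from [EMAIL REDACTED] (tel. [PHONE REDACTED]).
```

Redacting at the prompt boundary like this reduces both the personal data exposed to the provider and the chance that the model hallucinates further details about a named individual.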
Generative AI can be extremely useful, but using this technology requires careful consideration of people’s data protection rights and the inherent risks to your organization.
Managing consent and privacy in the age of AI
As organizations explore innovative applications of AI, privacy teams must proactively address challenges without hindering progress.
Discover more in our guide, which covers:
- The importance of robust Consent Management in AI
- Developing a scalable Consent Management Platform
- Implementing Consent Management in AI projects