When Explainability Meets Privacy: An Investigation at the Intersection of Post-hoc Explainability and Differential Privacy in the Context of Natural Language Processing
Abstract
The study investigates the relationship between privacy and explainability in NLP, using Differential Privacy and Post-hoc Explainability methods, and provides recommendations for balancing both.
In the study of trustworthy Natural Language Processing (NLP), a number of important research fields have emerged, including explainability and privacy. While research interest in both explainable and privacy-preserving NLP has increased considerably in recent years, there remains a lack of investigation at the intersection of the two. This leaves a considerable gap in understanding whether explainability and privacy can be achieved together, or whether the two are at odds with each other. In this work, we conduct an empirical investigation into the privacy-explainability trade-off in the context of NLP, guided by the popular overarching methods of Differential Privacy (DP) and Post-hoc Explainability. Our findings offer a view into the intricate relationship between privacy and explainability, which is shaped by a number of factors, including the nature of the downstream task and the choice of text privatization and explainability methods. We highlight the potential for privacy and explainability to co-exist, and we summarize our findings in a collection of practical recommendations for future work at this important intersection.
Community
Accepted to AAAI/ACM Conference on AI, Ethics, and Society (AIES 2025)