Ethics + Sustainability = Responsible AI

Community Article · Published October 9, 2025

TL;DR: What's the point of an "ethical" AI that burns through resources, or a "green" AI that amplifies bias? Ethics and sustainability have to evolve together, starting with how we measure and disclose AI's real costs.

Introduction

Recent years have seen more research dedicated to evaluating AI’s impacts on society and the environment. Our own team at Hugging Face has led much of this work, on topics ranging from consent and bias to quantifying the energy demands of video generation models. While all of this research continues to deepen our understanding of AI’s impacts on the world we live in, most of it remains siloed, treating AI’s sustainability and ethics separately.

For instance, carbon footprint analyses of AI models typically do not consider how the pursuit of scale has produced models that are both too costly for most researchers to access and disproportionately harmful to the environment. Conversely, evaluations of model performance largely fail to engage with the environmental ramifications of AI models or to incorporate them into auditing approaches.

In our recent article on this subject, we argue that by addressing these two sets of issues separately, both sides miss key transversal connections as well as potential solutions for making informed choices. In the sections below, we home in on two topics that are particularly relevant to both ethics and sustainability within the open-source community – evaluation and transparency – and how they can be operationalized, highlighting several of our projects that contribute to each. We conclude with a proposal of best practices to better integrate ethics and sustainability in AI research and governance.

Transversal issues in ethics and sustainability

In our initial paper, we identified four transversal themes that cut across ethics and sustainability: generalizability, evaluation, transparency, and power. In this blog post, we focus on the two that are most pressing for the open-source community: evaluation and transparency.

Evaluation 📏

A popular adage states that “you can’t improve what you don’t measure” – for AI systems, this means that the criteria we use to compare different systems, and the way in which that evaluation is carried out, are crucial: they make some aspects of a system’s performance indispensable while leaving others overlooked. Popular approaches for evaluating AI systems, such as LM Arena and the MTEB Leaderboard, typically measure only metrics like accuracy or precision, which allow comparing the performance of systems but do not reflect other factors like efficiency or bias.

In the vast majority of real-world deployment contexts, however, multiple measures, from monetary cost to toxicity, are considered alongside performance. This is especially the case for large language models (LLMs), which lack a single well-established evaluation approach; instead, different evaluation methods are adopted, from red-teaming to external audits to leaderboards such as the Open LLM Leaderboard, which go beyond performance criteria to also consider efficiency and even carbon emissions.
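
To make this concrete, here is a minimal sketch of what a multi-criteria evaluation loop could look like, reporting estimated carbon emissions alongside accuracy with the codecarbon library. The dataset structure and the `predict` helper are hypothetical placeholders, not part of any of the leaderboards mentioned above.

```python
# A minimal sketch of a multi-criteria evaluation loop: accuracy is reported
# together with the estimated carbon emissions of the run, using codecarbon.
# The dataset structure and the `predict` helper are hypothetical placeholders.
from codecarbon import EmissionsTracker

def holistic_evaluate(model, dataset, predict):
    """Return accuracy alongside the estimated emissions of the evaluation run."""
    tracker = EmissionsTracker(project_name="holistic-eval", log_level="error")
    tracker.start()
    correct = 0
    for example in dataset:
        prediction = predict(model, example["input"])  # run inference on one example
        correct += int(prediction == example["label"])
    co2_kg = tracker.stop()                            # estimated kg of CO2-equivalent
    return {"accuracy": correct / len(dataset), "co2_kg": co2_kg}
```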

Yet the lack of a standardized methodology can make it hard to compare the metrics reported by different organizations. We recently illustrated this point in a blog post about the environmental impact disclosures made by prominent companies developing AI systems: since each organization uses its own methodology and its own set of evaluation criteria, the resulting numbers cannot be meaningfully compared by users or developers.

This lack of comparability was also the rationale for the AI Energy Score project, which adopts a standardized methodology to compare the energy consumption of AI models across ten different tasks, allowing developers and users to choose a model based on its energy efficiency relative to other models in its category.
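
For illustration, the snippet below sketches one way a relative, category-level comparison could be computed: models are ranked by measured energy per query and bucketed into star ratings. This is a simplified illustration of the idea, not the AI Energy Score project’s exact methodology, and the model names and numbers are made up.

```python
# A simplified illustration of relative energy ratings within a task category:
# models are ranked by measured energy per query and bucketed into star ratings
# (5 stars = most efficient). Not the AI Energy Score's exact methodology.
def star_ratings(energy_wh_per_query: dict[str, float], n_buckets: int = 5) -> dict[str, int]:
    ranked = sorted(energy_wh_per_query, key=energy_wh_per_query.get)  # lowest energy first
    ratings = {}
    for rank, model in enumerate(ranked):
        bucket = rank * n_buckets // len(ranked)  # 0 = most efficient bucket
        ratings[model] = n_buckets - bucket       # invert so more stars = more efficient
    return ratings

# Hypothetical measurements (Wh per query) for three models in the same category:
print(star_ratings({"model-a": 0.8, "model-b": 2.4, "model-c": 1.1}))
# -> {'model-a': 5, 'model-c': 4, 'model-b': 2}
```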


While the AI Energy Score project is well-positioned to allow meaningful comparisons of AI inference, it overlooks the rest of the model lifecycle – from the embodied emissions of GPU manufacturing to the energy consumed by data generation and model training. Some of our previous work has proposed ways to incorporate a broader perspective on AI’s environmental impacts into model evaluation, from a Life Cycle Analysis approach to a consideration of the rebound effects that efficiency gains can trigger.

We are still missing a more formal framework for evaluating and comparing different AI systems – one that would encompass different categories of environmental costs as well as broader ethical and societal impacts – but the key principle that enables this kind of evaluation is transparency, which we discuss below.

Transparency 🔎

One of the fundamental principles of scientific practice, transparency is a key component of research and inquiry, yet operationalizing it in the context of modern AI is challenging. On the one hand, modern AI models are not inherently transparent given the complexity of their architectures, making it difficult to draw meaningful conclusions about how they operate. On the other hand, given the increasingly blurred line between AI research and practice, many of the most popular AI systems are proprietary and therefore subject to trade secrets regarding the specific details of their architecture and deployment, making them difficult to compare.

In terms of sustainability, transparency has lagged even further behind. Environmental impacts are still rarely reported, and most carbon footprint estimates are reconstructed post-hoc by independent researchers rather than disclosed by model creators. To address this gap, we developed the Environmental Transparency Space, an open platform that standardizes and visualizes carbon reporting for AI models across different years and organizations. By making environmental data openly available, it allows anyone to understand and compare the ecological costs of AI systems, and to see how transparency has evolved over time.


Artifacts such as data sheets and model cards have become important tools to bridge that gap, offering essential information about how an AI system works – and, crucially, where it doesn’t. They help document a model’s training data, intended use, limitations, potential biases, and environmental footprint. These transparency tools make AI systems more understandable and accountable, but they can only go so far without access to deeper information: while accuracy-based benchmarks often rely on simple API queries, more holistic evaluation requires richer disclosure, including details about training data, compute usage, and, where possible, access to model weights. As a community, we therefore need to develop and adopt best practices, both in research and in governance, to guide AI towards increased transparency – we discuss these below.
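
As a concrete illustration, the snippet below sketches the environmental-footprint portion of a model card’s YAML metadata. The co2_eq_emissions field names follow the Hugging Face Hub’s model card specification, while the values are illustrative placeholders rather than real measurements.

```python
# A sketch of the environmental-footprint section of a model card's YAML
# front matter. The co2_eq_emissions field names follow the Hugging Face Hub
# model card spec; all values are illustrative placeholders.
import yaml

card_metadata = {
    "license": "apache-2.0",
    "co2_eq_emissions": {
        "emissions": 12500,                    # grams of CO2-eq for training (placeholder)
        "source": "codecarbon",                # how the estimate was obtained
        "training_type": "fine-tuning",
        "geographical_location": "Iowa, USA",  # the grid's carbon intensity matters
        "hardware_used": "8 x NVIDIA A100",
    },
}

# YAML front matter that would sit at the top of a model's README.md on the Hub.
print("---\n" + yaml.safe_dump(card_metadata, sort_keys=False) + "---")
```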

Best Practices 📖

Research 🧑‍🔬

When it comes to research, integrating ethics and sustainability means looking at AI systems as part of a broader socio-technical and ecological network. Models do not exist in isolation: they depend on data, energy, and infrastructures, and they affect communities and environments. Therefore, to make responsible progress, research should move beyond narrow technical benchmarks and engage with broader questions that span ethics and sustainability.

Changing the way in which we evaluate AI systems. Traditional machine learning metrics such as accuracy or efficiency tell us little about whether an AI system is just, inclusive, or environmentally responsible. A model that performs well on a benchmark might still amplify existing inequalities or increase energy consumption elsewhere. Moreover, efficiency gains can produce unintended rebound effects: when technology becomes cheaper or faster, it is often used more, offsetting any sustainability gains. We therefore argue for holistic evaluations that assess both social and environmental impacts, including long-term consequences that short-term performance metrics may not capture.

Operationalizing transparency and standardization is essential. Research should clarify how AI systems work, where their data comes from, who stands to benefit, and who might bear the costs. Transparency in this sense is what allows scrutiny, reproducibility, and accountability. By embedding social and environmental context into research design and reporting, we strengthen scientific inquiry’s reliability and ethical integrity.

Governance ⚖️

As AI systems are increasingly scrutinized by policymakers and governments at different levels, it has become paramount to develop new governance mechanisms for operationalizing best practices in terms of system evaluation and transparency. This will require interdisciplinary approaches, incorporating efforts from different domains depending on the context of deployment – we propose some of these approaches below.

Adapting compliance mechanisms to include environmental and ethical assessments: new and existing audits of AI systems can be extended to cover both ethical and environmental impacts, such as bias, energy consumption, and carbon emissions. These audits should be required before systems are deployed in practice, especially in high-stakes contexts such as education and healthcare, but also in applications like disaster prediction and climate modeling that come with potentially widespread environmental impacts.

Developing new governance approaches: evaluating the wider rebound effects of AI tools and their impacts on consumption and human behavior will require developing new methods and approaches. These can be inspired by domains like economics and environmental impact assessment, and can help bridge the gap between technical assessments of AI systems and their real-life consequences for people and the planet.

Enforcing standardized measurement approaches: while trade secrecy can be a valid reason for not sharing absolute measurements of AI system performance (e.g. total energy consumed, or the exact contents of training data), it is possible to use approaches like the AI Energy Score framework to run standardized tests and publish relative comparisons of the results. Ensuring that the processes and results of these tests are well documented and publicly accessible contributes to what can be termed “usable transparency”, encouraging developers to prioritize ethical considerations and sustainability.
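
As a rough sketch of what such a documented, shareable record might look like, the snippet below assembles a disclosure that publishes the test methodology and relative results while deliberately omitting absolute figures. Every field name and value here is an illustrative placeholder, not an established reporting format.

```python
# A rough sketch of a "usable transparency" disclosure: the standardized test
# and its methodology are documented and shareable, while only relative results
# are published. All names and values are illustrative placeholders.
import json
from datetime import date

disclosure = {
    "model": "example-org/example-model",
    "task": "text_generation",
    "methodology": {
        "protocol": "standardized benchmark in the spirit of AI Energy Score",
        "hardware": "NVIDIA H100 (single GPU)",
        "num_queries": 1000,
        "date": date.today().isoformat(),
    },
    "relative_results": {
        "energy_rating": "4/5 stars",  # relative to other models in the same category
        "category_percentile": 78,     # more efficient than 78% of tested models
    },
    # Absolute energy figures are intentionally omitted from the public record.
}

print(json.dumps(disclosure, indent=2))
```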

Conclusion

AI systems are mirrors of our priorities. If we separate ethics from sustainability, we build technologies that are efficient but unjust, or fair but unsustainable. Integrating the two is a necessity for any technology meant to endure. The measure of AI’s success should be not only what it can do, but what it preserves. In this post, we have advocated for a socio-technical approach to integrating AI ethics and sustainability, grounded in both fundamental principles and best practices, which we hope will be adopted by the AI community at large.
