Granite Guardian Library
The Guardian Library includes four capabilities, each implemented as a LoRA adapter for ibm-granite/granite-4.0-micro. Each capability targets a specific task related to safety, factuality, or policy compliance in LLM-based systems. This page gives a brief overview of each capability; full details are in each adapter's README.
Capabilities implemented as LoRA adapters
The four LoRA adapters for ibm-granite/granite-4.0-micro available in this Hugging Face repository are:
Guardian Core: A LoRA adapter trained to judge whether the input prompts and output responses of an LLM-based system meet specified criteria. The criteria include safety risks (harm, jailbreaking, profanity, violence, sexual content, social bias, unethical behavior) and hallucinations related to tool/function calls and to retrieval-augmented generation (RAG) in agent-based systems. The model outputs a JSON object with a score field set to "yes" (criteria met / risk detected) or "no" (criteria not met / no risk). Details can be found in the guardian-core README.
Factuality Detection: A LoRA adapter designed to assess factual correctness while explicitly taking into account contextual passages that may contain contradicting or conflicting information. Rather than assuming the contexts are consistent, the adapter evaluates LLM-generated responses against one or more context sources and identifies cases where the response conflicts with, misrepresents, or selectively ignores evidence present in those contexts. Details can be found in the factuality-detection README.
Factuality Correction: A LoRA adapter designed to correct factually incorrect LLM-generated responses while explicitly taking into account contextual passages that may contain contradicting or conflicting information. The adapter can correct factual inaccuracies in long-form responses composed of multiple atomic units (such as individual facts or claims) while preserving the full generative and reasoning capabilities of the base model. Details can be found in the factuality-correction README.
Policy Guardrails: A LoRA adapter for policy compliance checking. Given a policy and a scenario, it enables the base model to decide whether the scenario complies with or violates the policy. It returns a third response ('Ambiguous') when compliance cannot be decided with a high level of certainty. Details can be found in the policy-guardrails README.
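As a rough illustration of how a caller might consume these outputs, the sketch below parses a Guardian Core verdict and checks for the Policy Guardrails 'Ambiguous' response. It assumes the generated text has already been obtained as a string and, for Guardian Core, that the string is exactly the JSON object described above. The function names `parse_guardian_core_score` and `is_ambiguous` are hypothetical helpers, not part of the library, and the exact output formats should be checked against each adapter's README.

```python
import json
from typing import Optional


def parse_guardian_core_score(raw_output: str) -> Optional[bool]:
    """Hypothetical helper: map a Guardian Core output string to a boolean.

    Returns True for "yes" (criteria met / risk detected), False for "no"
    (criteria not met / no risk), and None if the text cannot be interpreted.
    Assumes the model output is exactly a JSON object with a "score" field.
    """
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    score = payload.get("score")
    if not isinstance(score, str):
        return None
    normalized = score.strip().lower()
    if normalized == "yes":
        return True
    if normalized == "no":
        return False
    return None


def is_ambiguous(policy_response: str) -> bool:
    """Hypothetical helper: detect the Policy Guardrails 'Ambiguous' response.

    Only the 'Ambiguous' label is taken from the description above; the
    labels used for compliance and violation are not assumed here.
    """
    return policy_response.strip().lower() == "ambiguous"


if __name__ == "__main__":
    verdict = parse_guardian_core_score('{"score": "yes"}')
    if verdict is None:
        print("Could not interpret the Guardian Core output")
    elif verdict:
        print("Criteria met: risk detected")
    else:
        print("Criteria not met: no risk")
```

Returning None for unparseable text keeps a malformed generation from being silently treated as "no risk"; a caller would typically route both that case and an 'Ambiguous' policy response to a fallback such as human review.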
Model tree for ibm-granite/granitelib-guardian-r1.0
Base model
ibm-granite/granite-4.0-micro