OpenAlex · Updated hourly · Last updated: 23.04.2026, 22:04

This is an overview page with metadata about this scholarly work. The full article is available from the publisher.

You Need Governance/ VeritasBench - A Benchmark for AI Agent Governability

2026 · 0 citations · Zenodo (CERN European Organization for Nuclear Research) · Open Access

Citations: 0 · Authors: 1 · Year: 2026

Abstract

AI agents are increasingly deployed in high-stakes domains such as healthcare, finance, and law, yet existing evaluation frameworks measure only whether agents are intelligent or safe, not whether they are governable. This technical report introduces three interlocking systems that address the AI agent governance gap:

VERITAS is a lightweight, deterministic execution runtime for AI agents operating in regulated environments. Written in Rust, it enforces OPA Rego policies at every agent action, maintains a cryptographic audit chain, and supports real-time human intervention without sacrificing execution speed (131 tests passing across 7 healthcare scenarios).

VeritasBench is the first benchmark framework designed to measure AI agent governance rather than intelligence. It evaluates four governance dimensions (auditability, controllability, accountability, and policy enforcement) that no existing benchmark (AgentBench, GAIA, AgentHarm, tau-bench) addresses. The framework provides scored rubrics and standardized scenarios for comparing governance implementations.

ClinicLaw is a reference implementation: an AI-native Hospital Information System in which every agent action is governed by the VERITAS trust layer. It is built on FHIR R4 data with a pluggable LLM backend (Claude API, Ollama, or a deterministic mock), and it demonstrates governance-first agent design in practice through 8 clinical workflows.

Together, these systems establish that measuring and enforcing agent governance is both technically feasible and operationally practical. All code is open source under Apache 2.0.

Keywords: AI agent governance, benchmark, auditability, policy enforcement, healthcare AI, FHIR, OPA Rego, trust runtime, ClinicLaw, VERITAS

License: Apache 2.0

Related identifiers:
- GitHub: https://github.com/Chesterguan/veritas
- GitHub: https://github.com/Chesterguan/veritasbench
- GitHub: https://github.com/Chesterguan/cliniclaw
- Related work: HAVEN Protocol (DOI: 10.5281/zenodo.18701303)
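
The abstract describes VERITAS as checking a policy at every agent action and keeping a cryptographic audit chain. The Python sketch below illustrates that general pattern only. It is not taken from the VERITAS code base, which the abstract says is written in Rust with OPA Rego policies. The names `allow`, `AuditChain`, and `entry_hash`, the toy policy rule, and the use of SHA-256 over JSON are all assumptions made for illustration.

```python
import hashlib
import json

# Placeholder "previous hash" for the first entry (an assumption of this sketch).
GENESIS_HASH = "0" * 64


def entry_hash(prev_hash, record):
    """Hash a record together with the previous entry's hash."""
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def allow(action):
    """Hypothetical stand-in policy; a real deployment would query a policy engine."""
    return action.get("role") == "clinician" or action.get("kind") == "read"


class AuditChain:
    """Append-only log in which each entry commits to the one before it."""

    def __init__(self):
        self.entries = []

    def record(self, action):
        """Evaluate the policy for one action and log the decision."""
        decision = "allow" if allow(action) else "deny"
        prev = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        record = {"action": action, "decision": decision}
        self.entries.append(
            {"prev": prev, "record": record, "hash": entry_hash(prev, record)}
        )
        return decision

    def verify(self):
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev = GENESIS_HASH
        for entry in self.entries:
            if entry["prev"] != prev:
                return False
            if entry["hash"] != entry_hash(prev, entry["record"]):
                return False
            prev = entry["hash"]
        return True
```

Because each entry's hash covers the previous hash, changing an earlier decision after the fact causes `verify()` to return `False`. That tamper-evidence is the property an audit chain is generally meant to provide.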


Topics

Artificial Intelligence in Healthcare and Education · Multi-Agent Systems and Negotiation · Electronic Health Records Systems