This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Governed Autonomy in Reliability Engineering: Integrating Error Budgets with AI-Driven Remediation
Citations: 0
Authors: 1
Year: 2023
Abstract
Modern large-scale software systems operate under increasing architectural and operational complexity, driven by microservices-based designs, elastic cloud infrastructure and rapid, continuous delivery practices that introduce constant change into production environments. While traditional reliability engineering techniques such as static thresholds, manual incident response and rule-based automation have historically ensured system stability, they increasingly struggle to scale in the face of highly distributed components, unpredictable workloads and tight availability and latency objectives. Site Reliability Engineering (SRE) addressed this challenge by formalizing reliability as a measurable, enforceable engineering concern through the use of Service Level Objectives (SLOs) and error budgets, providing a principled mechanism to balance innovation velocity with operational risk. In parallel, advances in artificial intelligence (AI) and machine learning (ML) have transformed operational monitoring and response by enabling predictive failure detection, anomaly identification across high-dimensional telemetry and increasingly autonomous remediation workflows. This article synthesizes these complementary developments and proposes an integrated reliability engineering paradigm in which error budgets serve as explicit governance constraints that bound acceptable system behavior, while AI-driven autonomous remediation functions as a closed-loop control mechanism that continuously senses, analyzes and corrects system state. Drawing on foundational SRE literature, established research on self-healing systems, empirical insights from chaos engineering and contemporary AIOps architectures, the paper articulates a conceptual framework for AI-assisted reliability engineering that preserves human intent, enforces accountability and enables scalable, adaptive operational resilience in modern production systems.
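The idea of an error budget acting as a governance constraint on autonomous remediation can be illustrated with a minimal Python sketch. It is not taken from the paper: the class `ErrorBudget`, the function `decide_remediation`, the 25% autonomy threshold and the three outcome labels are all hypothetical choices made for illustration. The only standard element is the usual SRE definition of an error budget as the share of events allowed to fail under an SLO target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Error budget for one SLO window (hypothetical helper, not from the paper)."""
    slo_target: float    # e.g. 0.999 means 99.9% of events must be good
    total_events: int    # events observed in the SLO window
    failed_events: int   # events that violated the SLO

    @property
    def allowed_failures(self) -> float:
        # The budget is the number of failures the SLO tolerates in this window.
        return (1.0 - self.slo_target) * self.total_events

    @property
    def remaining_fraction(self) -> float:
        # Fraction of the budget still unspent, clamped to the range [0, 1].
        allowed = self.allowed_failures
        if allowed <= 0:
            return 0.0
        return max(0.0, 1.0 - self.failed_events / allowed)


def decide_remediation(
    budget: ErrorBudget,
    anomaly_detected: bool,
    min_budget_for_autonomy: float = 0.25,  # assumed policy threshold
) -> str:
    """Gate an automated fix on the remaining error budget.

    Assumed policy: act autonomously only while enough budget remains,
    otherwise hand the decision to a human operator.
    """
    if not anomaly_detected:
        return "no_action"
    if budget.remaining_fraction >= min_budget_for_autonomy:
        return "auto_remediate"
    return "escalate_to_human"


if __name__ == "__main__":
    healthy = ErrorBudget(slo_target=0.999, total_events=1_000_000, failed_events=400)
    nearly_spent = ErrorBudget(slo_target=0.999, total_events=1_000_000, failed_events=900)
    print(decide_remediation(healthy, anomaly_detected=True))
    print(decide_remediation(nearly_spent, anomaly_detected=True))
```

In this sketch the anomaly signal stands in for whatever the AI-driven detection layer produces, and the budget check is the only place where policy is encoded. Other policies are equally plausible, for example restricting autonomy to low-risk actions as the budget shrinks.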
Related works
The global landscape of AI ethics guidelines
2019 · 4,575 citations
The Limitations of Deep Learning in Adversarial Settings
2016 · 3,867 citations
Trust in Automation: Designing for Appropriate Reliance
2004 · 3,415 citations
Fairness through awareness
2012 · 3,278 citations
Mind over Machine: The Power of Human Intuition and Expertise in the Era of the Computer
1987 · 3,183 citations