OpenAlex · Updated hourly · Last updated: 29.03.2026, 04:02

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Governed Autonomy in Reliability Engineering: Integrating Error Budgets with AI-Driven Remediation

2023 · 0 citations · Journal of Artificial Intelligence Machine Learning and Data Science · Open Access
Citations: 0 · Authors: 1 · Year: 2023

Abstract

Modern large-scale software systems operate under increasing architectural and operational complexity, driven by microservices-based designs, elastic cloud infrastructure and rapid, continuous delivery practices that introduce constant change into production environments. While traditional reliability engineering techniques such as static thresholds, manual incident response and rule-based automation have historically ensured system stability, they increasingly struggle to scale in the face of highly distributed components, unpredictable workloads and tight availability and latency objectives. Site Reliability Engineering (SRE) addressed this challenge by formalizing reliability as a measurable, enforceable engineering concern through the use of Service Level Objectives (SLOs) and error budgets, providing a principled mechanism to balance innovation velocity with operational risk. In parallel, advances in artificial intelligence (AI) and machine learning (ML) have transformed operational monitoring and response by enabling predictive failure detection, anomaly identification across high-dimensional telemetry and increasingly autonomous remediation workflows. This article synthesizes these complementary developments and proposes an integrated reliability engineering paradigm in which error budgets serve as explicit governance constraints that bound acceptable system behavior, while AI-driven autonomous remediation functions as a closed-loop control mechanism that continuously senses, analyzes and corrects system state. Drawing on foundational SRE literature, established research on self-healing systems, empirical insights from chaos engineering and contemporary AIOps architectures, the paper articulates a conceptual framework for AI-assisted reliability engineering that preserves human intent, enforces accountability and enables scalable, adaptive operational resilience in modern production systems.
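
As an illustration of how an error budget can act as a governance constraint on autonomous remediation, the following minimal sketch applies the standard SRE arithmetic (the budget is the 1 - SLO share of requests in a window) and gates an automated action on the remaining budget. It is not taken from the paper: the names `error_budget` and `decide`, the `action_cost` parameter and the 20% reserve threshold are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class BudgetStatus:
    allowed_failures: float    # failures the SLO tolerates in the window
    remaining_fraction: float  # share of the error budget still unspent


def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> BudgetStatus:
    """Standard SRE arithmetic: the budget is the (1 - SLO) share of requests."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed = (1.0 - slo_target) * total_requests
    # With no traffic there is no budget to spend, so report it as exhausted.
    remaining = 1.0 - failed_requests / allowed if allowed > 0 else 0.0
    return BudgetStatus(allowed, remaining)


def decide(status: BudgetStatus, action_cost: float, min_reserve: float = 0.2) -> str:
    """Hypothetical gate (an assumption, not the paper's method): act
    autonomously only if the action's worst-case budget cost still leaves
    `min_reserve` of the budget unspent; otherwise hand off to a human."""
    if status.remaining_fraction - action_cost >= min_reserve:
        return "auto_remediate"
    return "escalate_to_human"
```

For example, with a 99.9% SLO over one million requests and 250 failures, roughly three quarters of the budget remains. Under these assumed thresholds, an action with a worst-case cost of 10% of the budget would be allowed to run automatically, while one costing 60% would be escalated.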

Topics

Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education · Occupational Health and Safety Research