
Alerting & Incident Response
Runbooks, paging, escalation, incident management, on-call practices, postmortems, troubleshooting alerts
1. What is a runbook in the DevOps context?
Answer
A runbook is an operational document that contains standardized procedures for handling incidents or recurring maintenance tasks. It lets on-call teams follow predefined, validated steps to resolve known issues quickly, reducing both resolution time and human error. Runbooks can be manual or automated and are a fundamental part of incident management.
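To make the "manual or automated" distinction concrete, here is a minimal sketch of an automated runbook: an ordered list of steps that stops at the first failure so a human can take over. The `Step` type, the `run_runbook` function, and the "disk-full" scenario are all hypothetical names chosen for illustration, and the step actions are stubs rather than real checks.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    description: str
    action: Callable[[], bool]  # returns True when the step succeeded


def run_runbook(name: str, steps: List[Step]) -> List[str]:
    """Run steps in order; stop at the first failure so a human can take over."""
    log = []
    for number, step in enumerate(steps, start=1):
        ok = step.action()
        status = "ok" if ok else "FAILED"
        log.append(f"{name} step {number}: {step.description} -> {status}")
        if not ok:
            log.append(f"{name}: stopping, escalate to on-call engineer")
            break
    return log


# Hypothetical runbook for a "disk almost full" alert; every action is a stub.
disk_full_runbook = [
    Step("Confirm disk usage is above threshold", lambda: True),
    Step("Rotate and compress old logs", lambda: True),
    Step("Verify disk usage dropped below threshold", lambda: False),
    Step("Close the alert", lambda: True),
]

for line in run_runbook("disk-full", disk_full_runbook):
    print(line)
```

Stopping at the first failed step mirrors how a manual runbook is usually followed: the validated path is automated, and anything unexpected is handed back to a person.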
2. What is paging in the context of incident management?
Answer
Paging is the alerting mechanism that notifies on-call engineers when a critical incident occurs. It uses channels such as SMS, phone calls, or dedicated applications to make sure the responsible person is alerted quickly, even outside working hours. A good paging system includes automatic escalation policies that notify the next responder if the first person does not acknowledge the page.
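An escalation policy can be pictured as an ordered list of responders, each with a channel and an acknowledgement timeout. The sketch below simulates that walk without sending any real notifications. `EscalationLevel`, `page`, the responder names, and the timeouts are assumptions made for illustration and do not correspond to any specific paging product.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class EscalationLevel:
    responder: str
    channel: str          # e.g. "sms", "phone", "app"
    timeout_minutes: int  # how long to wait for an acknowledgement


def page(policy: List[EscalationLevel], acknowledgers: Set[str]) -> Optional[str]:
    """Walk the policy in order; return who acknowledged, or None if nobody did.

    `acknowledgers` simulates which responders would acknowledge when paged.
    """
    elapsed = 0
    for level in policy:
        print(f"t+{elapsed}min: paging {level.responder} via {level.channel}")
        if level.responder in acknowledgers:
            print(f"{level.responder} acknowledged")
            return level.responder
        elapsed += level.timeout_minutes
        print(f"t+{elapsed}min: no acknowledgement, escalating")
    return None


# Hypothetical three-level policy.
policy = [
    EscalationLevel("primary-oncall", "app", 5),
    EscalationLevel("secondary-oncall", "sms", 10),
    EscalationLevel("engineering-manager", "phone", 15),
]

# Simulate the primary missing the page and the secondary picking it up.
page(policy, acknowledgers={"secondary-oncall"})
```

A real system would wait for the timeout and listen for acknowledgements asynchronously; the simulation only shows the ordering and timing logic of the policy.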
3. What is a postmortem in the context of incident management?
Answer
A postmortem is a retrospective analysis conducted after a significant incident. Its objective is to understand the root causes of the incident, document the timeline of events, identify corrective actions, and share what was learned with the team. An effective postmortem takes a blameless approach, focusing on improving systems and processes rather than assigning individual blame.
What is the difference between an event and an incident?
What does MTTA (Mean Time To Acknowledge) mean?
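As a rough illustration of the metric named above, MTTA is commonly computed as the average delay between an alert firing and a responder acknowledging it. The sketch below uses made-up timestamps; the `alerts` data and the `mtta_minutes` helper are hypothetical.

```python
from datetime import datetime
from statistics import mean

# (triggered_at, acknowledged_at) pairs; sample data invented for illustration.
alerts = [
    (datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 1, 2, 4)),      # 4 minutes
    (datetime(2024, 1, 1, 9, 30), datetime(2024, 1, 1, 9, 31)),    # 1 minute
    (datetime(2024, 1, 2, 23, 15), datetime(2024, 1, 2, 23, 25)),  # 10 minutes
]


def mtta_minutes(alerts):
    """Average time between trigger and acknowledgement, in minutes."""
    return mean(
        (acknowledged - triggered).total_seconds() / 60
        for triggered, acknowledged in alerts
    )


print(mtta_minutes(alerts))  # 5.0
```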