DevOps

Alerting & Incident Response

Runbooks, paging, escalation, incident management, on-call practices, postmortems, troubleshooting alerts

20 interview questions · Mid-Level
1

What is a runbook in the DevOps context?

Answer

A runbook is an operational document containing standardized procedures for handling incidents or recurring maintenance tasks. It lets on-call teams follow predefined, validated steps to resolve known issues quickly, reducing both resolution time and human error. Runbooks can be manual or automated and are a fundamental element of incident management.
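As a rough illustration of the "automated" case, a runbook can be modeled as an ordered list of steps that stops and hands over to a human as soon as one step fails. This is only a sketch: `Step`, `run_runbook`, and the "disk almost full" scenario are hypothetical names invented for the example, and the actions are stubs rather than real remediation commands.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    description: str
    action: Callable[[], bool]  # returns True when the step succeeded


def run_runbook(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; stop at the first failure and record a log."""
    log: List[str] = []
    for number, step in enumerate(steps, start=1):
        ok = step.action()
        log.append(f"{number}. {step.description}: {'ok' if ok else 'FAILED'}")
        if not ok:
            log.append("Stopping: escalate to a human responder.")
            return False, log
    return True, log


# Hypothetical "disk almost full" runbook; every action is a stub.
disk_full = [
    Step("Confirm disk usage is above threshold", lambda: True),
    Step("Rotate and compress old logs", lambda: True),
    Step("Verify usage dropped below threshold", lambda: True),
]
resolved, log = run_runbook(disk_full)
print("\n".join(log))
```

Keeping the description next to each action means the same structure can double as the manual checklist when automation is not trusted for a given step.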

2

What is paging in the context of incident management?

Answer

Paging is the alerting mechanism that notifies on-call engineers when a critical incident occurs. This system uses various communication channels such as SMS, phone calls, or dedicated applications to ensure the responsible person is alerted quickly, even outside working hours. A good paging system includes automatic escalation policies if the first person does not respond.
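The escalation behavior described above can be sketched as a loop over escalation levels. This is a minimal simulation under stated assumptions, not the API of any real paging product: `EscalationLevel`, `page_with_escalation`, and the `notify` and `wait_for_ack` callbacks are hypothetical, and the timeouts are illustrative values.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class EscalationLevel:
    responder: str       # hypothetical on-call identifier
    channels: List[str]  # e.g. ["push", "sms", "phone"]
    ack_timeout_s: int   # how long to wait before escalating


def page_with_escalation(
    policy: List[EscalationLevel],
    notify: Callable[[str, str], None],
    wait_for_ack: Callable[[str, int], bool],
) -> Optional[str]:
    """Page each level in order; return who acknowledged, or None."""
    for level in policy:
        for channel in level.channels:
            notify(level.responder, channel)
        if wait_for_ack(level.responder, level.ack_timeout_s):
            return level.responder
    return None


# Simulated run: the primary never acknowledges, the secondary does.
sent = []
policy = [
    EscalationLevel("primary", ["push", "sms"], 300),
    EscalationLevel("secondary", ["phone"], 300),
]
acked_by = page_with_escalation(
    policy,
    notify=lambda who, channel: sent.append((who, channel)),
    wait_for_ack=lambda who, timeout_s: who == "secondary",
)
print(acked_by)  # secondary
```

Passing `notify` and `wait_for_ack` in as callbacks keeps the escalation logic testable without sending real messages. A `None` result means nobody acknowledged and the policy was exhausted.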

3

What is a postmortem in the context of incident management?

Answer

A postmortem is a retrospective analysis conducted after a significant incident. Its objective is to understand the root causes of the incident, document the timeline of events, identify corrective actions, and share learnings with the team. An effective postmortem adopts a blameless approach, focused on improving systems and processes rather than on assigning blame to individuals.

4

What is the difference between an event and an incident?

5

What does MTTA (Mean Time To Acknowledge) mean?

