DevOps

Alerting & Incident Response

Runbooks, paging, escalation, incident management, on-call practices, postmortems, troubleshooting alerts

20 interview questions · Mid-Level
1

What is a runbook in the DevOps context?

Answer

A runbook is an operational document containing standardized procedures for handling incidents or recurring maintenance tasks. It lets on-call engineers follow predefined, validated steps to resolve known issues quickly, which reduces resolution time and the risk of human error. Runbooks can be manual (step-by-step instructions for a person) or automated (scripts that execute the steps). Either way, they are a core part of incident management.
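To illustrate the automated variant, here is a minimal Python sketch that models a runbook as an ordered list of steps. Each step has a health check and an optional remediation. The `Step` class, the `run_runbook` function, and the disk-usage scenario are hypothetical names chosen for this example. They are not part of any particular runbook tool.

```python
# Minimal sketch of an automated runbook (hypothetical names throughout):
# an ordered list of steps, each with a health check and an optional fix.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    check: Callable[[], bool]  # returns True when this condition is healthy
    remediate: Optional[Callable[[], None]] = None  # automated fix, if any


def run_runbook(steps: list[Step]) -> list[str]:
    """Run steps in order and return a log of outcomes.

    Stops at the first step that has no automated fix or is still failing
    after its fix, so a human can take over from a known point.
    """
    log: list[str] = []
    for step in steps:
        if step.check():
            log.append(f"OK: {step.name}")
            continue
        if step.remediate is None:
            log.append(f"ESCALATE: {step.name} failed, no automated fix")
            return log
        step.remediate()
        if step.check():  # re-verify instead of assuming the fix worked
            log.append(f"FIXED: {step.name}")
        else:
            log.append(f"ESCALATE: {step.name} still failing after fix")
            return log
    return log


if __name__ == "__main__":
    # Simulated state for a hypothetical "disk almost full" alert.
    state = {"disk_pct": 95, "service_up": True}

    def clear_tmp() -> None:
        state["disk_pct"] = 60  # stands in for deleting temporary files

    steps = [
        Step("disk usage below 90%", lambda: state["disk_pct"] < 90, clear_tmp),
        Step("service responding", lambda: state["service_up"]),
    ]
    for line in run_runbook(steps):
        print(line)
    # FIXED: disk usage below 90%
    # OK: service responding
```

The runner stops at the first unresolved step, which mirrors how a manual runbook hands off to a human. The log then shows exactly which checks passed and where escalation began.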

2

What is paging in the context of incident management?

Answer

Paging is the alerting mechanism that notifies on-call engineers when a critical incident occurs. It uses channels such as SMS, phone calls, or push notifications from dedicated apps so that the responsible person is reached quickly, even outside working hours. A good paging system also includes automatic escalation policies: if the first responder does not acknowledge the page within a set time, the next person in the escalation chain is paged.
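Escalation can be illustrated with a short Python sketch. It assumes a simple linear policy in which each level has an acknowledgement timeout, and the next level is paged when that timeout expires. `Level`, `paged_responders`, and the responder names and timeouts are all hypothetical. Real paging services model their policies in their own ways.

```python
# Minimal sketch of a linear escalation policy (hypothetical names):
# level 0 is paged at minute 0; each later level is paged when the
# previous level's acknowledgement timeout runs out.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Level:
    responder: str
    timeout_min: int  # minutes to wait for an acknowledgement before escalating


def paged_responders(policy: list[Level], minutes_elapsed: int,
                     acked_at: Optional[int] = None) -> list[str]:
    """Return everyone paged so far.

    Escalation stops once the page is acknowledged; acked_at is the minute
    of acknowledgement, or None if nobody has acknowledged yet. An ack that
    arrives exactly at an escalation minute is treated as too late.
    """
    cutoff = minutes_elapsed if acked_at is None else min(minutes_elapsed, acked_at)
    paged: list[str] = []
    start = 0  # minute at which the current level gets paged
    for level in policy:
        if start > cutoff:
            break
        paged.append(level.responder)
        start += level.timeout_min
    return paged


if __name__ == "__main__":
    policy = [
        Level("primary-oncall", 5),
        Level("secondary-oncall", 10),
        Level("eng-manager", 15),
    ]
    # Primary paged at 0, secondary at 5, manager at 15.
    print(paged_responders(policy, minutes_elapsed=12))
    # ['primary-oncall', 'secondary-oncall']
    print(paged_responders(policy, minutes_elapsed=30, acked_at=3))
    # ['primary-oncall']
```

Treating an ack at the exact escalation minute as too late is a design choice made for this sketch, not a standard.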

3

What is a postmortem in the context of incident management?

Answer

A postmortem is a retrospective analysis conducted after a significant incident. Its goals are to identify the root causes, document the timeline of events, define corrective actions, and share what was learned with the team. An effective postmortem is blameless: it focuses on improving systems and processes rather than on blaming individuals.
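To make the timeline and corrective-action parts concrete, the following Python sketch records them as structured data and renders a plain-text report. The field names, report layout, and example incident are invented for illustration. Many teams use a shared document template instead.

```python
# Minimal sketch of a postmortem record (hypothetical fields and layout):
# root causes, a timeline of events, and corrective actions.
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Event:
    at: datetime
    description: str


@dataclass
class ActionItem:
    description: str
    owner: str  # a team or role responsible for follow-through
    done: bool = False


@dataclass
class Postmortem:
    title: str
    root_causes: list[str]
    timeline: list[Event] = field(default_factory=list)
    actions: list[ActionItem] = field(default_factory=list)

    def render(self) -> str:
        lines = [f"Postmortem: {self.title}", "", "Root causes:"]
        lines += [f"- {cause}" for cause in self.root_causes]
        lines += ["", "Timeline:"]
        # Sort so events can be recorded in any order during the investigation.
        for ev in sorted(self.timeline, key=lambda e: e.at):
            lines.append(f"- {ev.at:%Y-%m-%d %H:%M} {ev.description}")
        lines += ["", "Corrective actions:"]
        for item in self.actions:
            box = "x" if item.done else " "
            lines.append(f"- [{box}] {item.description} (owner: {item.owner})")
        return "\n".join(lines)


if __name__ == "__main__":
    pm = Postmortem(
        title="API latency spike (example)",
        root_causes=["Connection pool too small for peak traffic"],
        timeline=[
            Event(datetime(2024, 5, 1, 14, 20), "Alert fired: p99 latency above threshold"),
            Event(datetime(2024, 5, 1, 14, 5), "Deploy increased request fan-out"),
        ],
        actions=[ActionItem("Load-test pool size before deploys", "platform team")],
    )
    print(pm.render())
```

In this sketch, action items are assigned to a team or role rather than to a person. That keeps the report consistent with the blameless approach while still tracking follow-through.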

4

What is the difference between an event and an incident?

5

What does MTTA (Mean Time To Acknowledge) mean?
