DevOps

SRE Principles

SLIs, SLOs, SLAs, error budgets, toil reduction, incident management, on-call, blameless postmortems

24 interview questions · Senior
1

What is an SLI (Service Level Indicator) in SRE?

Answer

An SLI (Service Level Indicator) is a quantitative metric that measures a specific aspect of the service level provided to users. Typical SLIs include availability (uptime), latency (response time), error rate, or throughput. These indicators are objectively measured by monitoring systems and serve as the foundation for defining SLOs. For example, an availability SLI could be the percentage of successful HTTP requests (2xx codes) out of total requests.
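As an illustration of the availability example above, here is a minimal sketch that computes the SLI from a list of HTTP status codes. The function name `availability_sli` and the sample data are hypothetical; a real system would typically derive the same ratio from monitoring counters rather than raw lists.

```python
def availability_sli(status_codes):
    """Return the fraction of requests that returned a 2xx status code."""
    codes = list(status_codes)
    if not codes:
        # With no traffic the ratio is undefined, so fail loudly.
        raise ValueError("no requests observed")
    successful = sum(1 for code in codes if 200 <= code < 300)
    return successful / len(codes)


# Hypothetical sample: 7 of the 10 requests succeeded.
sample = [200, 201, 500, 204, 503, 200, 200, 404, 200, 200]
print(f"Availability SLI: {availability_sli(sample):.1%}")
```

Whether 4xx responses should count against availability is a design choice; this sketch follows the text and treats only 2xx codes as successes.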

2

What is the main difference between an SLO and an SLA?

Answer

An SLO (Service Level Objective) is an internal service level target defined by the team to guide SRE efforts, with no legal consequences. An SLA (Service Level Agreement) is a formal contract with the client that includes consequences (refunds, penalties) if targets are not met. The SLO is typically stricter than the SLA to create a safety buffer and avoid SLA violations. For example, an SLO of 99.9% with an SLA of 99.5% provides a margin of safety.
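The safety margin can be sketched as a simple classification of a measured availability against both thresholds. The function `classify_availability` and its return strings are hypothetical; the default values reuse the 99.9% SLO and 99.5% SLA from the example above.

```python
def classify_availability(measured, slo=99.9, sla=99.5):
    """Classify a measured availability (in percent) against the SLO and SLA."""
    if slo < sla:
        # The text assumes the internal target is at least as strict as the contract.
        raise ValueError("SLO should be at least as strict as the SLA")
    if measured >= slo:
        return "SLO met"
    if measured >= sla:
        return "SLO missed, SLA still met"
    return "SLA violated"


for value in (99.95, 99.7, 99.2):
    print(value, "->", classify_availability(value))
```

The middle outcome is the buffer: the team sees an internal miss and can react before any contractual penalty applies.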

3

What is an error budget in SRE?

Answer

An error budget is the acceptable amount of failure or unavailability for a service over a given period. It is calculated as the difference between 100% and the SLO. For example, with an SLO of 99.9%, the error budget is 0.1%, which is approximately 43 minutes of downtime in a 30-day month. The error budget lets the team balance innovation and reliability: as long as budget remains, the team can deploy new features quickly. Once it is exhausted, the focus must shift to stability and releases should be postponed.
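The arithmetic behind the 43-minute figure can be checked with a short sketch. The function name `error_budget_minutes` and the 30-day default window are assumptions made for illustration.

```python
def error_budget_minutes(slo_percent, window_days=30):
    """Return the allowed downtime in minutes for an SLO over a time window."""
    if not 0 <= slo_percent <= 100:
        raise ValueError("slo_percent must be between 0 and 100")
    budget_fraction = (100.0 - slo_percent) / 100.0  # e.g. 0.001 for 99.9%
    window_minutes = window_days * 24 * 60           # 43,200 for 30 days
    return budget_fraction * window_minutes


print(f"{error_budget_minutes(99.9):.1f} minutes")  # about 43.2 minutes
```

Each additional nine shrinks the budget tenfold: under the same assumptions, a 99.99% SLO leaves roughly 4.3 minutes per 30 days.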

4

How do you calculate the remaining error budget for a service?

5

What should you do when a service's error budget is exhausted?
