Chaos Engineering

Chaos experiments, failure injection, resilience testing, Chaos Monkey, gamedays, troubleshooting resilience

20 interview questions · Senior
1

What is Chaos Engineering?

Answer

Chaos Engineering is the discipline of experimenting on a distributed system to build confidence in its ability to withstand turbulent conditions in production. This proactive approach surfaces weaknesses before they cause critical incidents. Unlike traditional testing, which verifies isolated components, Chaos Engineering observes system-wide behavior while failures are deliberately injected.

2

What is the fundamental principle of the steady state hypothesis in Chaos Engineering?

Answer

The steady state hypothesis defines the system's normal behavior in terms of observable metrics (for example throughput, error rate, or latency) before a chaos experiment runs, and predicts that those metrics will stay within their normal range while the failure is injected. It allows objective measurement of whether the system remains resilient during and after failure injection. Without a clearly defined hypothesis there is no baseline to compare system behavior against, so it is impossible to say whether the experiment confirmed or disproved the system's resilience.
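As a rough illustration of how such a hypothesis could be checked, the sketch below expresses the steady state as a set of metric bounds and reports which metrics fall outside them. The names `SteadyStateBound` and `violated_metrics`, the metric names, and the thresholds are all hypothetical choices for this example; a real experiment would read the observed values from its monitoring system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SteadyStateBound:
    """Acceptable range for one observable metric (hypothetical structure)."""
    metric: str
    lower: float
    upper: float


def violated_metrics(bounds, observed):
    """Return the names of metrics that are missing or outside their bounds.

    An empty list means the steady state hypothesis held for this sample.
    """
    violations = []
    for bound in bounds:
        value = observed.get(bound.metric)
        # A metric that cannot be observed counts as a violation:
        # without data we cannot claim the system stayed in steady state.
        if value is None or not (bound.lower <= value <= bound.upper):
            violations.append(bound.metric)
    return violations


# Illustrative thresholds only; real values come from the system's baseline.
hypothesis = [
    SteadyStateBound("error_rate", 0.0, 0.01),
    SteadyStateBound("p99_latency_ms", 0.0, 300.0),
]

during_experiment = {"error_rate": 0.05, "p99_latency_ms": 250.0}
print(violated_metrics(hypothesis, during_experiment))  # ['error_rate']
```

Running the same check before, during, and after failure injection gives the baseline comparison the answer describes.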

3

What is the blast radius in the context of Chaos Engineering?

Answer

The blast radius is the potential scope of impact of a chaos experiment on the system and its users. Limiting the blast radius is an essential practice to prevent an experiment from causing disproportionate damage. This can be achieved by targeting a subset of instances, restricting the experiment to a single geographic region, or using feature flags to control user exposure.
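Two of the techniques mentioned above, targeting a subset of instances and restricting the experiment to one region, can be sketched as a small target-selection function. This is a minimal sketch under assumptions: the function name `select_targets`, the instance dictionary shape, and the region names are invented for illustration and do not correspond to any particular chaos tool.

```python
import math
import random


def select_targets(instances, region, max_fraction, seed=None):
    """Pick a capped, random subset of instances from a single region.

    instances    -- list of dicts with "id" and "region" keys (assumed shape)
    region       -- only instances in this region are eligible
    max_fraction -- upper bound on the share of eligible instances to target
    seed         -- optional seed so a selection can be reproduced
    """
    if not 0.0 <= max_fraction <= 1.0:
        raise ValueError("max_fraction must be between 0 and 1")

    eligible = [inst for inst in instances if inst["region"] == region]
    # Round down so the cap is never exceeded.
    limit = math.floor(len(eligible) * max_fraction)
    rng = random.Random(seed)
    return rng.sample(eligible, limit)


# Hypothetical fleet: six instances in eu-west-1, four in us-east-1.
fleet = (
    [{"id": f"eu-{n}", "region": "eu-west-1"} for n in range(6)]
    + [{"id": f"us-{n}", "region": "us-east-1"} for n in range(4)]
)

targets = select_targets(fleet, region="eu-west-1", max_fraction=0.5, seed=42)
print(len(targets))  # 3
```

Rounding down means a very small fleet can yield no targets at all, which is the conservative choice when the goal is to cap impact.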

4

What was the original role of Chaos Monkey at Netflix?

5

What is the difference between Chaos Monkey and Chaos Gorilla in Netflix's Simian Army?
