Question 1

What is the fundamental principle to apply when assigning IAM permissions in GCP?

Accepted Answer

The principle of least privilege means granting only the permissions strictly necessary to accomplish a task. In Data Engineering, this means a pipeline should only have access to the buckets, datasets, and tables it actually needs. This principle reduces the attack surface and limits potential damage if a service account is compromised.
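As a minimal sketch of the idea, the check below compares the permissions a pipeline needs with the permissions its service account holds and reports the surplus. The two permission sets are hypothetical examples chosen for illustration, not taken from a real project; in practice the granted set would come from the roles bound to the account.

```python
# Hypothetical: permissions a load pipeline actually needs.
REQUIRED = {
    "storage.objects.get",
    "storage.objects.list",
    "bigquery.tables.updateData",
}

# Hypothetical: permissions currently granted to its service account.
GRANTED = REQUIRED | {
    "storage.objects.delete",
    "bigquery.datasets.delete",
}


def excess_permissions(granted: set, required: set) -> set:
    """Permissions that are granted but not needed (candidates for removal)."""
    return granted - required


def missing_permissions(granted: set, required: set) -> set:
    """Permissions that are needed but not granted (the pipeline would fail)."""
    return required - granted


if __name__ == "__main__":
    print("excess:", sorted(excess_permissions(GRANTED, REQUIRED)))
    print("missing:", sorted(missing_permissions(GRANTED, REQUIRED)))
```

An empty excess set and an empty missing set together mean the account holds exactly what the task requires.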

Question 2

What is the difference between a service account and a user account in GCP?

Accepted Answer

A service account is an identity designed for applications and services, while a user account represents a person. Service accounts have no password and are built for automation: they authenticate either with key pairs (for example, downloaded JSON key files) or with keyless mechanisms such as Workload Identity. In Data Engineering, each pipeline should have its own service account, granted only the permissions that pipeline needs.
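The distinction shows up directly in IAM policies, where each member string carries a type prefix such as `user:` or `serviceAccount:`. The sketch below classifies members by that prefix and flags service accounts shared by more than one pipeline. The pipeline names, project ID, and email addresses are hypothetical, and the helper functions are illustrative rather than part of any Google library.

```python
from collections import defaultdict


def member_kind(member: str) -> str:
    """Classify an IAM member string by its type prefix."""
    prefix = member.partition(":")[0]
    kinds = {"user": "person", "serviceAccount": "application"}
    return kinds.get(prefix, "other")


def shared_service_accounts(pipeline_accounts: dict) -> dict:
    """Return service accounts used by more than one pipeline."""
    by_account = defaultdict(list)
    for pipeline, account in pipeline_accounts.items():
        by_account[account].append(pipeline)
    return {
        account: sorted(pipelines)
        for account, pipelines in by_account.items()
        if len(pipelines) > 1
    }


if __name__ == "__main__":
    # Hypothetical mapping of pipelines to the accounts they run as.
    accounts = {
        "orders_load": "etl@my-project.iam.gserviceaccount.com",
        "events_load": "etl@my-project.iam.gserviceaccount.com",
        "reporting": "reporting@my-project.iam.gserviceaccount.com",
    }
    print(member_kind("user:alice@example.com"))
    print(shared_service_accounts(accounts))
```

A non-empty result from `shared_service_accounts` points to pipelines that should be split onto separate accounts.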

Question 3

What is the IAM role hierarchy in GCP, from least permissive to most permissive?

Accepted Answer

The basic IAM roles go from Viewer (read-only) to Editor (read/write without IAM management) to Owner (full control including IAM and billing), with each role including the permissions of the one before it. For data pipelines, it is recommended to use granular predefined roles like BigQuery Data Viewer or Storage Object Creator rather than these overly broad basic (formerly called primitive) roles.
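One way to act on this recommendation is to scan a policy for bindings that use a basic role. The sketch below assumes a policy in the standard IAM JSON shape (a `bindings` list of `role` and `members`); the policy contents and member names are hypothetical, and the function is an illustrative helper rather than a Google API.

```python
# The three basic roles, from least to most permissive.
BASIC_ROLES = ("roles/viewer", "roles/editor", "roles/owner")


def find_basic_role_bindings(policy: dict) -> list:
    """Return (role, member) pairs that use a basic role."""
    findings = []
    for binding in policy.get("bindings", []):
        if binding["role"] in BASIC_ROLES:
            for member in binding.get("members", []):
                findings.append((binding["role"], member))
    return findings


if __name__ == "__main__":
    # Hypothetical project policy.
    policy = {
        "bindings": [
            {
                "role": "roles/editor",
                "members": ["serviceAccount:etl@my-project.iam.gserviceaccount.com"],
            },
            {
                "role": "roles/bigquery.dataViewer",
                "members": ["serviceAccount:reporting@my-project.iam.gserviceaccount.com"],
            },
        ]
    }
    for role, member in find_basic_role_bindings(policy):
        print(f"{member} holds basic role {role}; consider a predefined role")
```

Here only the `roles/editor` binding is reported; the `roles/bigquery.dataViewer` binding is a predefined role and passes.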

IAM and Data Security

Why should JSON service account keys be avoided in a GCP production environment?

What is the difference between encryption at rest and encryption in transit?
