Question 1

What is the CAP theorem and what are its three properties?

Accepted Answer

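The CAP theorem states that a distributed data store cannot guarantee all three of the following properties at the same time:

- Consistency: every read sees the most recent write, so all nodes present the same data.
- Availability: every request to a non-failing node receives a response.
- Partition tolerance: the system keeps operating even when network failures split the nodes into groups that cannot communicate.

Network partitions cannot be ruled out in a real distributed system. In practice, therefore, the choice during a partition is between consistency and availability. This trade-off is fundamental to understanding the architecture of NoSQL databases.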
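The sketch below shows the choice that CAP forces once a partition occurs. It is a toy in-memory store with exactly two replicas, and all class and method names are hypothetical. The "CP" mode refuses requests it cannot coordinate with its peer. The "AP" mode keeps answering and lets the replicas diverge. Real databases use quorums, hinted handoff and other mechanisms that this does not model.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when the CP-style store refuses a request during a partition."""


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)


class TwoReplicaStore:
    """Toy key-value store with two replicas; mode is "CP" or "AP" (illustrative labels)."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True simulates a broken link between a and b

    def _check_available(self, replica: Replica) -> None:
        # With two replicas, neither side of a partition holds a majority,
        # so the CP store refuses requests rather than risk divergent data.
        if self.partitioned and self.mode == "CP":
            raise Unavailable(f"replica {replica.name} cannot reach its peer")

    def write(self, replica: Replica, key: str, value: str) -> None:
        self._check_available(replica)
        replica.data[key] = value
        if not self.partitioned:
            # Healthy network: replicate synchronously so both copies agree.
            peer = self.b if replica is self.a else self.a
            peer.data[key] = value
        # AP during a partition: the write stays local and the replicas diverge.

    def read(self, replica: Replica, key: str):
        self._check_available(replica)
        return replica.data.get(key)


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = TwoReplicaStore(mode)
        store.write(store.a, "x", "1")
        store.partitioned = True
        try:
            store.write(store.a, "x", "2")
            print(mode, "a:", store.read(store.a, "x"), "b:", store.read(store.b, "x"))
        except Unavailable as exc:
            print(mode, "refused:", exc)
```

When this runs, the CP store rejects the second write. The AP store accepts it, after which replica a returns "2" and replica b still returns "1" for the same key.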

Question 2

What is the main difference between a Document database (MongoDB) and a Wide Column database (Cassandra)?

Accepted Answer

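Document databases like MongoDB store JSON/BSON documents with flexible schemas. They support rich queries on any field, including nested fields, and secondary indexes keep those queries fast.

Wide Column databases like Cassandra organize data into tables (historically called column families). Rows are distributed by a partition key. This design is optimized for very high write throughput and for reads that supply the key.

MongoDB excels for hierarchical, document-shaped data. Cassandra excels for high-velocity workloads such as time series data.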
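The sketch below contrasts the two data models using plain Python structures. It is not either database's API. The sample records, the `find` helper and the `WideColumnTable` class are all hypothetical.

- The document side filters on arbitrary nested fields with a full scan. A real document store would use an index for this.
- The wide-column side groups rows under a partition key and keeps them sorted by a clustering key. A range read within one partition is then a cheap slice.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Document model: schema-flexible, nested records (hypothetical sample data).
orders = [
    {"_id": 1, "customer": {"name": "Ana", "country": "FR"},
     "items": [{"sku": "A1", "qty": 2}]},
    {"_id": 2, "customer": {"name": "Bo", "country": "DE"},
     "items": [{"sku": "B7", "qty": 1}], "coupon": "SPRING"},  # extra field, no schema change
]


def get_path(doc, dotted):
    """Follow a dotted path such as 'customer.country'; None if any step is missing."""
    for part in dotted.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc


def find(docs, dotted, value):
    """Filter on any field, including nested ones (a full scan; real stores use indexes)."""
    return [d for d in docs if get_path(d, dotted) == value]


class WideColumnTable:
    """Rows grouped by partition key, kept sorted by clustering key within each partition."""

    def __init__(self):
        self.keys = defaultdict(list)  # partition key -> sorted clustering keys
        self.rows = defaultdict(list)  # partition key -> column maps, same order as keys

    def insert(self, partition_key, clustering_key, **columns):
        keys = self.keys[partition_key]
        i = bisect_left(keys, clustering_key)
        if i < len(keys) and keys[i] == clustering_key:
            self.rows[partition_key][i] = columns  # same primary key: simplified upsert
        else:
            keys.insert(i, clustering_key)
            self.rows[partition_key].insert(i, columns)

    def slice(self, partition_key, start, end):
        """Read one partition's rows with start <= clustering key <= end."""
        keys = self.keys.get(partition_key, [])
        rows = self.rows.get(partition_key, [])
        lo, hi = bisect_left(keys, start), bisect_right(keys, end)
        return list(zip(keys[lo:hi], rows[lo:hi]))


if __name__ == "__main__":
    readings = WideColumnTable()
    day = ("sensor-1", "2024-05-01")  # hypothetical composite partition key
    for ts, temp in [("10:00", 21.5), ("09:00", 20.9), ("11:00", 22.1)]:
        readings.insert(day, ts, temp=temp)
    print(find(orders, "customer.country", "DE"))
    print(readings.slice(day, "09:00", "10:30"))
```

The time-series example writes readings out of order, but the slice still returns them sorted by timestamp. That ordering within a partition is what makes the wide-column layout a good fit for time series.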

Question 3

In which use case should Neo4j be preferred over MongoDB or Cassandra?

Accepted Answer

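Neo4j is a graph database optimized for complex, multi-level relationships between entities. It excels for social networks, recommendation systems, fraud detection and dependency analysis.

Its Cypher query language expresses multi-hop traversals as patterns. Each node stores direct references to its relationships, so the cost of a traversal grows with the part of the graph actually visited, not with the total size of the data. Equivalent queries built from repeated SQL joins, or from application-side NoSQL lookups, slow down sharply as the number of hops increases.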
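The sketch below illustrates dependency analysis as a graph traversal. Each node holds direct references to its neighbours in a plain adjacency dict, which is only a loose analogy for a native graph store. A breadth-first walk then finds every transitive dependency. This is roughly what a Cypher variable-length pattern such as `-[:DEPENDS_ON*]->` expresses declaratively. The package names, the `DEPENDS_ON` relationship type and the `transitive_deps` helper are hypothetical.

```python
from collections import deque

# Hypothetical dependency graph: each node lists its direct dependencies.
DEPENDS_ON = {
    "app": ["web", "db-client"],
    "web": ["http", "json"],
    "db-client": ["net", "json"],
    "http": ["net"],
    "json": [],
    "net": [],
}


def transitive_deps(graph, start, max_depth=None):
    """Breadth-first traversal returning {dependency: hop distance from start}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if max_depth is not None and depth == max_depth:
            continue  # do not expand beyond the requested number of hops
        for nxt in graph.get(node, []):
            if nxt not in seen:  # also protects against dependency cycles
                seen[nxt] = depth + 1
                queue.append(nxt)
    del seen[start]
    return seen


if __name__ == "__main__":
    print(transitive_deps(DEPENDS_ON, "app"))
    print(transitive_deps(DEPENDS_ON, "app", max_depth=1))
```

Each step only touches the neighbours of the current node. This is the property the answer above relies on: the work depends on the portion of the graph explored, not on the total number of nodes.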

NoSQL Databases

What is a partition key in Cassandra and why is it critical for performance?
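The sketch below shows why the partition key matters. Hashing the key to a token on a ring decides which node stores the row. Every row that shares a partition key therefore lands on the same node. A query that supplies the full partition key can be routed to the owning node, while a query without it has to ask every node.

This is a simplified consistent-hash ring, not Cassandra's implementation:

- The class and node names are hypothetical.
- Replication is ignored.
- MD5 truncated to 64 bits stands in for the Murmur3 hash that Cassandra's default partitioner uses.
- Tokens are unsigned here, for simplicity.

```python
import hashlib
from bisect import bisect_left


class TokenRing:
    """Toy hash ring: each node owns the token range ending at its own token."""

    def __init__(self, node_tokens):
        # node_tokens: {node_name: token}, tokens being ints in [0, 2**64).
        self.ring = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.ring]

    @staticmethod
    def token_for(partition_key: str) -> int:
        # Stand-in hash: first 8 bytes of MD5 (Cassandra's default uses Murmur3).
        digest = hashlib.md5(partition_key.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def owner(self, partition_key: str) -> str:
        # First node whose token is >= the key's token, wrapping past the end.
        i = bisect_left(self.tokens, self.token_for(partition_key))
        return self.ring[i % len(self.ring)][1]


if __name__ == "__main__":
    ring = TokenRing({"node-a": 2**64 // 3, "node-b": 2 * 2**64 // 3, "node-c": 2**64 - 1})
    rows = [("sensor-1", "09:00"), ("sensor-1", "10:00"), ("sensor-2", "09:00")]
    for partition_key, ts in rows:
        # Only the partition key is hashed, so both sensor-1 rows share a node.
        print(partition_key, ts, "->", ring.owner(partition_key))
```

The same mechanism explains the usual modelling advice: choose a partition key that spreads data evenly. A low-cardinality key concentrates reads and writes on a few nodes.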

What is the Cypher syntax to find all friends of friends of a user in Neo4j?
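In Cypher, this question is commonly answered with a two-hop pattern along the lines of `MATCH (u:User {name: $name})-[:FRIEND]-()-[:FRIEND]-(fof:User) WHERE fof <> u AND NOT (u)-[:FRIEND]-(fof) RETURN DISTINCT fof`. The `User` label, `name` property and `FRIEND` relationship type are assumptions about the data model.

The Python sketch below computes the same result over a hypothetical in-memory adjacency map, to make the semantics explicit. It takes everyone two hops away, then excludes the user and their direct friends.

```python
# Undirected friendships as adjacency sets (hypothetical sample data).
FRIENDS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def friends_of_friends(graph, user):
    """People exactly two hops away who are neither the user nor a direct friend."""
    direct = graph.get(user, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    # Mirrors WHERE fof <> u AND NOT (u)-[:FRIEND]-(fof), with DISTINCT via the set.
    return two_hops - direct - {user}


if __name__ == "__main__":
    print(sorted(friends_of_friends(FRIENDS, "alice")))
```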
