
NoSQL Databases
GraphDB (Neo4j), Document DBs (MongoDB, Firestore), Wide Column (Cassandra, Bigtable), CAP theorem, use cases
1. What is the CAP theorem and what are its three properties?
Answer
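The CAP theorem states that a distributed data store can provide at most two of three guarantees at the same time: Consistency (every read returns the most recent write, so all nodes see the same data), Availability (every request to a working node receives a response), and Partition tolerance (the system keeps operating when the network drops or delays messages between nodes). Because network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability. This trade-off is fundamental to understanding the architecture of NoSQL databases.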
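To make the trade-off concrete, here is a minimal toy sketch in Python. It is not how any real database is implemented: the `ToyCluster` class, its two replicas, and the "CP"/"AP" mode labels are all hypothetical names invented for illustration. It only shows what each choice gives up once a partition occurs.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node holding a single value (a toy stand-in for a database replica)."""
    value: str = "v0"


@dataclass
class ToyCluster:
    """Two replicas; `mode` is "CP" or "AP" (hypothetical labels for this sketch)."""
    mode: str
    partitioned: bool = False
    a: Replica = field(default_factory=Replica)
    b: Replica = field(default_factory=Replica)

    def write(self, value: str) -> bool:
        """Write through replica A; return True if the write was accepted."""
        if self.partitioned:
            if self.mode == "CP":
                # B is unreachable, so refuse the write rather than let replicas diverge.
                return False
            # AP: accept locally; B keeps the stale value until the partition heals.
            self.a.value = value
            return True
        # No partition: replicate to both nodes.
        self.a.value = value
        self.b.value = value
        return True

    def read_from_b(self) -> str:
        """Read whatever replica B currently holds."""
        return self.b.value


cp = ToyCluster(mode="CP")
ap = ToyCluster(mode="AP")
for cluster in (cp, ap):
    cluster.write("v1")          # replicated normally to both nodes
    cluster.partitioned = True   # the network now separates A from B

cp_accepted = cp.write("v2")     # CP mode rejects the write (gives up availability)
ap_accepted = ap.write("v2")     # AP mode accepts it (gives up consistency)
```

After the partition, the CP cluster still agrees on `"v1"` everywhere but rejected a request, while the AP cluster answered every request but now returns `"v1"` from B and `"v2"` from A.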
2. What is the main difference between a Document database (MongoDB) and a Wide Column database (Cassandra)?
Answer
Document databases like MongoDB store JSON/BSON documents with flexible schemas and support rich queries on any field, including nested ones. Wide Column databases like Cassandra organize data into column families whose rows are distributed by partition key, and are optimized for very high write throughput and key-based reads. MongoDB suits hierarchical data; Cassandra suits high-velocity time series data.
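The two data models can be contrasted with a small pure-Python sketch. It does not use the MongoDB or Cassandra drivers; the helper names (`find_docs`, `write_reading`, `read_range`) and the sample data are assumptions made up for illustration. Plain dictionaries stand in for a document collection and for partitions whose rows are kept sorted by a clustering key.

```python
from bisect import insort
from collections import defaultdict

# Document model: one self-contained, nested record per entity.
order_doc = {
    "_id": "order-1",
    "customer": {"name": "Ada", "city": "London"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}


def find_docs(docs, path, expected):
    """Return documents whose dotted `path` (e.g. "customer.city") equals `expected`."""
    matches = []
    for doc in docs:
        node = doc
        for key in path.split("."):
            # Walk into nested dicts; anything else ends the walk with None.
            node = node.get(key) if isinstance(node, dict) else None
        if node == expected:
            matches.append(doc)
    return matches


# Wide-column model: partition key -> rows kept sorted by clustering key (timestamp).
readings = defaultdict(list)


def write_reading(sensor_id, ts, value):
    """Insert a row into the sensor's partition, keeping rows ordered by timestamp."""
    insort(readings[sensor_id], (ts, value))


def read_range(sensor_id, start_ts, end_ts):
    """Key-based read: one partition, one contiguous slice of timestamps."""
    return [(ts, v) for ts, v in readings[sensor_id] if start_ts <= ts <= end_ts]


london_orders = find_docs([order_doc], "customer.city", "London")

write_reading("sensor-1", 30, 21.5)
write_reading("sensor-1", 10, 20.0)   # arrives out of order
write_reading("sensor-1", 20, 20.7)
recent = read_range("sensor-1", 15, 30)
```

The document query can filter on any nested field, whereas the wide-column read always starts from a known partition key (`"sensor-1"`) and scans a sorted range inside it.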
3. In which use case should Neo4j be preferred over MongoDB or Cassandra?
Answer
Neo4j is a graph database optimized for complex, multi-level relationships between entities. It excels at social networks, recommendation systems, fraud detection, and dependency analysis. Its Cypher query language expresses multi-hop traversals directly, and because relationships are stored as direct links between nodes, deep traversals stay fast where chained SQL joins or repeated NoSQL lookups become prohibitively slow.
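The kind of multi-hop traversal a graph database is built for can be sketched in plain Python. This is only an illustration of the access pattern, not Neo4j code: the `friends` adjacency map and the `friends_of_friends` helper are hypothetical, and a real graph database would run the equivalent traversal inside its own storage engine.

```python
# Adjacency map: each user points directly at their friends,
# so following a relationship is a lookup rather than a join.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def friends_of_friends(graph, user):
    """Return users exactly two hops away, excluding the user and direct friends."""
    direct = graph.get(user, set())
    second_hop = set()
    for friend in direct:
        # Follow each direct friend's own relationships one more hop.
        second_hop |= graph.get(friend, set())
    return second_hop - direct - {user}


result = friends_of_friends(friends, "alice")
```

Each extra hop only touches the neighbors of nodes already reached, which is why this pattern scales with the size of the neighborhood rather than the size of the whole dataset.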
What is a partition key in Cassandra and why is it critical for performance?
What is the Cypher syntax to find all friends of friends of a user in Neo4j?