1. Linux & Shell: essential commands, bash scripting, permissions, cron jobs
2. Git & GitHub: branching, merge, rebase, pull requests, CI/CD workflows
3. Advanced Python: OOP, decorators, generators, context managers, typing, async/await
4. CI/CD: linting (Ruff, Pylint), packaging (Poetry), tests, GitHub Actions, pipelines
5. Docker: Dockerfile, images, containers, volumes, networks, multi-stage builds
6. Docker Compose: multi-container services, dependencies, healthchecks, local orchestration
7. FastAPI: routes, Pydantic models, dependencies, middleware, deployment
8. Advanced SQL: window functions, CTEs, analytical queries, optimization, indexing
9. BigQuery: serverless architecture, partitioning, clustering, costs, UDFs, federated queries
10. PostgreSQL: configuration, replication, indexing (B-tree, GIN, GiST), VACUUM, EXPLAIN ANALYZE
11. Data Modeling: star schema, fact/dimension tables, normalization, SCD, data vault
12. ELT vs ETL vs ETLT: patterns, trade-offs, architecture choices
13. Fivetran & Airbyte: connectors, sync modes, CDC, schema evolution
14. dbt: models, sources, refs, tests, snapshots, incremental models, Jinja macros
15. Apache Airflow: DAGs, operators, sensors, XCom, connections, pools, task dependencies
16. PySpark: RDD vs DataFrame, transformations, actions, partitioning, broadcast variables
17. Streaming: Pub/Sub (topics, subscriptions), Apache Beam (PCollections, transforms, windowing), Dataflow
18. Kubernetes: pods, deployments, services, ingress, ConfigMaps, Secrets, Helm, scaling
19. Terraform: providers, resources, state, modules, plan/apply, infrastructure as code
20. IAM & security: principle of least privilege, service accounts, GCP roles
21. NoSQL databases: graph DBs (Neo4j), document DBs (MongoDB, Firestore), wide-column stores (Cassandra, Bigtable)
22. Data Architecture: Data Lake vs Data Warehouse vs Data Lakehouse, Data Mesh, Data Contracts
23. Monitoring & observability: logging, metrics, alerting, SLA/SLO/SLI, data quality checks