Data Models
BenkyouZone
Overview
Every production system stores data. The question is not whether to store it, but how — and the wrong answer is expensive to fix.
A ride-hailing platform stores user profiles across eight normalised tables and pays a five-table JOIN on every read. A digital bank stores its ledger in a document database and silently loses a write when two transactions race. Neither failure is a bug. Both systems functioned. The engineers who built them were competent. They simply chose the wrong storage model for the workload — and discovered the mismatch in production.
This course teaches you to make that choice deliberately.
You will learn five data models — relational, document, XML/schema validation, RDF/knowledge graphs, and property graphs — each through the lens of the workloads it serves and the guarantees it trades. You will build a decision framework that produces a justified choice (not a universal answer), apply it to progressively harder scenarios, and learn to design systems that combine multiple models when no single model fits.
The course is structured as a progression: Topics 1–5 build the relational foundation (conceptual modelling, normalisation, SQL querying, enforcement). Topics 6–10 survey four alternative models, each measured against the relational baseline. Topic 11 brings all five together in a capstone multi-model design.
By the end, you won't just know five data models. You'll have the discipline to evaluate any model — including ones that don't exist yet.
What You Will Learn
- Identify workload characteristics — query patterns, consistency requirements, schema stability — and match them to the model whose strengths align
- Design relational schemas from business requirements through ERD, normalisation (BCNF), SQL querying, and a four-layer enforcement stack (constraints, views, transactions, indexes)
- Design document schemas with aggregate boundaries, embed-or-reference decisions, snapshot discipline, and honest trade-off ledgers
- Read and validate XML/JSON contracts at organisational trust boundaries using XSD, JSON Schema, and the four-check validation model
- Build and query knowledge graphs with RDF triples, URIs, ontology, and SPARQL pattern matching — merging data from independent sources without ETL pipelines
- Model and traverse property graphs with Cypher, using per-hop filtering, native cycle detection, and d^k cost estimation for fraud detection and relationship-rich workloads
- Design multi-model architectures that assign models to workloads and explicitly design the data-flow boundaries between them
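The d^k cost estimation mentioned above can be made concrete with a toy function. This is an illustrative sketch, not the course's formula: we assume the estimate is the geometric sum of nodes touched per hop for average degree d over k hops.

```python
def traversal_cost(avg_degree: float, hops: int) -> float:
    """Rough estimate of nodes touched by a breadth-first traversal:
    d^1 + d^2 + ... + d^k for average degree d and k hops.
    (Illustrative assumption; the course may define the estimate differently.)"""
    return sum(avg_degree ** i for i in range(1, hops + 1))

# With ~50 connections per account, a 3-hop fraud check touches on the
# order of 50 + 2,500 + 125,000 = 127,550 nodes.
print(traversal_cost(50, 3))
```

The point of the estimate is qualitative: traversal cost explodes with depth, which is why per-hop filtering matters in graph workloads.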
Course Structure
| Part | Topics | Focus | Hours (video) |
|---|---|---|---|
| Foundation | 1 | Introduction — five models, decision framework, enforcement spectrum | ~0.7 |
| Relational Deep-Dive | 2–5 | ERD → Normalisation → SQL Querying → Enforcement | ~3.1 |
| Alternative Models | 6–10 | Document, XML/XSD, RDF/SPARQL, Property Graph/Cypher | ~4.2 |
| Capstone | 11 | Multi-model design — workload decomposition, boundary design, integration tax | ~0.7 |
| Total | 11 topics | 62 video lectures across 11 chapters | ~8.7 |
Each topic includes video lectures (5–8 acts per topic), per-act MCQ quizzes, discussion prompts, and a hands-on lab.
Total estimated learning time: 20–25 hours (video + labs + self-study).
Pace: Self-paced. Suggested schedule: 3–4 hours per week over 6–8 weeks.
Who This Course Is For
- Software engineers who choose databases by familiarity ("we've always used Postgres") and want a systematic framework
- Data engineers who build pipelines across multiple storage systems and need to understand the trade-offs at each boundary
- Computer science students taking a database or data management course who want practical design skills alongside theory
- Technical leads and architects who evaluate new database products and need vocabulary to justify their choices
Prerequisites: Basic programming experience (any language). No prior database experience required — Topic 1 starts from first principles. All technical terms (BCNF, ACID, SPARQL, etc.) are introduced and explained within the course.
What Makes This Course Different
Failure-driven. Every topic opens with real-world failures — systems that functioned but chose the wrong model. The failures define the problem; the topic builds the solution.
Trade-off discipline. Every model decision requires a cost statement: "We chose X, accepting the cost of Y." This habit — not any single model — is the course's most transferable skill.
Five models, one framework. Most courses teach one model deeply or survey many shallowly. This course does both: a four-topic relational deep-dive and four alternative models, all evaluated through the same decision framework.
Predict before you run. SQL queries, SPARQL patterns, Cypher traversals, MongoDB pipelines — every query language is taught with a "predict the result first, then verify" discipline that transfers across all five models.
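In miniature, the discipline looks like this. The table and data below are hypothetical (the labs use MySQL; this sketch uses Python's built-in sqlite3 so it runs anywhere):

```python
import sqlite3

# A throwaway in-memory table with three orders from two customers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    INSERT INTO orders VALUES (1, 'alice', 40.0), (2, 'bob', 15.0),
                              (3, 'alice', 25.0);
""")

# Prediction first: 'alice' appears twice, so GROUP BY should return
# two rows, with alice's total 65.0 and bob's 15.0.
rows = conn.execute(
    "SELECT customer, SUM(total) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)  # then verify the result against the prediction
```

Committing to a prediction before running the query is what turns query practice into a test of your mental model rather than trial and error.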
Honest about costs. No model is presented as universally superior. Every chapter's concluding remarks include "The Honest Cost" — what the model does not solve. The capstone's integration tax analysis makes the cost of combining models concrete.
Tools and Platform
- Video lectures: 62 per-act videos (~8.7 hours), each 2–17 minutes
- Quizzes: 105 multiple-choice questions with explanations (per-act, scenario-based)
- Discussion prompts: Key takeaways, thought-provoking questions, and cross-topic connections beneath each video
- Labs: Hands-on exercises in Google Colab (SQL/MySQL, MongoDB, xmllint, Python) — no local installation required
- Reference schemas: Orders & Payments (relational), Super-App Profiles (document), UBL Invoice (XML/XSD), Movie Knowledge Graph (RDF/SPARQL), Fraud Detection Network (property graph)
Assessment
- Per-topic quizzes: multiple-choice questions after each video act (scenario-based, with explanations)
- Hands-on labs: One per topic, completed in Google Colab — reasoning exercises plus code execution
- Passing threshold: 70% overall across quizzes and labs
Instructor
Tarapong Sreenuch
Language
English.
Course Outcomes
These are the formally assessed learning outcomes. By completing this course, you will be able to:
- Given a workload description, apply the four-step decision framework to select a data model and explicitly state the costs accepted
- Design a normalised relational schema from an ERD, verify it against BCNF, and harden it with constraints, views, transactions, and indexes
- Design a document schema with aggregate boundaries, bounded embedding, and snapshot discipline
- Read an XSD or JSON Schema contract, predict validation outcomes, and extract data from validated documents
- Write SPARQL queries against a knowledge graph, predict binding counts, and use OPTIONAL for missing data
- Write Cypher traversal queries with per-hop filtering and cycle detection, and estimate traversal cost
- Decompose a multi-workload platform into distinct stores, assign models with two-part justifications, and design data-flow boundaries with explicit consistency contracts
- Given system symptoms (slow reads, lost writes, stuck submissions), diagnose whether the root cause is a model-workload mismatch and identify the appropriate fix
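The "constraints, views, transactions, and indexes" stack from the second outcome can be glimpsed in a small sketch. This uses Python's built-in sqlite3 and a hypothetical payments table, not the course's reference schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE payments (
        id INTEGER PRIMARY KEY,
        amount REAL NOT NULL CHECK (amount > 0)   -- constraint layer
    )
""")

# The constraint layer: the database itself, not application code,
# rejects an invalid negative payment.
try:
    conn.execute("INSERT INTO payments VALUES (1, -5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The transaction layer: both inserts commit together or not at all.
with conn:
    conn.execute("INSERT INTO payments VALUES (1, 10.0)")
    conn.execute("INSERT INTO payments VALUES (2, 20.0)")

print(rejected, conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0])
```

The course's point is that each layer moves an invariant out of application code and into the database, where it cannot be bypassed by a forgetful caller.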