Data Models
BenkyouZone
Overview
Every production system stores data. The question is not whether to store it, but how — and the wrong answer is expensive to fix.
A ride-hailing platform stores user profiles across eight normalised tables and pays a five-table JOIN on every read. A digital bank stores its ledger in a document database and silently loses a write when two transactions race. Neither failure is a bug. Both systems functioned. The engineers who built them were competent. They simply chose the wrong storage model for the workload — and discovered the mismatch in production.
This course teaches you to make that choice deliberately.
You will learn five data models — relational, document, XML/schema validation, RDF/knowledge graphs, and property graphs — each through the lens of the workloads it serves and the guarantees it trades. You will build a decision framework that produces a justified choice (not a universal answer), apply it to progressively harder scenarios, and learn to design systems that combine multiple models when no single model fits.
The course is structured as a progression: Topics 1–5 build the relational foundation (conceptual modelling, normalisation, SQL querying, enforcement). Topics 6–10 survey four alternative models, each measured against the relational baseline. Topic 11 brings all five together in a capstone multi-model design.
By the end, you won't just know five data models. You'll have the discipline to evaluate any model — including ones that don't exist yet.
What You Will Learn
- Identify workload characteristics — query patterns, consistency requirements, schema stability — and match them to the model whose strengths align
- Design relational schemas from business requirements through ERD, normalisation (BCNF), SQL querying, and a four-layer enforcement stack (constraints, views, transactions, indexes)
- Design document schemas with aggregate boundaries, embed-or-reference decisions, snapshot discipline, and honest trade-off ledgers
- Read and validate XML/JSON contracts at organisational trust boundaries using XSD, JSON Schema, and the four-check validation model
- Build and query knowledge graphs with RDF triples, URIs, ontology, and SPARQL pattern matching — merging data from independent sources without ETL pipelines
- Model and traverse property graphs with Cypher, using per-hop filtering, native cycle detection, and d^k cost estimation for fraud detection and relationship-rich workloads
- Design multi-model architectures that assign models to workloads and explicitly design the data-flow boundaries between them
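The d^k cost estimation mentioned above can be made concrete with a toy function. This is an illustrative sketch, not the course's formula: we assume the estimate is the geometric sum of nodes touched per hop for average degree d over k hops.

```python
def traversal_cost(avg_degree: float, hops: int) -> float:
    """Rough estimate of nodes touched by a breadth-first traversal:
    d^1 + d^2 + ... + d^k for average degree d and k hops.
    (Illustrative assumption; the course may define the estimate differently.)"""
    return sum(avg_degree ** i for i in range(1, hops + 1))

# With ~50 connections per account, a 3-hop fraud check touches on the
# order of 50 + 2,500 + 125,000 = 127,550 nodes.
print(traversal_cost(50, 3))
```

The point of the estimate is qualitative: traversal cost explodes with depth, which is why per-hop filtering matters in graph workloads.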
Course Structure
| Part | Topics | Focus | Hours (video) |
|---|---|---|---|
| Foundation | 1 | Introduction — five models, decision framework, enforcement spectrum | ~0.7 |
| Relational Deep-Dive | 2–5 | ERD → Normalisation → SQL Querying → Enforcement | ~3.1 |
| Alternative Models | 6–10 | Document, XML/XSD, RDF/SPARQL, Property Graph/Cypher | ~4.2 |
| Capstone | 11 | Multi-model design — workload decomposition, boundary design, integration tax | ~0.7 |
| Total | 11 topics | 62 video lectures across 11 chapters | ~8.7 |
Each topic includes video lectures (5–8 acts per topic), per-act MCQ quizzes, discussion prompts, and a hands-on lab.
Total estimated learning time: 20–25 hours (video + labs + self-study).
Pace: Self-paced. Suggested schedule: 3–4 hours per week over 6–8 weeks.
Who This Course Is For
- Software engineers who choose databases by familiarity ("we've always used Postgres") and want a systematic framework
- Data engineers who build pipelines across multiple storage systems and need to understand the trade-offs at each boundary
- Computer science students taking a database or data management course who want practical design skills alongside theory
- Technical leads and architects who evaluate new database products and need vocabulary to justify their choices
Prerequisites: Basic programming experience (any language). No prior database experience required — Topic 1 starts from first principles. All technical terms (BCNF, ACID, SPARQL, etc.) are introduced and explained within the course.
What Makes This Course Different
Failure-driven. Every topic opens with real-world failures — systems that functioned but chose the wrong model. The failures define the problem; the topic builds the solution.
Trade-off discipline. Every model decision requires a cost statement: "We chose X, accepting the cost of Y." This habit — not any single model — is the course's most transferable skill.
Five models, one framework. Most courses teach one model deeply or survey many shallowly. This course does both: a four-topic relational deep-dive and four alternative models, all evaluated through the same decision framework.
Predict before you run. SQL queries, SPARQL patterns, Cypher traversals, MongoDB pipelines — every query language is taught with a "predict the result first, then verify" discipline that transfers across all five models.
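In miniature, the discipline looks like this. The table and data below are hypothetical (the labs use MySQL; this sketch uses Python's built-in sqlite3 so it runs anywhere):

```python
import sqlite3

# A throwaway in-memory table with three orders from two customers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    INSERT INTO orders VALUES (1, 'alice', 40.0), (2, 'bob', 15.0),
                              (3, 'alice', 25.0);
""")

# Prediction first: 'alice' appears twice, so GROUP BY should return
# two rows, with alice's total 65.0 and bob's 15.0.
rows = conn.execute(
    "SELECT customer, SUM(total) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)  # then verify the result against the prediction
```

Committing to a prediction before running the query is what turns query practice into a test of your mental model rather than trial and error.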
Honest about costs. No model is presented as universally superior. Every chapter's concluding remarks include "The Honest Cost" — what the model does not solve. The capstone's integration tax analysis makes the cost of combining models concrete.
Tools and Platform
- Video lectures: 62 per-act videos (~8.7 hours), each 2–17 minutes
- Quizzes: 105 multiple-choice questions with explanations (per-act, scenario-based)
- Discussion prompts: Key takeaways, thought-provoking questions, and cross-topic connections beneath each video
- Labs: Hands-on exercises in Google Colab (SQL/MySQL, MongoDB, xmllint, Python) — no local installation required
- Reference schemas: Orders & Payments (relational), Super-App Profiles (document), UBL Invoice (XML/XSD), Movie Knowledge Graph (RDF/SPARQL), Fraud Detection Network (property graph)
Assessment
- Per-topic quizzes: multiple-choice questions after each video act (scenario-based, with explanations)
- Hands-on labs: One per topic, completed in Google Colab — reasoning exercises plus code execution
- Passing threshold: 70% overall across quizzes and labs
Instructor
Tarapong Sreenuch
Language
English.
Course Outcomes
These are the formally assessed learning outcomes. By completing this course, you will be able to:
- Given a workload description, apply the four-step decision framework to select a data model and explicitly state the costs accepted
- Design a normalised relational schema from an ERD, verify it against BCNF, and harden it with constraints, views, transactions, and indexes
- Design a document schema with aggregate boundaries, bounded embedding, and snapshot discipline
- Read an XSD or JSON Schema contract, predict validation outcomes, and extract data from validated documents
- Write SPARQL queries against a knowledge graph, predict binding counts, and use OPTIONAL for missing data
- Write Cypher traversal queries with per-hop filtering and cycle detection, and estimate traversal cost
- Decompose a multi-workload platform into distinct stores, assign models with two-part justifications, and design data-flow boundaries with explicit consistency contracts
- Given system symptoms (slow reads, lost writes, stuck submissions), diagnose whether the root cause is a model-workload mismatch and identify the appropriate fix
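The "constraints, views, transactions, and indexes" stack from the second outcome can be glimpsed in a small sketch. This uses Python's built-in sqlite3 and a hypothetical payments table, not the course's reference schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE payments (
        id INTEGER PRIMARY KEY,
        amount REAL NOT NULL CHECK (amount > 0)   -- constraint layer
    )
""")

# The constraint layer: the database itself, not application code,
# rejects an invalid negative payment.
try:
    conn.execute("INSERT INTO payments VALUES (1, -5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The transaction layer: both inserts commit together or not at all.
with conn:
    conn.execute("INSERT INTO payments VALUES (1, 10.0)")
    conn.execute("INSERT INTO payments VALUES (2, 20.0)")

print(rejected, conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0])
```

The course's point is that each layer moves an invariant out of application code and into the database, where it cannot be bypassed by a forgetful caller.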