🛠️ Building a Canonical Data Model Using AI
January 20, 2026
By Ted Steinmann

Today, I completed a canonical data model for a licensing platform—not by starting from scratch, but by systematically extracting, reviewing, and organizing knowledge that was already scattered across our wiki. The process revealed how AI can accelerate documentation work when paired with human judgment.
Here's how we did it and what I learned.
The Starting Point: Fragmented Knowledge
The wiki had the pieces:

- Feature documentation for Applications, Certifications, Education, Inspections
- Entity relationship diagrams buried in various Technical Reference files
- Integration mappings for Elite, NURSYS, NREMT, and EMS Compact
- A CSV list of 70+ entity attributes from an old ERD
- Partial entity definitions scattered across multiple feature areas
But the pieces didn't form a coherent picture. Product discussions used different terminology. Integration docs referenced overlapping concepts. There was no single source of truth for "What is a Certification?" or "How does Authorization relate to Certification?"
The Process: AI-Assisted Knowledge Extraction
Step 1: Gather and Summarize

I asked AI to search the codebase and identify what documentation already existed. Instead of reading dozens of files manually, I got a structured summary of:

- Which files contained entity definitions and diagrams
- Where integration mappings lived and what they covered
- What was complete, what was placeholder, what was contradictory
Step 2: Convert and Consolidate

AI helped translate existing content into a unified format:

- Converted a PNG entity-relationship diagram into Mermaid syntax (see the sketch after this list)
- Reorganized scattered attribute lists into logical groupings (Demographics, Contact, Employment, Certifications)
- Unified terminology across feature areas (Certification vs. License vs. Credential)
- Built integration mapping tables from documentation fragments
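If you haven't worked with Mermaid's ER syntax, here's roughly what that conversion produces. A minimal sketch: the entity names, attributes, and grouping comments below are placeholders to show the pattern, not our actual model.

```mermaid
erDiagram
    %% Illustrative fragment only; names are placeholders.
    %% The quoted comments carry the logical attribute groupings.
    PERSONNEL {
        string firstName "Demographics"
        string lastName "Demographics"
        string email "Contact"
        string employerId "Employment"
        string certificationNumber "Certifications"
    }
    PERSONNEL ||--o{ CERTIFICATION : holds
```

Because the diagram is plain text, every later correction shows up as a readable diff instead of a replacement image file.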
Step 3: Iterate and Refine

As I reviewed the generated content, I caught inconsistencies and made corrections:

- "EMS Compact provides privilege to practice"—but we only report license status to it, not collect from it
- "Vehicle owns certifications"—but actually Services own them; Vehicles only receive them
- "Authorization is a separate entity"—but it's really an extension of Certification
- "Certifications can expire differently than Authorizations"—this distinction needed explicit documentation
I'd flag the error, and AI would propagate corrections throughout the document—updating tables, integration mappings, and relationship diagrams.
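As an example, here's what the ownership correction looks like once it lands in the diagram. This is a hedged sketch with illustrative entity names, but the relationships follow the corrections above:

```mermaid
erDiagram
    %% Corrected per review: Services own Certifications and
    %% Vehicles; a Vehicle receives Certifications only through
    %% its owning Service. Authorization is folded into
    %% Certification rather than modeled as a separate entity.
    SERVICE ||--o{ CERTIFICATION : owns
    SERVICE ||--o{ VEHICLE : owns
    VEHICLE }o--o{ CERTIFICATION : "receives via Service"
```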
Step 4: Validate Against Reality
The final step: Does this match how the system actually works?
- Cross-checked integration mappings against Technical Reference files
- Verified entity relationships against feature documentation
- Confirmed with colleagues that the model reflected actual business rules
- Tested the model against real use cases (multistate nursing licenses, EMS provider records)
Key Insights
AI Excels at Consolidation, Humans Excel at Validation

AI can quickly extract, organize, and reformat content from multiple sources—pulling definitions from five different docs and unifying them into one coherent table. But catching the subtle error—"we report to EMS Compact, we don't collect from it"—requires domain knowledge and business context. The combination is powerful: AI handles the mechanical work, humans validate the logic.

Documentation Debt Has Value

Scattered documentation seemed like a mess. But it was actually valuable: real feature definitions, real integration mappings, real constraints discovered through actual implementation. The job wasn't inventing a model from theory—it was connecting dots that already existed but weren't visible from any single vantage point.

Iteration Beats Perfection

Instead of trying to design the perfect model upfront, we started with what existed, generated a draft, reviewed it, found gaps, and refined. Each cycle took minutes, not hours. We went from "fragmented docs" to "coherent model" through rapid iteration, not lengthy planning.

Version Control for Documentation

Having each iteration saved made it easy to spot what changed and why. When I said "EMS Compact should only be outbound," AI updated the table, diagram, and narrative in seconds. In a traditional document, that would have been five manual edits prone to inconsistency.
What We Built
A canonical data model that:

- Identifies three core record types (Personnel, Services, Vehicles) and their relationships
- Maps seven entity domains (Licensing, Education, Exams, Inspections, Investigations, Transactions, Shared Capabilities)
- Documents integration contracts for five external systems with clear direction (inbound, outbound, bidirectional; sketched below)
- Provides shared terminology (Certification vs. Authorization, Home State vs. Remote State, Privilege to Practice)
- Clarifies ownership rules (Services own Certifications and Vehicles; Vehicles receive Certifications via their Service)
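Here's a minimal sketch of how those direction annotations read as a diagram. Only the EMS Compact contract (outbound-only, per the Step 3 correction) is confirmed in this post; the Elite, NURSYS, and NREMT arrows are placeholders showing the notation:

```mermaid
flowchart LR
    %% Direction sketch. Only the EMS Compact arrow is
    %% confirmed here; the others are placeholders, not
    %% documented contracts.
    Elite <--> Platform[Licensing Platform]
    NURSYS --> Platform
    NREMT <--> Platform
    Platform --> EMSCompact[EMS Compact]
```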
But more importantly: a living document that teams can now reference, critique, and improve based on real usage.
Sharing and Seeking Input
The model is now in the wiki, ready for review. I shared it with the team via Teams:
"Looking into the next year, we have a goal to lay some of the foundation for better analytic and API capabilities. In doing so, we need to know what to focus on productizing. To help with this, I put together a Canonical Data Model to represent entities and relationships in their simplest forms. I'd greatly appreciate review and any input anybody might have."
The goal: Get feedback early. Where is the model wrong? Where does it miss context? What assumptions did we get right or wrong? Does it match how customers think about their licensing workflows?
Feedback will drive the next iteration. And each iteration will be faster because we have a foundation to build on.
Tools I Used
The speed of this project came down to smart tool choices:
- VS Code + Markdown — I drafted and iterated everything in Markdown, which kept the documentation portable and version-controlled. We use Azure DevOps, but Markdown is universal: no vendor lock-in, just plain text that works anywhere. And, most importantly, it's the format agents prefer to write in.
- Copilot (primarily Claude in VS Code) — My primary AI partner for content extraction, consolidation, and bulk edits. The ability to reference other wiki articles and ask for synthesis saved countless hours.
- Mermaid — For converting static ERD diagrams into code-based diagrams I could version control and iterate on quickly.
- Git — Each iteration was committed separately, so I could see exactly what changed and why.
The real enabler wasn't any single tool—it was keeping everything in formats that let AI and humans work together seamlessly. Markdown + Git + AI made iteration effortless.
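As a small illustration, the iteration history itself reads like a changelog of the review cycle. The commit messages below are invented for illustration; they mirror the corrections described in Step 3, not our actual history:

```mermaid
gitGraph
    %% Hypothetical commit sequence; messages mirror the
    %% Step 3 corrections, not the actual repository log.
    commit id: "extract wiki fragments"
    commit id: "consolidate draft model"
    commit id: "fix: EMS Compact outbound-only"
    commit id: "fix: Services own Certifications"
```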
Why This Approach Matters
If you're managing complex products or documentation, you probably have knowledge scattered across wikis, feature specs, and team heads. Before investing in a complete rewrite, consider:
- Extract what exists — AI can help scan and summarize existing documentation quickly
- Have AI generate a draft — Consolidate fragments into a coherent structure
- Review and edit ruthlessly — Catch errors, validate logic, add context
- Share early — Get team feedback before investing further
- Iterate based on feedback — Each round of refinement gets easier with a foundation in place
You'll often find the foundation is stronger than you thought. The real work isn't inventing a model from theory—it's surfacing and organizing knowledge that already exists.
The AI accelerates the mechanical work. The human judgment ensures it's correct.
Next: Waiting for team input on the canonical model. Feedback will guide the next phase: translating this logical model into feature-specific technical references, API contracts, and data validation rules.
Categories: blog
Tags: product-management, systems