Part I in this series, Part II in this series
A Metadata Application Profile Template is available HERE for paid subscribers
In my previous articles exploring metadata as a data model, we uncovered fundamental disconnects in how organizations manage meaning across data ecosystems. In Part I of Metadata as a Data Model, I noted that ownership and management of metadata is often treated as a hot potato—does the onus of metadata management reside with data engineering, business intelligence (BI), or data analytics? Or is metadata every contributor and team’s responsibility? Where did enterprise systems go wrong with metadata?
Metadata Hot Takes: MDM and the Semantic Layer
If an organization takes data and metadata management seriously, a cross-functional team or governance function will be tasked with building and maintaining master data management (MDM) systems that harmonize and unify data and metadata. A core function of MDM is to create golden records and canonical definitions, ultimately producing a metadata “single source of truth”. In a similar vein, data engineering, data analytics, and business users implement semantic layers to bridge the gap between technical data ecosystems and business users, translating complex data representations into common vocabularies and terminologies. MDM and semantic layers can and do work together, although each is generally unique in its processes and technical components.
While organizations struggle to imbue data systems with rich context and meaning, MDM and semantic layers have fairly drawn the attention of practitioners and vendors alike. It is no secret that AI demands rich semantics, not merely syntactic data representations, to understand the context and meaning of data within domains and businesses. MDM and semantic layers promise to deliver that all-holy rich context, mainly through text literal labels and metadata. However, truly structured and semi-structured semantic data representations are not usually threaded through the underlying code to drive contextual understanding for the benefit of AI.
Even though MDM and semantic layers can be effective data architecture and data management components, both systems tend to fall short in truly managing metadata holistically. To manage metadata as a data model, machine readable and interoperable code must accompany metadata models to support a metadata program that has teeth. Beyond metadata encoding structures, a metadata model supports AI’s contextual understanding while also sustaining crosswalks, transformers and translation layers, to negotiate organizational metadata alignment. Not one model to rule them all, but machine readable, interoperable manifestations of metadata and its related contexts.
To address the risks and realities of metadata management, library science has devised a methodology known as the metadata application profile or MAP. In no way a replacement for MDM or semantic layers, the MAP works to provide a roadmap to an organization’s metadata, and methodologies for metadata synchronization and reconciliation. A well crafted MAP supports teams, organizational data systems and AI, all while structuring metadata according to definitions and a metadata reconciliation layer. Consider the MAP living and breathing documentation, able to serve up rich context and meaning, so sorely lacking in most data and business environments.
We Have a Semantic Crisis
Organizations struggle with metadata and semantic consistency. According to a 2021 Gartner report, 75% of MDM programs fail to meet business objectives. The financial impact reaches astronomical levels when we account for poor data quality, which costs organizations an average of $12.9 million to $15 million each year. At the macroeconomic level, IBM found that in the US alone, businesses lose $3.1 trillion annually due to poor data quality.
When we dig into the root causes, metadata and semantic inconsistency emerge as primary culprits. According to O'Reilly research, more than 60% of respondents to a survey selected "Too many data sources and inconsistent data" as their top challenge, followed by "Disorganized data stores and lack of metadata", which was selected by just under 50% of respondents. The operational impact? Data teams spend 30–40% of their time handling data quality issues instead of working on revenue-generating activities.
Organizations have invested millions in MDM platforms that promise to be the "single source of truth" for data. Meanwhile, the BI team has built elaborate semantic models in tools like dbt or Looker that define what a "customer" means for analytics and business user purposes. These two systems speak different languages, use different standards, and often do not interoperate.
The result? Data engineers spend countless hours writing translation code. Business analysts create spreadsheets to manually map between systems. And executives wonder why their "single source of truth" feels more like multiple sources of confusion.
Enter the Metadata Application Profile
A metadata application profile (MAP) is, at its core, a declaration of which metadata terms an organization uses, how it uses them, and what constraints apply in specific contexts. As Dublin Core elegantly defines it, an application profile is "an assemblage of metadata elements selected from one or more metadata schemas and combined in a compound schema."
That compound schema defines fields, encodes semantics, manages relationships, and most crucially, provides crosswalks between different vocabularies. Think of a MAP as a Rosetta Stone for your data ecosystem. It doesn't replace your MDM or your semantic layer; instead, it provides the translation instructions that allow them to communicate fluently. How crosswalks and translations between metadata schemas are operationalized depends upon data architectures, but more on that later.
The CatChums Example
Let me share a really fun example from the library science world that perfectly illustrates how MAPs work in practice. The CatChums metadata application profile was designed for a fictional nonprofit organization creating a searchable database of cat videos. While whimsical and a little dated, it provides a perfect demonstration of metadata management principles that can be directly applied to enterprise challenges. I have linked the CatChums example to serve as a reference and template for a MAP.
The CatChums MAP does several things that are directly applicable to enterprise data management. First, it defines 18 specific metadata elements, from basic fields like "Title" and "Creator" to domain-specific attributes like "CATegories" (including Drama, Comedy, and Musical cat videos) and "Cat Breed", using The International Cat Association's recognized breeds as a controlled vocabulary.
Second, and this is crucial, it provides crosswalks to four different metadata standards: MODS, Dublin Core, VRA 3.0, and VRA 4.0. This means a video tagged as having a "Creator" in a local schema automatically maps to "Creator" in Dublin Core, "Creator.Personalname" in VRA 3.0, and specific agent structures in VRA 4.0. Finally, it establishes controlled vocabularies. The vocabulary includes terms like "Jumping" with a narrower term of "Jumping fail"—a hierarchical relationship that maintains semantic consistency across the system.
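To make the crosswalk idea concrete, here is a minimal sketch in Python of how the "Creator" mapping above might be captured as machine-readable data. The Dublin Core and VRA 3.0 targets come from the CatChums profile itself; the MODS and VRA 4.0 paths are illustrative placeholders, not exact CatChums values.

```python
# Hypothetical machine-readable crosswalk, keyed by local element name.
# Only the Dublin Core and VRA 3.0 targets are taken from the CatChums MAP;
# the MODS and VRA 4.0 paths are illustrative stand-ins.
CROSSWALK = {
    "creator": {
        "dublin_core": "dc:creator",
        "vra_3": "Creator.Personalname",
        "mods": "mods:name/mods:namePart",       # illustrative
        "vra_4": "vra:agent/vra:name",            # illustrative
    },
    "title": {
        "dublin_core": "dc:title",
        "vra_3": "Title",
        "mods": "mods:titleInfo/mods:title",      # illustrative
        "vra_4": "vra:title",                     # illustrative
    },
}

def map_element(local_element: str, target_schema: str) -> str:
    """Translate a local element name into its equivalent in a target schema."""
    return CROSSWALK[local_element][target_schema]

print(map_element("creator", "vra_3"))  # Creator.Personalname
```

Once the crosswalk lives in data rather than in transformation code, every consuming system can resolve equivalent fields from the same authoritative table.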
Of note, a MAP in the library science domain almost always delivers rich ontological semantics as part of the specification. Consider the RDF ontology rendering a highly machine readable semantic layer, one based less on the syntax of SQL and more on supporting interoperability. This means that a MAP supports MDM’s objective to unify and harmonize data and delivers text literal labels for business users, all while supporting AI systems with rich contextual understanding.
How MAPs Solve the MDM-Semantic Layer Disconnect
The fundamental tension between MDM and semantic layers isn't a technology problem—it's a semantics problem. Master Data Management wants to create a single, authoritative version of truth, like trying to establish one perfect dictionary that everyone must use. Meanwhile, human readable text labels derived from semantic layers manifest as glossaries, data catalogs, and human-friendly BI dashboards that capture the languages of the business.
The problem isn't that either approach is wrong—it's that they're solving different parts of the same problem without a translation mechanism between them. This is where a MAP comes in, acting as a semantic treaty that allows both approaches to coexist productively.
The Three-Layer Solution
Think of a metadata application profile as having three interconnected layers.
The Canonical Layer establishes your organization's core semantic model, which is what MDM has been trying to create all along. But instead of forcing this model directly onto every system, the MAP documents it as a reference standard. For instance, you define that a "Customer" has certain essential attributes like identifier, legal name, and status. This becomes your semantic North Star, but not your operational straitjacket.
The Implementation Layer is where MAPs diverge from traditional MDM. Instead of forcing every system to use the canonical model exactly, you document how each system implements these concepts. Your CRM might extend "Customer" with interaction history and lead scores, while your financial system adds credit limits and payment terms. The MAP doesn't fight these differences—it documents them as legitimate variations of the canonical concept.
The Translation Layer provides the semantic crosswalks that maintain meaning across systems. When the CRM's "prospect" becomes the financial system's "credit-pending customer," the MAP contains the rules that preserve semantic meaning during this transformation. While crosswalks support the arduous task of field mapping, a MAP is able to preserve the meaning of mapped fields using machine readability and encoded definitions of mapped fields.
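As a sketch of what a Translation Layer rule might look like in code, the hypothetical mapping below carries an encoded definition alongside the target label, so the meaning of "prospect" travels with the translation rather than living only in someone's head. The system names, labels, and definition text are illustrative.

```python
from dataclasses import dataclass

# Hypothetical translation rule between a CRM and a financial system.
# Names and the definition text are illustrative, not any product's API.
@dataclass(frozen=True)
class SemanticTerm:
    system: str
    label: str
    definition: str  # the encoded definition that travels with the mapping

TRANSLATIONS = {
    ("crm", "prospect"): SemanticTerm(
        system="finance",
        label="credit-pending customer",
        definition="A party with expressed intent to buy, not yet credit-approved.",
    ),
}

def translate(system: str, label: str) -> SemanticTerm:
    """Look up the equivalent term in the target system, with its definition."""
    return TRANSLATIONS[(system, label)]

term = translate("crm", "prospect")
print(term.label)  # credit-pending customer
```

The design point is that the definition is a first-class field of the mapping: downstream validation or documentation tooling can read it, rather than reconstructing intent from field names.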
Solving for Data Quality Through Semantic Precision
MAPs address data quality by establishing what I call "semantic contracts" at multiple levels. Much like a data contract, a MAP is able to accommodate variations in semantics while supporting agreed-upon terms for data use and production. In fact, a data contract works beautifully with MAPs, operationalizing agreements, conditions, data types, and formats.
For example, at the field level, a MAP specifies that a date exists, how it should be encoded (ISO 8601), what timezone assumptions apply, and what business rules govern its validity. When your order management system says an order was placed on "12/01/2024," the MAP ensures every consuming system knows whether that's December 1st or January 12th, whether it's UTC or local time, and whether it includes or excludes the timestamp. This may seem like a no-brainer, but believe me, I recently worked with database systems where there was zero conformance within a single database, and in most tables, I was lucky to even find a date for a record or row. (Shudder.)
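A field-level rule like this can be sketched in a few lines. Assume, hypothetically, that the MAP declares the source format to be "MM/DD/YYYY" in UTC; the function then normalizes to ISO 8601 so no consumer has to guess.

```python
from datetime import datetime, timezone

# Minimal sketch of a field-level MAP rule. The "MM/DD/YYYY, UTC" source
# format is an assumption for illustration; a real MAP documents this
# per source system.
def normalize_order_date(raw: str) -> str:
    """Parse a source date per the MAP rule and emit an ISO 8601 date (UTC)."""
    parsed = datetime.strptime(raw, "%m/%d/%Y").replace(tzinfo=timezone.utc)
    return parsed.date().isoformat()

print(normalize_order_date("12/01/2024"))  # 2024-12-01
```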
Referring back to our example, notice how the CatChums MAP handles video duration. It's not just stored as a number—it specifies ISO 8601 encoding, distinguishes between display formats (code vs. text), and maintains relationships to related concepts like file size. This level of semantic precision is exactly what's missing in most enterprise data architectures. And if you are working with a system where this level of precision and accuracy exists, consider yourself blessed and way ahead of the curve.
Threading Semantics Through Encoding Structures
Crosswalks are perhaps the most under-appreciated component of metadata application profiles. They're essentially mapping tables that show how elements in one metadata schema correspond to elements in another. Crosswalks in a MAP extend beyond field mappings—they preserve semantic relationships across systems by including the RDF, XML, and JSON interpretations of relations. Think of a crosswalk as a framework or recipe for metadata transformers, enabling direct translations across corresponding or equivalent fields.
Applied to enterprise data, a MAP crosswalk permits teams and systems to maintain individual metadata schemas and naming conventions while enabling dynamic reconciliation of metadata. An MDM system might define a customer using one schema, a CRM may use another, and your data warehouse yet another. A properly designed MAP with crosswalks ensures that when MDM updates a customer's status from "prospect" to "client," the change propagates with the correct semantic meaning, the MAP negotiating meaning with every connected system.
Traditional MDM fails at interoperability because it tries to force semantic uniformity, an unrealistic strategy given that the social aspects of naming things typically depart from uniformity. Language is inherently political, but it is also a sign of healthy doses of human agency and autonomy. MAPs succeed because they enable what we will call "semantic polymorphism"—the same concept can have different valid representations in different contexts while maintaining its essential meaning.
Governing Semantics Through Structured Metadata
Metadata application profiles also serve as governance frameworks that enforce semantic consistency without stifling innovation. Governance operates through three mechanisms:
Semantic Precision
Each element receives an unambiguous definition that eliminates interpretation variance. When the profile defines dc:creator as “the individual or organization with primary responsibility for data accuracy,” there’s no confusion about whether this means the technical team that built the system or the business owner who validates the data.
Controlled Evolution
Changes to semantic definitions follow formal processes. New meanings require new elements rather than overloading existing ones. For example, if using the Dublin Core element relation, there may eventually be a need to capture many different types of relations. To support schema expansion and complexity, we can implement Dublin Core subelements and/or qualifiers. Schema evolution should be supported by the logical reasoning model of the underlying metadata ontology and encoding scheme.
Notice the relationship types detailed in this image, that can add context and clarity to the types of element relations supported.
Enforcement Mechanisms
A MAP includes validation rules that prevent semantic drift. These same rules can be specified as hard constraints detailing allowable values, obligation keys, and requirement keys within database schemas and serialization frameworks.
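Here is a minimal sketch of what enforcing MAP rules in code could look like. The element names, obligations, and allowed values are illustrative, loosely echoing the CatChums breed vocabulary.

```python
# Hypothetical MAP validation rules: obligation plus an optional
# controlled list of allowed values per element.
PROFILE_RULES = {
    "dc:title":  {"obligation": "required", "allowed_values": None},
    "cat_breed": {"obligation": "optional",
                  "allowed_values": {"Siamese", "Maine Coon", "Bengal"}},
}

def validate(record: dict) -> list[str]:
    """Return a list of rule violations for a metadata record."""
    errors = []
    for element, rule in PROFILE_RULES.items():
        value = record.get(element)
        if rule["obligation"] == "required" and value is None:
            errors.append(f"{element} is required")
        if value is not None and rule["allowed_values"] \
                and value not in rule["allowed_values"]:
            errors.append(f"{element} has disallowed value: {value!r}")
    return errors

print(validate({"cat_breed": "Sphynx"}))
```

Wired into a CI pipeline or a database constraint layer, checks like these turn the MAP from documentation into an enforcement mechanism.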
Practical Implementation for Modern Data Stacks
So how do we adapt this library science concept to modern data architectures? Let’s dive into a practical approach:
Start with a Core Schema
Define your organization's fundamental data elements—not just their names and types, but their semantic meaning and relationships. Dublin Core has a great starter pack of fifteen core elements that is easily applied and adaptable for most any organization. Pick and choose what is useful and riff off of this spec to derive the most meaningful element set for metadata coverage.
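A core schema declaration can start very small. The sketch below borrows a few of the fifteen Dublin Core elements; the definitions are paraphrased for illustration and the obligation levels are assumptions an organization would set for itself.

```python
from dataclasses import dataclass

# Illustrative core schema entries. Definitions are paraphrased, and the
# obligation levels are assumptions, not Dublin Core requirements.
@dataclass(frozen=True)
class ElementSpec:
    name: str        # the Dublin Core term
    definition: str  # the organization's agreed meaning
    obligation: str  # "required" | "recommended" | "optional"

CORE_SCHEMA = [
    ElementSpec("dc:title", "Name given to the resource", "required"),
    ElementSpec("dc:creator",
                "Party primarily responsible for the resource", "required"),
    ElementSpec("dc:date",
                "Point in time associated with the resource (ISO 8601)",
                "recommended"),
    ElementSpec("dc:subject",
                "Topic of the resource, from a controlled vocabulary",
                "optional"),
]

required = [e.name for e in CORE_SCHEMA if e.obligation == "required"]
print(required)  # ['dc:title', 'dc:creator']
```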
Document System-Specific Profiles
For each major system (MDM, CRM, data warehouse, BI tools), create a profile that shows how it implements the core schema. Include encoding rules, validation constraints, and transformation logic.
Build Comprehensive Crosswalks
Create detailed mappings between all profiles. These should handle not just one-to-one field mappings but also complex transformations (like how a single MDM field might map to multiple BI dimensions). Most organizations have more than one metadata system, and metadata can often vary wildly from team to team.
I am not suggesting one MAP to rule them all—perhaps your organization has each system’s metadata owner create a MAP that is submitted to a central entity. I worked at one organization where my team managed crosswalks for the entire company, with a requirement to submit a revised MAP quarterly. There are many ways to handle crosswalks. Trust me—crosswalks are extremely valuable for connecting data and semantics across an enterprise.
Establish Controlled Vocabularies
Define authoritative controlled vocabularies for language, labels and tags. Like the cat breeds in our MAP example, these ensure consistency while allowing for system-specific extensions. This becomes your input for your value vocabulary.
According to a W3C Library Linked Data Incubator Report, a value vocabulary “defines resources (such as instances of topics, art styles, or authors) that are used as values for elements in metadata records...They are "building blocks" with which metadata records can be populated... A value vocabulary thus represents a "controlled list" of allowed values for an element. Examples include: thesauri, code lists, term lists, classification schemes, subject heading lists, taxonomies, authority files, digital gazetteers, concept schemes, and other types of knowledge organization systems.”
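To illustrate, a controlled vocabulary with broader/narrower relationships can be represented very simply. This sketch echoes the CatChums "Jumping" / "Jumping fail" hierarchy using plain dictionaries; a production system would more likely use a standard like SKOS.

```python
# Illustrative hierarchical value vocabulary (broader/narrower terms),
# echoing the CatChums example. Not a SKOS implementation.
VOCAB = {
    "Jumping":      {"narrower": ["Jumping fail"]},
    "Jumping fail": {"broader": "Jumping", "narrower": []},
}

def expand_with_narrower(term: str) -> list[str]:
    """Return a term plus all of its narrower terms (search expansion)."""
    result = [term]
    for child in VOCAB.get(term, {}).get("narrower", []):
        result.extend(expand_with_narrower(child))
    return result

print(expand_with_narrower("Jumping"))  # ['Jumping', 'Jumping fail']
```

A search for the broader term can then automatically include its narrower terms, which is exactly the semantic consistency the hierarchy exists to provide.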
Enable Extensibility Through Semantic Evolution
MAPs enable what I think of as "guided evolution." When a new system needs to extend the canonical model—say, adding social media handles to customer records—the MAP provides a framework to logically capture the newly added data type and format. The new attributes are documented as extensions to the canonical model, with clear relationships to existing MAP schema elements. With clearly detailed definitions of all elements and captured semantic relationships between them, new data types and formats are incorporated into the MAP, making the newly added metadata available and interoperable.
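The guided-evolution pattern can be sketched as a documented extension rather than an ad hoc column. Everything below—the element names, definition text, and the `relates_to` link—is hypothetical, illustrating the shape of an extension record, not a real schema.

```python
# Hypothetical canonical model and a documented extension to it.
CANONICAL_CUSTOMER = {"identifier", "legal_name", "status"}

EXTENSIONS = {
    "social_media_handle": {
        "extends": "customer",
        "relates_to": "identifier",  # link back to an existing element
        "definition": "A public handle identifying the customer on a platform",
        "format": "string",
    },
}

def full_customer_schema() -> set[str]:
    """Canonical elements plus all documented extensions."""
    return CANONICAL_CUSTOMER | set(EXTENSIONS)

print(sorted(full_customer_schema()))
```

Because the extension carries its definition and its relationship to the canonical model, downstream systems can adopt it without a side-channel conversation about what the new field means.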
Version and Govern
Treat your MAP as living documentation. Track changes, maintain lineage, version the MAP and establish governance processes for updates.
Real-World Implementation Patterns
Organizations successfully implementing metadata application profiles follow consistent patterns. If I may lend some advice, do not try to boil the ocean. Each organization has unique needs and nuanced data architectures.
Here’s my advice, take it or leave it :)
Start Small: Begin with a critical business entity (products, customers, or content) and prove value before expanding.
Build Incrementally: Add elements and relationships gradually rather than attempting comprehensive coverage immediately.
Federate Ownership: Let domain experts own their metadata while maintaining central coordination of standards.
Automate Validation: Implement automated checking of profile compliance to ensure consistency at scale.
Visualize Relationships: Create intuitive displays of metadata networks to help users understand connections. A visualization dashboard also provides observability, if well implemented.
Measure Impact: Track metrics like search effectiveness, integration success rates, and data quality scores.
The Path to Semantic Coherence
MAPs offer organizations a proven methodology for creating unified data ecosystems. By adopting library science principles—formal element definitions, controlled vocabularies, systematic crosswalks, and structured governance—enterprises can transform fragmented data landscapes into coherent knowledge networks. The MAP serves simultaneously as a technical specification that systems can execute, a governance framework that enforces policies, a quality standard that ensures consistency, living documentation that evolves with needs, and a semantic roadmap that guides integrations.
Conclusion
When MDM systems and semantic layers unite through properly constructed metadata application profiles, the result is not just better data management but a fundamental transformation in how organizations understand and leverage their information assets. The path forward requires recognizing that metadata management is not merely a technical challenge but a semantic discipline requiring rigor and a systematic approach.
The disconnect between MDM and semantic layers isn't just a technology problem—it's a semantics problem. We've been trying to solve it with more ETL pipelines and more integration tools, when what we really need is a semantic bridge.
Metadata application profiles provide that bridge. They're not a new technology to purchase or a platform to implement. They're a methodology—borrowed from librarians who've been managing complex, distributed metadata for decades—that can transform how we think about semantic consistency in modern data architectures.