Introduction
Metadata is the lowest common semantic denominator in most data ecosystems. Metadata, put simply, is data about data. It can manifest in a multitude of ways: as simple text labels, as tags applied to resources and documents, and even as database schemas that define the rows, columns, and tables of relational databases. But how do we operationalize metadata to derive semantic context and meaning from it? The library and information science domain has been refining, iterating, and evolving operationalized metadata since the advent of computer systems and digital environments, with the goal of improving findability, search, and discovery for humans and machines.
A Systems Thinking Perspective
You may notice that I always incorporate historical backgrounds and narratives in my writing. I cannot imagine designing, modeling, or building a system without understanding the historical background of an idea, concept, or technology. My undergraduate degree is in history, so perhaps I am a little biased in believing that history is very much a part of systems thinking. Yet most discussions about metadata and metadata systems do not build upon prior art, the historical evidence and work that preceded an idea or technology. In the new, AI-infused information technology movement, metadata discussions tend to revolve around data catalogs, lexical and text labels, and API services.
As far as many enterprise technologists are concerned, metadata emerged when computer systems emerged, a shallow lens into metadata systems. Understanding the centuries-old evolution of metadata brings a systems thinking approach to designing and building metadata systems capable of transmitting rich descriptive semantics. In other words, it teaches us to treat metadata as a holistic data model, capable of interoperability and machine readability.
The Enterprise Metadata Conversation
Enterprise data conversations and proposed solutions revolve around solving immediate metadata needs, hot fixes, and shiny new technologies, decoupled from the rich history of metadata and relevant systems. To understand the evolution and foundations of metadata, it is critical that we study the history of metadata, the fundamentals of semantics, and the principles of organizing, to make sense of how to build robust, operationalized metadata systems that scale and extend. The very concept of a system is at the heart of understanding semantic, interoperable, extensible metadata, which is why systems thinking is necessary to unlock the value of metadata.
The NIST CSRC definition of a system is useful to level-set our understanding: “A discrete set of resources organized for the collection, processing, maintenance, use, sharing, dissemination, or disposition of information.” By way of this definition, we can further understand the critical importance of a data model to substantiate and proliferate metadata as a first-class data asset.
In their 2015 paper, A Definition of Systems Thinking: A Systems Approach, authors Arnold and Wade work through meta-definitions of systems thinking to arrive at three distinct qualifiers:
elements (in this case, characteristics)
interconnections (the way these characteristics relate to and/or feed back into each other)
a function or purpose
Notably, the least obvious part of the system, its function or purpose, is often the most crucial determinant of the system’s behavior (Arnold & Wade, 2015, p. 670). Most enterprise systems do not treat metadata as a system of elements, interconnected and designed to serve a function and purpose.
In modern business-oriented data ecosystems, metadata is often reserved for the semantic layer, data catalogs, and business glossaries, to be managed by master data management programs with general oversight from a data governance entity. Enterprise metadata is often treated as a byproduct of, or an appendage to, a data system, rarely integrated holistically as a logical system and data model. Failing to treat metadata as a holistic system and data model has led to disjointed systems, unable to reconcile disparate metadata elements and incapable of transmitting interoperable semantics rich with context and meaning.
If metadata serves as a translation layer, functionally translating the syntactic into a common business language, then why aren’t we designing metadata as a system? How do we expect to build for semantics if we are not treating metadata as a holistic system that is both scalable and extensible?
The Library Science Approach
Library science does not evangelize one single metadata standard or one schema to rule them all. Quite the contrary. Library science takes a more holistic approach, recognizing that there are base, generalized data fields or elements that can support robust semantics while also enabling machine readability and interoperability. Essentially, librarians and information scientists operationalize metadata through frameworks, standards, and formats so that lexical labels and descriptions are not decoupled from digital infrastructure frameworks. This is because all library metadata systems are declared data models, woven into the fabric of digital ecosystems, and fundamental to the principles of information retrieval.
From clay tablets to library card catalogs to the web, metadata has persistently been utilized to support access to information and findability. The Library of Alexandria tied tags to scrolls with titles, authors, and subjects, a practice that foreshadowed modern surrogate records and later card catalogs. Libraries have long treated metadata as the scaffolding that makes collections intelligible and findable.
Standardizing the Record Format
Early inventories and indexes—simple lists of holdings—gave way to card catalogs that let users search by author, title, or subject, and to shared rules that made records consistent across institutions. Released in 1967, the Anglo-American Cataloguing Rules (AACR, later AACR2) standardized what to record and how to phrase it, so “Mark Twain” and “Samuel Clemens” could be collocated and users could get comparable results anywhere. In effect, libraries paired disciplined description with navigable data models to turn shelves of items into searchable knowledge systems.
Digitization pushed analog practices into computer systems and, later, onto the web. To answer the demands and opportunities presented by these new digital systems, Henriette Avram, with support from the Library of Congress, developed the standardized MARC (MAchine-Readable Cataloging) format between 1965 and 1968. Catalog records were transformed into machine-readable formats, powering the development of integrated library systems. However, MARC’s library-specific structure traveled poorly outside of library catalogs.
FRBR-ize It
Alongside the previously mentioned metadata standards and data models, the FRBR (Functional Requirements for Bibliographic Records) model was published in 1998 by the International Federation of Library Associations and Institutions (IFLA) to address the management of relationships between resources and resource provenance. A conceptual, entity-relationship model, FRBR reframes bibliographic metadata through linked entities, abstracted as Work, Expression, Manifestation, and Item, a departure from single flat records. (OCLC)
As a data model, FRBR enables clustering of expressions and manifestations under works, and it became the conceptual foundation for RDA (Resource Description and Access, described below). (OCLC) The FRBR legacy and its evolution continued in the IFLA Library Reference Model (2017), which consolidates FRBR with FRAD and FRSAD to guide contemporary metadata design. (IFLA)
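To make the entity-relationship idea concrete, here is a minimal sketch of FRBR’s four Group 1 entities in code. The entity names (Work, Expression, Manifestation, Item) come from FRBR itself; the class fields and the sample record are my own simplification, not official FRBR elements.

```python
from dataclasses import dataclass

@dataclass
class Work:                      # a distinct intellectual creation
    title: str

@dataclass
class Expression:                # a realization of a Work (e.g., a translation)
    work: Work
    language: str

@dataclass
class Manifestation:             # a physical embodiment (e.g., a paperback edition)
    expression: Expression
    carrier: str

@dataclass
class Item:                      # a single exemplar, the copy on the shelf
    manifestation: Manifestation
    barcode: str

# One Work clustered down to a single Item, instead of one flat record:
work = Work("Moby Dick")
expr = Expression(work, language="en")
mani = Manifestation(expr, carrier="paperback")
item = Item(mani, barcode="31234000123457")

# The chain of links resolves any copy back to its Work:
print(item.manifestation.expression.work.title)
```

Because every Item links upward through a Manifestation and Expression to its Work, a catalog can collocate every edition, translation, and copy under one intellectual creation, which is exactly what a flat record cannot do.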
Carrier Types and Linked Data
RDA, the modern descriptive cataloging standard, was first developed in 1997 to replace AACR2 and released in June 2010; it is maintained collectively through international professional library associations and committees. Rooted in FRBR, FRAD, and FRSAD and aligned with IFLA’s Library Reference Model, it provides element sets and instructions designed for user-focused, linked-data applications.
Practically, the RDF-based RDA Vocabularies and the RDA Toolkit have supported wide adoption, enabling interoperable, machine-readable bibliographic metadata across institutions and the web. RDA also provides ways to accurately categorize ever-expanding carrier types. Note: in metadata, a carrier refers to the physical or digital medium on which information is stored or transmitted, such as a book's paper, a CD, a DVD, or an online server.
In 2012, to further improve openness and interoperability, the Library of Congress developed BIBFRAME as a web-friendly carrier format, with Resource Description and Access (RDA) adopted as the updated content standard. Aligned with RDF and linked data, where people, works, and topics are modeled as interlinked entities, BIBFRAME sought to transform bibliographic catalogs and records to operationalize metadata beyond the walls and boundaries of the integrated library system.
Metadata Schemas Built for the Web
With this shift from analog to digital systems, librarians also designed domain appropriate standardized metadata schemas such as Dublin Core and VRA Core, to further support the findability of library and cultural heritage resources in modern digital ecosystems. VRA Core is a community-developed metadata standard from the Visual Resources Association for describing works of visual culture—art, architecture, and cultural artifacts—and the images that document them.
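A quick illustration of what a web-oriented schema like Dublin Core buys you: a small, shared element set that any system can index or exchange. The fifteen DCMES element names (title, creator, subject, and so on) are real; the record values below are invented for illustration.

```python
# A minimal Dublin Core record sketched as a Python dict.
# Element names follow the Dublin Core Metadata Element Set (DCMES);
# the values describe a hypothetical resource.
record = {
    "dc:title": "Annual Report 2024",
    "dc:creator": "Example Corp.",
    "dc:subject": "corporate finance",
    "dc:date": "2024-12-31",
    "dc:format": "application/pdf",
    "dc:identifier": "https://example.org/reports/2024",
    "dc:language": "en",
}

# Because every field draws from a shared, documented element set,
# any Dublin Core-aware system can harvest and index this record.
for element, value in record.items():
    print(f"{element}: {value}")
```

The design choice is the point: fifteen general elements will never capture everything about a resource, but they guarantee a baseline of interoperable description across wildly different collections.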
Historical Retrospective
This high-level historical tour only grazes the compendium of metadata practices, and it underscores a core truth: library metadata is neither a straight path nor a break from the past. Instead, the field advances by refactoring proven principles into new application profiles, ontologies, and exchange models, carrying forward lessons from card catalogs to MARC, from FRBR to RDA and BIBFRAME, so that systems thinking stays anchored in continuity.
Semantics, Metadata and Library Science Core Values
Librarians continually iterate on schemas, element sets, and governance to meet new retrieval needs, networked architectures, and AI contexts, without discarding the intellectual scaffolding that brought us here. The story is one of heritage fused with invention—methodologies and organizational models that rise with innovation and progress. Never static, always dynamic.
To abandon the historical continuum of metadata is to create disjointed systems, absent of standards and protocols, void of concrete data models. Metadata schemas, frameworks, standards, repositories and data models seek to intentionally break down silos while upholding principles of organizing, always with a librarian’s service-minded approach. ALL users are users, and therefore all systems are designed for all users, be it human, machine or even, AI.
The Metadata Diff
Librarian metadata approaches stand in sharp contrast with enterprise metadata approaches. I think we can agree that the enterprise metadata management landscape has no shortage of tools and solutions, because let’s face it, the business technology world is straight up enterprising!
Businesses are built with high levels of modularity, focused upon products, platforms, features and services. More often than not, businesses search for metadata solutions as an afterthought, in response to findability challenges and mismanagement of messy data.
Now that AI demands context and meaning, there is a more urgent need for semantic metadata. Data without metadata is bringing organizations to their knees for lack of context and meaning in a sea of syntax-first data ecosystems. AI is not performant with data alone. Metadata is, after all, a way to describe otherwise syntactic data, and AI needs this translation layer to make sense of data. Hence the myopic focus on business intelligence and data analytics functions such as master data management (MDM) and the semantic layer.
Where We Diverge
Mission and Time
Libraries exist to maximize discovery across collections that live for decades, so they invest in durable, conceptual models (FRBR/LRM, RDA, BIBFRAME) that preserve meaning over time. Enterprises optimize for short-term product, analytics, or compliance needs, so metadata is often “just enough” labels tied to current systems, not a domain model.
Interoperability Incentives
Libraries must share records across institutions (OCLC, union catalogs, authority control), which forces common data models. Most enterprises don’t exchange rich descriptive metadata beyond a vendor or team boundary, so the incentives to model semantics are weaker.
Standards and Governance Culture
Library science has a mature stack (content standards, carrier formats, conceptual models, controlled vocabularies) and distributed governance. Enterprise metadata is usually owned by individual apps, data warehouses, or catalogs (MDM and BI), where schemas and ownership trump semantics. In fact, enterprises are still grappling with who should even own the semantic layer or standardized semantics. Data engineering? Marketing? Perhaps we should glean some insight from Ole Olesen-Bagneux’s new book, Fundamentals of Metadata Management, where he proposes the creation of a data discovery team to mitigate the conflicting metadata interests that traditionally exist within organizations. Recognizing organizational dysfunctions gives way to new ways of operating.
Tooling and Skills
Library toolchains were built around structured description from the start: ontologically rich and centered on machine-readable formats that maximize opportunities for interoperability and imbue semantics. Enterprise stacks evolved around transactions and analytics, so metadata tends to be columns and rows, JSON, and tags—useful, but rarely a first-class, semantics-first architecture. Enterprise metadata often exists as text-literal, lexical labels appended to data files, barely interoperable and not reliably extensible.
Conclusion
Some sectors do model metadata deeply: pharma with the Clinical Data Interchange Standards Consortium (CDISC), finance with the Financial Industry Business Ontology (FIBO), e-commerce with GS1 and schema.org, manufacturing with the Product Lifecycle Management (PLM) family of related standards. In no way am I discounting the work of industry standards, created out of necessity and regulatory affairs. Rather, I am shining a light on what is missing from metadata management in most enterprise organizations, namely, a cohesive semantic data model. Every organization can benefit from a pragmatic bridge: adopting purpose-built metadata frameworks such as application profiles and a SKOS-like “Knowledge Graph Lite” semantic data model. In so doing, a metadata-first data model stabilizes terms as URIs, governs labels and relationships, and lets systems map to the model, bringing the library playbook to enterprises without boiling the ocean.
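What might a “Knowledge Graph Lite” look like in practice? Here is a minimal sketch using SKOS-style properties (prefLabel, altLabel, broader) over plain Python dicts. The URIs, labels, and relations below are invented examples, not a real vocabulary, and a production system would likely use a proper RDF store rather than dicts.

```python
from typing import Optional

# Each governed term is a stable URI with one preferred label,
# alternate labels (synonyms), and an optional broader concept.
concepts = {
    "https://metadata.example.com/concept/customer": {
        "prefLabel": "Customer",
        "altLabel": ["Client", "Account Holder"],
        "broader": "https://metadata.example.com/concept/party",
    },
    "https://metadata.example.com/concept/party": {
        "prefLabel": "Party",
        "altLabel": [],
        "broader": None,
    },
}

def resolve(label: str) -> Optional[str]:
    """Map any known label (preferred or alternate) to its concept URI."""
    for uri, concept in concepts.items():
        if label == concept["prefLabel"] or label in concept["altLabel"]:
            return uri
    return None

# Two systems using different column names land on the same governed term,
# the same collocation move the card catalog made for "Twain" and "Clemens":
print(resolve("Client"))
print(resolve("Account Holder"))
```

Stabilizing the URI while letting labels vary is the whole trick: systems keep their local vocabulary, and the model absorbs the differences instead of every integration renegotiating them.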
Next up: we will dive deeper into metadata frameworks and data models in the next episode, Metadata as a Data Model.