“The greatest enemy of progress is the illusion of knowledge.”
– John Young, Astronaut
Introduction
Knowledge is dynamic, not static. Knowledge, by definition, is relative to human beings, emerging through brain development, lifelong learning, and the storage of perceived facts in memory. An extended definition frames knowledge as a characteristic of social and cultural entities, manifesting in forms such as collective knowledge, the knowledge economy, institutional knowledge, and knowledge bases. The term knowledge is also used to describe job roles such as knowledge engineer and, more broadly, knowledge worker. In his 1959 book Landmarks of Tomorrow, business consultant Peter Drucker coined the term knowledge worker to describe white-collar workers. In his 1999 Harvard Business Review article, Drucker explains “...that when people perform the work they are good at and that fits their abilities, they can not only cultivate a more successful career in the knowledge economy, they can ultimately bring more value to the organization” (IBM Education, 2023).
Given the ubiquitous nature of knowledge, one can argue that knowledge management is central to human existence and a foundational element of society. Libraries, archives, and cultural institutions are examples of investments in the human brain trust, preserving cultural memory and supporting scholarly endeavors such as research and innovation. However, a visit inside most enterprise organizations reveals that knowledge and knowledge management are often absent or treated as an afterthought. This harsh reality has been magnified by the recent rise of artificial intelligence (AI), the hallmark of the fourth industrial revolution. Businesses and organizations are quickly discovering that, despite digital ecosystems rich in data, they lack the essential ingredients that AI requires to excel.

Organizations are scrambling to capture structured knowledge to satisfy the demands of AI, desperate to obtain some semblance of knowledge— no matter the cost. The solution is clear: invest in information and knowledge systems and build a robust knowledge infrastructure. When an organization builds its own proprietary knowledge infrastructure, business operations flourish— with or without AI. By investing in this infrastructure, an organization builds its own institutional memory, a kind of brain trust that makes information and knowledge discoverable and interoperable across business processes, machines and people.
In fact, AI services such as ChatGPT, Claude, Llama and Perplexity share a common reason for their high performance: they all rely on robust knowledge infrastructures. These infrastructures provide the training data, Retrieval-Augmented Generation (RAG) and Retrieval-Interleaved Generation (RIG) implementations, and reference architectures essential for AI to function. In other words, AI depends on information and knowledge, just like humans do.
The Not-So-Distant Past
Up until the public release of ChatGPT on November 30, 2022, organizations, the World Wide Web, schools, and life as we knew it followed a somewhat predictable trajectory. Within organizations, corporate constructs were built around relational databases and product-specific data needs. Business innovations leveraged machine learning algorithms to design engaging customer experiences, driven by market demand for goods and services.
In the pre-AI digital era, organizational infrastructures centered on relational databases as the core of business operations. Terms like data infrastructure, data warehouses, data lakes, and data architecture were top priorities as programs, applications, and platforms were built around these critical data assets. In today’s post-AI digital world, these data infrastructures remain important, but they lack the requisite information and knowledge, structured in ways that optimize AI performance and output. Data is the atomic unit of information and knowledge, but data alone does not make for either.
Identity Crisis
The last three years have triggered an identity crisis for organizations. We’ve witnessed overspending on data storage, cloud computing, and AI subscriptions as everyone scrambled to figure out this AI phenomenon. Menlo Ventures reported that businesses spent $13.8 billion on genAI in 2024, five times the $2.3 billion spent in 2023. KPMG’s most recent Quarterly Pulse Survey found that “68% of leaders will invest between $50-$250 million in GenAI over the next 12 months, up from 45% in Q1 of 2024.”
This historic and projected AI spending is astounding, given that most companies continue to struggle with data issues while ignoring the harsh realities of information and knowledge deserts. Perhaps organizations are wishfully hoping AI will manifest context-rich, structured, and semi-structured information and knowledge on its own? However, the same KPMG report suggests some organizational self-awareness: “Quality of organizational data is the biggest anticipated challenge to AI strategies in 2025, according to the vast majority of leaders (85%), followed by data privacy and cybersecurity (71%) and employee adoption (46%).”
Great! So We’re Investing in Data Quality?
If data quality concerns and AI infrastructures are taking center stage in 2025, it seems only natural for organizations to invest in information and knowledge infrastructures. But the confidence and clarity are not there. A February 2025 Gartner report cites that 63% of organizations either lack or are unsure they have the right data management practices to operationalize AI. A July 2024 survey of 1,203 data management leaders warned: “organizations that fail to realize the vast differences between AI-ready data requirements and traditional data management will endanger the success of their AI efforts” (Gartner, 2025).
If the stakes are so high, you’d expect leaders and organizations to be laser-focused on understanding AI-ready data requirements, and to show up in AI strategy rooms with elevated logic, clear structure, and strategic alignment. Instead, the vast majority of enterprises are falling into doom spirals, holding out hope for some kind of self-healing solution for rigid, unstructured data ecosystems.
What Is AI-Ready Data?
Put simply, Ernst & Young describes AI-ready data as “information that is easily combined to form business knowledge. These knowledge assets are used to enhance enterprise AI models, improving AI inference. AI ready data is a higher-value form of data that is used for decisions and actions.” Because information and knowledge management are multidimensional, organizations must mature their data strategies and evolve into knowledge infrastructure strategies—capable of capturing the nuances of business knowledge and rich semantics that give data context and meaning.

“An investment in knowledge always pays the best interest.”
–– Benjamin Franklin
All Knowledge is Not Equal
If knowledge is pervasive and fundamentally human, shouldn’t knowledge management be the primary focus of AI infrastructure and data strategies? The importance of knowledge bases and digital knowledge infrastructures is underscored by AI’s heavy reliance on knowledge sources such as Wikipedia. As noted in a July 2023 New York Times article, “Wikipedia is probably the most important single source in the training of A.I. models.” While not without flaws, occasionally hosting misinformation and disinformation, Wikipedia stands as a massive, crowdsourced knowledge base that should be the envy of every company seeking to leverage AI in products and services. Yet, for some unclear reason, many enterprises are failing to make the connection between their weak knowledge infrastructures and underwhelming AI performance metrics.
The Structure of Knowledge
Knowledge is usually broken down into two main categories, explicit and tacit. Explicit knowledge refers to any data or information that can be documented, stored and shared. Tacit knowledge, on the other hand, is often experiential or intuitive— deeply embedded in human experience and difficult to extract, capture or articulate.
Beyond these common global definitions, Wikipedia notes several more ways to think about and categorize knowledge:
Common knowledge: Knowledge that is known by everyone or nearly everyone, usually with reference to the community in which the term is used.
Customer knowledge: Knowledge for, about, or from customers.
Domain knowledge: Valid knowledge used to refer to an area of human endeavor, an autonomous computer activity, or other specialized discipline.
Foundational knowledge: The knowledge necessary for understanding or usefully applying further knowledge in a field.
General knowledge: Information that has been accumulated over time through various mediums. This definition excludes highly specialized learning that can only be obtained with extensive training and information confined to a single medium. General knowledge is an important component of crystallized intelligence and is strongly associated with general intelligence and with openness to experience.
Metaknowledge: Knowledge about knowledge. Bibliographies are a form of metaknowledge; patterns within scientific literature are another.
Mutual knowledge: Information known by all participatory agents.
Self-knowledge: Information that an individual draws upon when finding an answer to the question "What am I like?".
Traditional knowledge: Knowledge systems embedded in the cultural traditions of regional, indigenous, or local communities. Traditional knowledge includes types of knowledge about traditional technologies of subsistence (e.g., tools and techniques for hunting or agriculture), midwifery, ethnobotany and ecological knowledge, traditional medicine, celestial navigation, ethnoastronomy, the climate, and others. These kinds of knowledge, crucial for subsistence and survival, are generally based on accumulations of empirical observation and on interaction with the environment.
Source: https://en.wikipedia.org/wiki/Outline_of_knowledge
While this list of knowledge categories is by no means exhaustive, it helps illustrate the multidimensional nature of knowledge, its interdependence with the human brain, and its deep ties to people’s lived experiences. Yet our current data infrastructures and enterprise architectures are often oriented around the almighty relational database (RDBMS), not descriptive, structured knowledge. An RDBMS structures data into rows and columns to form tables. These tables can be joined by primary or foreign keys to create logical datasets and information. However, table relationships are constrained to three types: one-to-one, one-to-many, and many-to-many. This rigid structure lacks semantic, descriptive context.
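The contrast can be sketched in plain Python (illustrative names only; no specific database or triple store is assumed): a relational row hides meaning in its column names, while a (subject, predicate, object) triple makes the relationship itself part of the data.

```python
# Relational view: meaning lives in column names, outside the data itself.
employee_row = {"id": 42, "name": "Ada", "dept_id": 7}
dept_row = {"dept_id": 7, "dept_name": "Research"}

# Semantic view: each fact is a (subject, predicate, object) triple,
# so the relationship ("worksIn", "hasRole", "partOf") is queryable data.
triples = [
    ("Ada", "worksIn", "Research"),
    ("Ada", "hasRole", "Knowledge Engineer"),
    ("Research", "partOf", "R&D Division"),
]

def objects_of(subject, predicate, facts):
    """Return every object linked to `subject` by the named `predicate`."""
    return [o for s, p, o in facts if s == subject and p == predicate]

print(objects_of("Ada", "worksIn", triples))  # ['Research']
```

The point of the sketch is not the data structure but the shift in where meaning lives: in the triple model, the relationship vocabulary is explicit and can itself be governed, documented, and reasoned over.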
As a result, many database-oriented organizations continue to struggle with persistent problems like data silos and data quality issues, rather than investing in knowledge management and machine-readable, interoperable knowledge bases. A knowledge infrastructure still needs data: as critical input, as managed output, and for storage. The longer organizations delay understanding the new data and knowledge paradigm, the further they fall behind in leveraging AI in products and workflows. Most critical to our new, knowledge-infused AI reality is the reimagining of the skills, jobs, roles, tools, and architectures necessary to sustainably codify organizational knowledge and memory.
The Knowledge Problem
Knowledge is inherently a sociotechnical concept, and the absence of thoughtful knowledge management systems within organizations is responsible for the lion’s share of organizational dysfunction. Many organizations exhibit intentional or unintentional knowledge-hoarding practices, which limit access to the critical domain knowledge necessary for job performance and customer success.
The global knowledge management market is projected to reach $2.1 trillion by 2030. Yet a Pryon report shows that 47% of professionals spend 1-5 hours a day searching for specific information. Another 15% spend 6-10 hours doing the same, while an equal share (15%) spend less than 1 hour finding what they need. Without a knowledge infrastructure, both humans and machines will continue to struggle to find the information and knowledge necessary to perform successfully. Because AI relies upon human knowledge, an organization’s knowledge infrastructure becomes the linchpin, a qualifier for success in this fourth industrial revolution.
Finally, What IS Knowledge Infrastructure?
The concept of knowledge infrastructure is not new. As mentioned earlier, cultural institutions such as libraries and archives are built on both digital and physical knowledge infrastructures. The fact that these cultural institutions straddle the physical and digital worlds speaks to the intricacies of capturing and modeling knowledge to make data and information findable, interoperable, and machine-readable. So why not learn from the masters? Librarians and archivists live and breathe knowledge infrastructure: they are specially trained in frameworks and methodologies for knowledge management and access to knowledge, with a service-minded approach.
At the heart of any knowledge infrastructure lies a culture of sharing. Knowledge management’s defining feature is knowledge-sharing—the true engine of innovation. Yet, the success of these initiatives depends on an organization’s ability to encourage participation, dismantle silos and prevent the hoarding of critical insights. Achieving this demands interdisciplinary collaboration, with teams across departments and domains pooling their collective expertise into a shared, evolving knowledge base.

Building and sustaining this infrastructure requires more than goodwill; it demands the right blend of tools, processes, systems, and cultural enablers. Intuitive collaboration platforms, formalized taxonomies, metadata-driven repositories, ontologies and clear governance frameworks streamline the capture, transfer, and reuse of knowledge. By embedding these sharing mechanisms into daily workflows, organizations create continuous, cumulative learning loops to diffuse expertise broadly, enhancing intellectual capital and securing a lasting competitive edge in a rapidly evolving AI landscape.
Knowledge Infrastructure As-a-Program
Because knowledge infrastructure establishes institutional knowledge networks and codifies institutional memory, it must be treated as a program, built and managed as a core business function and AI enabler. The knowledge infrastructure composite includes:
Creators: Those who generate knowledge (researchers, experts, content authors, data collectors)
Products: The formal outputs of knowledge (e.g., documents, datasets, models, features, apps, platforms)
Distributors: Systems and platforms that make knowledge available (wikis, repositories, APIs)
Disseminators: Communicators and interpreters (educators, marketers, dashboards, wikis, glossaries)
Users: Individuals or systems that apply the knowledge (decision makers, AI agents, knowledge workers, learners)
Derived from Foundations of Library and Information Science by Richard E. Rubin, 2016
Like network architectures, the backbone of almost every digital business, a knowledge infrastructure is designed and built to sustain all business needs and practices. A solid knowledge infrastructure is scalable and extensible, capable of supporting the complex intersections of data, information, and knowledge.
The Knowledge Infrastructure Core
A knowledge infrastructure cannot exist without context-dependent semantics. Semantics extend beyond simple text labels. Glossaries, catalogs, metadata, controlled vocabularies, taxonomies, thesauri, ontologies and knowledge graphs form rich semantic systems.
Semantics facilitate:
Harmonizing data from multiple systems
Enabling context-aware search and discovery
Supporting automation and decision making with machine understanding
Powering Retrieval-Augmented Generation (RAG) and Retrieval-Interleaved Generation (RIG) implementations for AI systems
Promoting a shared understanding of data and vocabularies across domains and products
Semantic systems enable infrastructures by creating shared vocabularies that must be accepted and implemented by a range of stakeholders — from product teams and data scientists to business analysts and executives. Semantics is inherently sociotechnical, and relies on human interaction to be viable as a shared understanding of a domain or world.
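As a rough illustration of the harmonization point above, a controlled vocabulary can start as a simple mapping from the variant labels different systems use to one preferred term. The terms below are invented for this sketch, not drawn from any standard.

```python
# Hypothetical controlled vocabulary: variant labels map to preferred terms,
# so systems that name the same concept differently can still be joined.
controlled_vocabulary = {
    "cust": "customer",
    "client": "customer",
    "customer": "customer",
    "acct": "account",
    "account": "account",
}

def normalize(label):
    """Map a raw field label to its preferred term, or pass it through."""
    return controlled_vocabulary.get(label.strip().lower(), label)

# Two systems that disagree on vocabulary...
system_a = {"client": "Acme Co."}
system_b = {"cust": "Acme Co."}

# ...harmonize to the same preferred key.
harmonized = {normalize(k): v for rec in (system_a, system_b) for k, v in rec.items()}
print(harmonized)  # {'customer': 'Acme Co.'}
```

A real implementation would use a maintained vocabulary standard (SKOS-style preferred and alternate labels, for example) rather than an ad hoc dictionary, but the sociotechnical work is the same: stakeholders must agree on the preferred terms.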
Knowledge as Infrastructure
“What makes knowledge desirable and worth cultivating is the enhancement it brings to the effectiveness with which we operate in an artifactual environment. Knowing how and knowing that are not different kinds of knowledge. They are different kinds of use for different artifacts, all expressing the only kind of knowledge there is: a human capacity for superlative artifactual performance.”
Source: https://tinyurl.com/27jhtmm5
We know that data alone, the raw, unprocessed records of events, transactions, and observations, is insufficient to power truly intelligent systems. Instead, AI requires structured, contextualized, and semantically rich knowledge to interpret, reason, and generate reliable insights.
Why Data Alone Isn’t Enough
Data can be plentiful and even big, but without context it remains inert. Consider a spreadsheet filled with timestamps, user clicks, or sensor readings — valuable for analysis, but meaningless on its own. Knowledge, by contrast, embeds context: relationships between entities, causal links, definitions, and constraints that give data purpose and direction. In the AI domain, this layered understanding allows models to transcend pattern recognition, allowing them to generalize from examples, disambiguate terms, and apply reasoning across domains.
Moreover, AI systems operate best when they can tap into curated ontologies, taxonomies, and knowledge graphs that encode domain expertise in machine-readable form. These structures transform disparate data points into interconnected knowledge networks—what IBM terms “AI-ready” information that augments model training, supports retrieval-augmented generation, and improves inference quality. Without these frameworks, AI risks producing superficial or erroneous outputs, simply regurgitating patterns found in noisy or biased datasets.
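A minimal, hypothetical sketch of such an interconnected knowledge network: entities linked by named relationships, which a program can traverse to answer questions that no single table answers on its own. The entity names are invented for illustration.

```python
from collections import deque

# Hypothetical mini knowledge graph: each entity points to
# (relationship, target) pairs.
graph = {
    "Policy-7": [("governs", "Dataset-A")],
    "Dataset-A": [("feeds", "Model-X")],
    "Model-X": [("serves", "Product-Y")],
}

def reachable(start, graph):
    """Breadth-first walk: every entity connected downstream of `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for _relation, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Which assets does Policy-7 ultimately affect?
print(reachable("Policy-7", graph))
```

Traversals like this are what let an AI system (or an impact analysis) follow a governance policy through datasets and models to the products it ultimately touches.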
Common Data Problems in AI
Organizations often discover that their “big data” repositories harbor serious issues that undermine AI projects. Common challenges include:
Incomplete or Inaccurate Data
Models trained on datasets plagued by missing fields, errors, or mislabels underperform in production. Cleaning and validating data at scale remains a major bottleneck, fueling an entire industry of data preparation tools.
Excessive Noise
More data isn’t always better. Large volumes of irrelevant or low-quality entries can drown out signals and steer AI toward spurious correlations, leading to overfitting or unpredictable behaviors.
Data Silos
Fragmented stores of information, hidden within departmental applications, legacy databases, or personal spreadsheets, prevent a unified view of organizational knowledge, hindering model retraining and cross-functional AI initiatives.
Bias and Lack of Representativeness
When training data reflects historical inequalities or skewed sampling, AI perpetuates those biases, eroding fairness and trust. Mitigating bias requires deliberate efforts to identify, audit, and rebalance datasets.
Rapid Staleness
In dynamic environments—think news feeds, financial markets, or social media—data can become outdated quickly. AI systems need mechanisms for continuous updates, versioning, and temporal context.
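One lightweight versioning mechanism is to stamp every knowledge entry with a last-reviewed date and flag anything older than a freshness window. The entries and the one-year threshold below are invented for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical knowledge entries carrying a last-reviewed timestamp.
now = datetime.now(timezone.utc)
entries = [
    {"fact": "Q3 pricing policy", "reviewed": now - timedelta(days=10)},
    {"fact": "2019 org chart", "reviewed": now - timedelta(days=900)},
]

def stale(entries, max_age_days=365, now=None):
    """Return the facts not reviewed within the allowed window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [e["fact"] for e in entries if e["reviewed"] < cutoff]

print(stale(entries, now=now))  # ['2019 org chart']
```

Even this crude check gives a retrieval layer something to act on: stale entries can be excluded from AI context windows or routed to a curator for review.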
“Having knowledge but lacking the power to express it clearly is no better than never having any ideas at all.”
– Pericles
The Domain-Specific Knowledge Gap
Even well-structured, high-quality data may leave AI systems floundering when confronted with specialized or emerging domains. A notable example occurred when generative AI models mis-answered simple questions about Tiger Woods’ career, revealing gaps in their sporting knowledge despite vast training corpora. Enterprises in sectors like healthcare, finance, or manufacturing often find that pre-trained models lack the depth of domain-specific terminology, regulations, and workflows needed for practical applications.
Bridging this gap demands integration of proprietary knowledge—standard operating procedures, regulatory guidelines, technical manuals—into AI pipelines through fine-tuning, retrieval-augmented generation (RAG), and custom knowledge bases. While these approaches can be resource-intensive, they ensure that AI outputs align with real-world requirements and reduce the risk of critical errors in high-stakes environments.
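In outline, the retrieval half of such a RAG pipeline can be sketched with simple term overlap standing in for embedding-based search. The document IDs and texts below are hypothetical, and a production system would use a vector index rather than word matching.

```python
# Hypothetical internal knowledge base of curated passages.
knowledge_base = {
    "sop-12": "Claims over $10,000 require two approvals per policy SOP-12.",
    "faq-3": "Password resets are handled through the self-service portal.",
}

def retrieve(question, docs, top_k=1):
    """Rank passages by shared terms with the question (toy scoring)."""
    q_terms = set(question.lower().split())
    scored = sorted(
        docs.items(),
        key=lambda kv: len(q_terms & set(kv[1].lower().split())),
        reverse=True,
    )
    return [text for _doc_id, text in scored[:top_k]]

def build_prompt(question, docs):
    """Ground the model: answer only from the retrieved context."""
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How many approvals for large claims?", knowledge_base)
print(prompt)
```

The grounding instruction in the prompt is what ties the model's output back to proprietary knowledge, which is the whole point of the integration work described above.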
Building a Knowledge Infrastructure for AI
To overcome data pitfalls and domain gaps, organizations must invest in a robust knowledge infrastructure comprising:
Semantic Layering: Employ ontologies, taxonomies, and metadata schemas to define concepts and relationships.
Knowledge Graphs: Develop graph databases that interlink entities such as products, processes, and policies, enabling AI to traverse knowledge networks.
Document and Content Management: Index and annotate unstructured sources—reports, emails, manuals—so AI can retrieve contextually relevant passages.
Governance and Quality Controls: Establish processes for ongoing data curation, versioning, and provenance tracking, ensuring that knowledge assets remain accurate and up to date.
Integration with AI Workflows: Embed knowledge services into model training, inference, and evaluation pipelines, using APIs or microservices to provide semantic enrichment on demand.
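The last component, semantic enrichment on demand, can be sketched as a small service function that annotates free text with definitions from a governed glossary. The glossary terms below are invented for illustration.

```python
# Hypothetical governed glossary backing an enrichment service.
glossary = {
    "churn": "Rate at which customers stop doing business with us.",
    "arr": "Annual recurring revenue.",
}

def enrich(text, glossary):
    """Attach glossary definitions for any known term found in the text."""
    found = {term: defn for term, defn in glossary.items() if term in text.lower()}
    return {"text": text, "annotations": found}

result = enrich("Q2 churn rose while ARR held steady.", glossary)
print(result["annotations"])
```

Wrapped in an API or microservice, the same function lets both pipelines and people resolve business vocabulary to shared definitions at the moment of use.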
By aligning people, processes, and platforms around these components, organizations cultivate an “institutional brain” that supports AI as well as human decision-making, onboarding, and innovation.
Conclusion
In the age of AI, data alone is merely raw fuel: valuable but inert without context. Knowledge infrastructure is the vital engine, transforming fragments of data into structured and semi-structured, contextualized, and semantically meaningful assets that empower AI systems to perform their best. Without the foundation of structured, contextualized, and continuously managed knowledge assets, AI initiatives face significant risks: poor input quality, domain mismatches, and opaque reasoning.
Building a robust knowledge infrastructure means more than technology; it requires an intentional investment in semantic frameworks, knowledge graphs, and governance to create an institutional memory and collective brain trust. This infrastructure not only enhances AI’s ability to learn faster, reason deeper, and generate more trustworthy insights from human knowledge, but also drives human decision-making, collaboration, and innovation.
Organizations that fail to recognize this will fall behind, stuck in information deserts.
Ultimately, organizations that invest in building rich semantic layers, knowledge graphs, and mature governance frameworks unlock AI’s full potential, enabling models to learn faster, reason deeper, and deliver trustworthy outcomes across ever-evolving business landscapes. Equally important, they align people with human knowledge and innovation, recognizing that it is humanity and the preservation of our collective knowledge that truly drive AI’s promise and power in the fourth industrial revolution.