Language is the Bridge
There is a reason we call them language models.
It is the kind of sentence that sounds obvious until you sit with it and look around your digital environments. Large Language Models. Not large knowledge models or large reasoning models or large world models. Language is the substrate and the bridge, the surface with which we interact with systems and interfaces. Strip the language away and what remains is a tensor of weights with no bearing on the real world.
Ontologies, like everything else in digital systems, work the same way. Both LLMs and ontologies are formal systems whose intelligibility (to humans, to machines, to one another) depends on language. Strip the linguistic scaffolding from either and you are left with the same problem: fluent, well-formed, internally consistent output that nobody can act on, because the meaning is uncertain.
Building that scaffolding is an engineering discipline. It produces the only interoperability layer shared by humans and machines, and the one we cannot avoid. When it is missing, every system can come across as untrustworthy, and untrustworthy systems are systems nobody can act on. Ultimately, knowing has to translate into doing, and shared language is the only translation device we have.
The Octopus and the Ontologist
In 2020, Emily Bender and Alexander Koller published what is now called the octopus paper, built around a thought experiment in which two people, A and B, are stranded on separate uninhabited islands and communicate through an undersea telegraph cable. A hyper-intelligent octopus taps the cable, learns the statistical patterns of their exchanges and eventually cuts the line to impersonate B in conversations with A.[1] For a while, A notices nothing. Then A is attacked by a bear and messages B asking how to fashion a weapon from sticks to defend herself. The octopus, having never seen a bear, never held a stick and never inhabited the world its symbols described, produces fluent nonsense.
Bender and Koller's argument is that form is not meaning. A system that has only ever seen the form of language has not learned what language is about, or how words depend both on context and on real-world experience. In the six years since the octopus paper was published, the techniques that have measurably improved language model output (retrieval augmentation, tool use, verified citation, knowledge graph integration) all work to connect the model to a source of truth outside its own training distribution. They tie words back to the things words refer to.[2]
Now consider the ontologist’s version of the octopus. An OWL ontology file, beautifully formed. Classes, subclasses, properties, restrictions, disjointness axioms: everything works as designed. The file validates, and a description-logic engine will happily compute its inferences. But it refers to nothing humans can use, because nobody ever did the work of binding its terms to a controlled vocabulary, a taxonomy reconciled against actual organizational usage, or a thesaurus that captures how domain experts differentiate renal failure from kidney injury or from nephropathy. The ontology is internally valid yet externally inert. This is the octopus's chatter in the form of a well-intentioned ontology deployment.
Decisions go unsupported because the ontology tells you nothing you can act on: the bridge to human practice and language is absent.
Labels Are Not Decorations
In ontology engineering, it is easy to forget that labels and language are essential to any functioning system, especially one that is designed to work with LLMs.
A class with the URI http://example.org/ont/c_47291 and no rdfs:label, skos:prefLabel, skos:altLabel or rdfs:comment is an opaque token: it denotes a thing but lacks the human language that signals what that thing means. A description logic reasoner can manipulate it, but a human cannot interpret it. Without human- and machine-readable labels, a domain expert cannot validate its meaning and a downstream developer cannot map it to a database column. A clinician may not be able to recognize whether it represents the condition they just diagnosed.
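The check described above can be sketched in a few lines. This is a minimal illustration, assuming triples are held as plain Python tuples rather than in a real RDF store; the second class and its label text are made up for the example.

```python
# Toy triples as (subject, predicate, object) tuples. The identifiers and
# label text are illustrative assumptions, not drawn from a real ontology.
LABEL_PREDICATES = {"rdfs:label", "skos:prefLabel", "skos:altLabel", "rdfs:comment"}

triples = [
    ("ex:c_47291", "rdf:type", "owl:Class"),
    ("ex:c_10002", "rdf:type", "owl:Class"),
    ("ex:c_10002", "skos:prefLabel", "acute kidney injury"),
    ("ex:c_10002", "rdfs:comment", "Sudden loss of kidney function."),
]


def unlabeled_classes(triples):
    """Return the classes that carry none of the human-readable annotations."""
    classes = {s for s, p, o in triples if p == "rdf:type" and o == "owl:Class"}
    annotated = {s for s, p, _ in triples if p in LABEL_PREDICATES}
    return sorted(classes - annotated)


print(unlabeled_classes(triples))
```

Here only ex:c_47291 is reported, because it is the one class no human could read. A production audit would run over a parsed ontology file, but the question it asks is the same.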
The W3C OntoLex Community Group makes this point directly, stating that a label “provides a lexical anchor that makes the concept, property, individual etc. understandable to a human user,” and that the simple-string label that OWL and RDFS provide is “far from being able to capture the necessary linguistic and lexical information” that human and machine users actually need.[3]
Lucie-Aimée Kaffee and colleagues, analyzing Wikidata (the largest open knowledge graph on earth, with roughly 121 million items), add that “Labels are the way for humans to interact with the data.”[4] Wikidata enforces, as a hard constraint, that the combination of a label and its description must be unique in any given language. Without that uniqueness, two humans looking at the same Wikidata QID may draw different conclusions, and the system fails to support coordinated action.
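A uniqueness rule of this kind is simple to express. The sketch below is an assumption-laden illustration, not Wikidata's implementation: the item identifiers are invented placeholders, and comparison is by exact string match.

```python
from collections import defaultdict

# Hypothetical items as (item_id, language, label, description).
items = [
    ("Q_A", "en", "Mercury", "planet"),
    ("Q_B", "en", "Mercury", "chemical element"),
    ("Q_C", "en", "Mercury", "planet"),   # collides with Q_A in English
    ("Q_C", "de", "Merkur", "Planet"),
]


def label_description_collisions(items):
    """Group items by (language, label, description) and keep the clashes."""
    seen = defaultdict(list)
    for item_id, lang, label, description in items:
        seen[(lang, label, description)].append(item_id)
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


print(label_description_collisions(items))
```

The same label may legitimately appear on many items; it is the label together with its description that has to tell them apart within a language.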
Empirically, vocabulary and label work is the single largest determinant of whether an ontology is usable. The OOPS! pitfall scanner, built from analysis of nearly seven hundred real ontologies, codifies “missing annotations” as pitfall P08, and field studies routinely find ontologies with thousands of P08 violations apiece.[5]
In clinical terminology, Elkin and colleagues showed that SNOMED CT’s coverage of real problem-list strings rose from 51.4% to 92.3% when synonymy and term composition were properly populated: a forty-one-point gain produced entirely by the lexical layer, with no change to the formal axioms.[6] In ontology matching, the Ontology Alignment Evaluation Initiative (OAEI) has reported for years that fully automated alignment plateaus and that the lift past that plateau comes from interactive matching, in which humans review and correct label-level mappings.[7]
Most importantly, labels are where the ontology meets the human, and they are the surface upon which decisions get made. Everything else in the formal model is downstream of whether someone can read a label and decipher what was meant by the label’s words.
The Layered Linguistic Agreement
Once you see labels as the human interoperability layer, the rest of the Ontology Pipeline™ reveals itself as the structured engineering of agreement on those labels at every scale.
Controlled vocabularies exist because the NYC / New York / NY / N.Y. / new york city problem is not a data-cleaning issue; it is an agreement issue. Vocabulary control is the act of saying, “in this system, this string means this thing, and these other strings are aliases for it, and these other strings refer to something else”. Birger Hjørland makes the case explicitly: vocabulary control is not bureaucratic tidiness, it is the precondition for any subsequent semantic operation.[8] Without it, you cannot count, join, align or reason. Most importantly, people cannot agree on what they are talking about, which means they cannot decide together, and they cannot act together.
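To make the counting problem concrete, here is a small sketch. It assumes a hypothetical system that has decided all five strings are aliases of one preferred term; a different system could reasonably decide that "New York" means the state instead.

```python
from collections import Counter

# A hypothetical controlled vocabulary: preferred term -> agreed aliases.
VOCABULARY = {
    "New York City": {"nyc", "new york", "ny", "n.y.", "new york city"},
    "New York State": {"new york state", "nys"},
}


def build_lookup(vocabulary):
    """Map every alias to its preferred term, refusing ambiguous aliases."""
    lookup = {}
    for preferred, aliases in vocabulary.items():
        for alias in aliases | {preferred}:
            key = alias.casefold().strip()
            if key in lookup and lookup[key] != preferred:
                raise ValueError(f"{alias!r} is claimed by two terms")
            lookup[key] = preferred
    return lookup


LOOKUP = build_lookup(VOCABULARY)


def normalize(raw):
    """Return the preferred term, or None when the string is not controlled."""
    return LOOKUP.get(raw.casefold().strip())


raw_values = ["NYC", "New York", "NY", "N.Y.", "new york city", "Gotham"]
raw_counts = Counter(raw_values)                          # six "different" places
controlled = Counter(normalize(v) for v in raw_values)    # one place, one unknown

print(controlled)
```

The raw column counts six distinct values. The controlled column counts one place five times and flags one string nobody has agreed on yet, which is the more useful result.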
Taxonomies exist because flat vocabularies do not scale, and because hierarchy is the cheapest, most cognitively native way humans organize categories. Eleanor Rosch’s prototype theory established that human categorization is graded around basic-level categories, not built from necessary and sufficient conditions.[9] Taxonomies that survive contact with users and serve as living structures for meaning encode this through parent-child relations. They privilege the cognitive level at which humans reach for words to make sense of things. A taxonomy that ignores how domain experts define their worlds is a taxonomy nobody will use, which means it is a tree of dead strings that cannot ground decisions in reality.
Thesauri exist because synonymy, near-synonymy, hierarchical relationships and associative relations have to be made explicit or they will be violated. The SKOS specification formalizes the thesaurus relationships: broader, narrower, related, preferred label, alternative labels and documentation properties. ISO 25964 defines the conditions and methodologies for safely transmitting thesaurus labels and their associated meaning within digital systems, with SKOS as the vehicle.[10]
Crucially, SKOS does not claim to be a formal ontology language, although its own data model is formally defined as an OWL Full ontology and SKOS data is expressed as RDF. It is the layer between vocabulary and ontology, the layer where meaning is negotiated before it is formalized. This is the layer almost everyone wants to skip. It is also the layer that determines whether, when somebody searches for “renal failure” and somebody else writes “kidney injury,” the system recognizes them as related. This relatedness, expressed in RDF, supports inference, which is a powerful utility for LLMs and for search.
Just as important, in a SKOS vocabulary each label or word becomes a concept. The vocabulary, taxonomy and thesaurus, if built using SKOS, become the conceptual model, a critical phase in ontology design and construction. While not all concepts will qualify as ontology classes, properties or relations, the concepts do inform those classes and properties, ensuring informed building. And ideally, nobody in a lab coat in a high tower is designing an ontology in a box, disconnected from the real world and language.
Ontologies exist because thesauri cannot express constraints or ontological commitments, or support logical reasoning. An ontology says: "a Person is a kind of Agent; an Agent can hold an OnlineAccount or make a Document, and a Person can know another Person". This is the kind of light commitment that lets the FOAF vocabulary describe people and their relationships across millions of linked-data documents on the open web.[11] It says: "an Activity is something that occurs over a period of time, used some Entity, and was associated with some Agent who bears responsibility for it". This is the commitment that lets PROV-O carry provenance chains through scientific workflows, data pipelines, and W3C-conformant systems.[12] And it says: "a LegalEntity is the kind of thing that can issue a FinancialInstrument and bear a contractual obligation, and a Regulator stands in a regulates relation to a regulated LegalEntity". This is the commitment that lets the Financial Industry Business Ontology (FIBO) carry regulatory reporting, securities master data and risk analysis across banks, exchanges, and supervisors.[13]
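A rough sketch of how thesaurus relations let a search recognize relatedness follows. It models SKOS-style preferred labels, alternative labels, broader and related links with plain Python objects instead of RDF. The concept identifiers, and the decision to link the two concepts as related, are assumptions made for illustration, not clinical guidance.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    pref_label: str
    alt_labels: set = field(default_factory=set)
    broader: set = field(default_factory=set)   # ids of broader concepts
    related: set = field(default_factory=set)   # ids of associated concepts


# Hypothetical three-concept thesaurus.
thesaurus = {
    "c1": Concept("kidney disease"),
    "c2": Concept("renal failure", {"kidney failure"}, {"c1"}, {"c3"}),
    "c3": Concept("kidney injury", {"renal injury"}, {"c1"}, {"c2"}),
}


def find_concept(term):
    """Find the concept whose preferred or alternative label matches the term."""
    needle = term.casefold()
    for concept_id, concept in thesaurus.items():
        labels = {concept.pref_label, *concept.alt_labels}
        if needle in {label.casefold() for label in labels}:
            return concept_id
    return None


def expand(term):
    """Return every label of the matched concept and of its related concepts."""
    concept_id = find_concept(term)
    if concept_id is None:
        return set()
    concept = thesaurus[concept_id]
    terms = {concept.pref_label} | concept.alt_labels
    for related_id in concept.related:
        related = thesaurus[related_id]
        terms |= {related.pref_label} | related.alt_labels
    return terms


print(sorted(expand("renal failure")))
```

A query for one term now reaches documents written with the other, and the link between them is an explicit, reviewable statement that a domain expert can accept or reject.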
Ontologies encode meaning and commitments using description logic and by establishing rules or constraints. This is what makes reasoning over them possible. Ontologies are falsifiable: they commit to claims specific enough that a reasoner, a domain expert, or reality itself can contradict them.
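What "falsifiable" means here can be shown with a toy example. The sketch below is a hand-rolled check and makes no attempt to be a description logic reasoner. The subclass statement for Person comes from the FOAF description above; the Organization class, the disjointness axiom and the individuals are assumptions added for illustration.

```python
# Each class has at most one direct superclass in this toy model.
SUBCLASS_OF = {"Person": "Agent", "Organization": "Agent"}

# Assumed axiom: nothing can be both a Person and an Organization.
DISJOINT = {frozenset({"Person", "Organization"})}

# Hypothetical instance data.
asserted_types = {
    "ex:alice": {"Person"},
    "ex:acme": {"Organization"},
    "ex:oops": {"Person", "Organization"},
}


def superclasses(cls):
    """Walk the subclass chain upward and collect every ancestor."""
    result = set()
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.add(cls)
    return result


def inferred_types(individual):
    """Asserted types plus everything the subclass axioms entail."""
    types = set(asserted_types[individual])
    for cls in asserted_types[individual]:
        types |= superclasses(cls)
    return types


def contradictions():
    """List individuals whose types violate a disjointness axiom."""
    return [
        individual
        for individual in sorted(asserted_types)
        if any(pair <= inferred_types(individual) for pair in DISJOINT)
    ]


print(inferred_types("ex:alice"))
print(contradictions())
```

The first result shows a commitment paying off: ex:alice is an Agent although nobody said so. The second shows a commitment being contradicted by data, something a thesaurus cannot detect.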
Knowledge graphs exist as an architecture, within which components such as taxonomies, thesauri, metadata schemas, ontologies, reasoners and embeddings can be included. A knowledge graph’s architecture can include more than one ontology, and accommodate various architectural elements based upon the intended output and use cases informing its construction.
The 2021 ACM Computing Surveys survey by Aidan Hogan and colleagues defines a knowledge graph as a graph of data intended to accumulate and convey knowledge of the real world, whose nodes are entities and whose edges are relations.[14] The knowledge graph is where the abstractions of the ontology meet the messy particulars of, say, a patient, a transaction or a part number. Without the preparatory work and agreement around words and meaning, the knowledge graph can become a database with delusions of grandeur. With agreements, the knowledge graph will exist as a resource that gives humans a basis for shared, informed action and meaning.
Every layer extends deeper into a shared agreement about labels, words and meaning. Controlled vocabularies agree on the label sets, while taxonomies agree on hierarchical relationships amongst words and labels based upon established, defined meaning. Thesauri extend agreement towards synonymy and association. Ontologies agree on what those defined things and entities commit to, formally. Knowledge graphs host the real-world particulars, many of which wear labels within digital environments and live in wikis, tagging systems, databases, catalogs and assets.
The Ontology Pipeline™ is the engineered scaffolding of human linguistic agreement at scale, working iteratively through the layers to construct the world models of organizations and institutions.
The Knowing-Doing Gap Is a Language Gap
Jeffrey Pfeffer and Robert Sutton, in their 1999 book The Knowing-Doing Gap, asked one of the most thought-provoking and critical questions in management research: why do organizations that demonstrably know what to do so often fail to do it?[15] Their finding, after years of fieldwork, is that the gap between knowledge and action is not primarily a gap in information, motivation or capability. It is a gap in shared understanding. People in the same organization, looking at the same data and working on the same problems, are using terms that mean different things, and the resulting decisions are either paralyzed or wrong.
This is why starting with language and vocabulary is essential when building ontologies and knowledge graphs. Without agreed-upon language and labels, the gaps between knowing and doing are too vast to support decision making and collaborative problem solving.
For example, when two clinicians use “renal failure” to mean different things, they cannot decide on a course of treatment together. Or consider marketing’s “active customer,” which excludes anyone inactive for thirty days, alongside finance’s “active customer,” which excludes anyone inactive for ninety: the result is a quarterly forecast that is impossible to define and model ontologically. This conundrum of mixed meanings and conflicting labels is all too common, leading to high rates of failure and degradation of knowledge.
When I was working at an edtech company, the entire business was reliant upon publishing pipelines and libraries exceeding 15,000 modularized courses. Across 41 teams, no one could agree upon the metadata labels and naming conventions needed to indicate the status of a course throughout the content lifecycle. To boot, metadata was not required, which is mind-boggling, and the result was duplicate courses, courses without titles, inaccurate compensation for course authors and an inability to measure the “freshness” of content. And if you look at the screenshot below, the worst offender was the word for a course being developed and in production.
In Progress, In Review, Awaiting Review, In Process, Author Review, Draft, Editor Review and Ready for Publication ALL MEANT THE SAME THING. The company had a hardcore Domain-Driven Design (DDD) culture that bordered on a “don’t tread on me” attitude toward each team's ownership of its ways of doing things and its naming conventions. These disparities also propagated into knowledge bases; they were not only language problems, they became knowledge problems. Ultimately, the people involved could not tell whether they were disagreeing about the world or about the words for things within a shared ecosystem.
This is the primary reason building semantic systems, particularly ontologies, is difficult. But it is also the reason the absence of semantics is so expensive, leading to failures in decisioning and informed execution. Shared language is what converts knowledge into doing. Without it, organizations accumulate knowledge they cannot act on. With it, they discover that much of what looks like disagreement was actually translation failure, and the path to decision opens up.
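The "active customer" conflict is easy to reproduce. In this sketch the customer records and the reference date are invented; only the thirty-day and ninety-day windows come from the example above.

```python
from datetime import date, timedelta

TODAY = date(2025, 1, 1)  # fixed reference date so the example is reproducible

# Hypothetical customers and the date each was last active.
last_activity = {
    "cust-1": TODAY - timedelta(days=5),
    "cust-2": TODAY - timedelta(days=45),
    "cust-3": TODAY - timedelta(days=80),
    "cust-4": TODAY - timedelta(days=200),
}


def active_customers(window_days):
    """Customers whose last activity falls inside the given window."""
    cutoff = TODAY - timedelta(days=window_days)
    return sorted(c for c, seen in last_activity.items() if seen > cutoff)


marketing_active = active_customers(30)   # marketing's definition
finance_active = active_customers(90)     # finance's definition

print(len(marketing_active), len(finance_active))
```

Both teams are reading the same four records, yet one reports a single active customer and the other reports three. Neither number is wrong; the label is doing two jobs.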
The Furnas, Landauer, Gomez, and Dumais finding is the empirical evidence for this argument. In 1987 they ran the now-canonical study on how often two people, asked to name the same familiar object, choose the same word. The probability ranged across domains from 0.07 to 0.18, with a mean of roughly 0.12. As they wrote, “If one person assigns the name of an item, other untutored people will fail to access it on 80 to 90 percent of their attempts.” And further, “The idea of an ‘obvious,’ ‘self-evident,’ or ‘natural’ term is a myth.”[16] Indeed, shared vocabularies are an exercise in diplomacy: a collective effort towards shared terms and meanings.
Eighty to ninety percent failure is the baseline rate at which two humans, trying to communicate about the same thing, fail to converge on a name for it. This is what you are up against when you skip vocabularies, taxonomies and thesauri. This is why “we’ll just use natural language” is not a strategy: natural language, unaided, gives you roughly a one-in-eight chance that any two people will use the same term for the same thing. Furnas and his colleagues proposed a remedy, “unlimited aliasing”: many alternate entry terms, mapped to a single referent.
Furnas and colleagues were proposing, in 1987, what a SKOS thesaurus formalizes today. Alt labels are how you climb out of the vocabulary problem. They are how you make it possible for one person’s “renal failure” and another person’s “kidney injury” to refer to the same thing, so that decisions made on top of those terms are decisions made about the same world.
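The arithmetic behind those figures is worth spelling out. The snippet below only restates the agreement probabilities quoted above (0.07, 0.12 and 0.18) as miss rates and as "one in N" odds; the function names are mine.

```python
def miss_rate(agreement_probability):
    """Chance that two people do not choose the same word."""
    return 1 - agreement_probability


def one_in(agreement_probability):
    """Express an agreement probability as 'one chance in N'."""
    return 1 / agreement_probability


for name, p in {"lowest domain": 0.07, "mean": 0.12, "highest domain": 0.18}.items():
    print(f"{name}: miss {miss_rate(p):.0%}, about 1 in {one_in(p):.1f}")
```

At the mean of 0.12 the miss rate is 88 percent and the odds of agreement are about one in 8.3, which is where the "one-in-eight" shorthand comes from.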
The European Interoperability Framework, the EU’s official model for how systems and institutions cooperate, defines four layers of interoperability: legal, organizational, semantic and technical. It is unambiguous about which layer is hardest. “Given the different linguistic, cultural, legal and administrative environments in the Member States, this interoperability layer poses significant challenges.”[17] The technical layer is solvable with engineering. The semantic layer is the one that requires humans to agree on what their words mean. And as the FAIR data principles authors put it, in the paper that has shaped open science for a decade, humans and machines face distinct barriers in finding and processing data, but the barrier they share is meaning, and the meaning layer is where data becomes usable for either.[18]
The knowing-doing gap is a semantic-interoperability gap. The Ontology Pipeline™ is the engineering response to it.
The Human Interoperability Layer
There is a hope, persistent and recurring, that machines will eventually do this work for us. That LLMs will read the unstructured corpus of an enterprise and output a clean ontology. That ontology matchers will reconcile any two schemas without human intervention. The dream is that the meaning problem can be automated away.
It cannot, because meaning is bound unbreakably to human language.
Michael Polanyi observed, in The Tacit Dimension (1966), that “we can know more than we can tell.” He meant that human knowledge depends, irreducibly, on a tacit substrate that resists full articulation. We recognize a face amongst thousands and cannot say how. Physicians diagnose a clinical presentation in seconds and spend hours reconstructing the reasoning. The tacit is not a precursor to the explicit; it is the ground the explicit rests on.[19]
Harry Collins, building on this in Tacit and Explicit Knowledge (2010), distinguishes an intractable form he calls collective tacit knowledge: the irreducibly social knowledge that lives in the linguistic practices of communities. It is, Collins writes, “strong tacit knowledge” because “we know of no way to describe it or to make machines that can possess or even mimic it.”[20]
This is the layer the Ontology Pipeline is engineering against. Domain experts know how renal failure differs from kidney injury the way a chess grandmaster knows when to sacrifice a knight: through years of immersion in a community of practice that has shaped, refined, and contested those distinctions. From that practice, language with declarative definitions emerges. The label (the preferred term, the alt labels, the scope note, the broader and narrower relationships) is the part of the tacit that can be made explicit, and the part that has to be, if anyone outside that immediate community is going to act on what the community knows.
This is also why fully automated approaches plateau. The OAEI ontology matching campaigns have been documenting it for years. State-of-the-art automated matchers max out below the level a human-in-the-loop system reaches with quite modest interaction.[21] Recent LLM-based ontology matchers succeed when labels are present; where labels are sparse, performance collapses.[22] You can build the most sophisticated reasoning system in the world, and if the labels underneath are contested, ambiguous, or absent, the system has nothing to reason *with* that humans can recognize.
Wittgenstein wrote in 1953 that “the meaning of a word is its use in the language,” more than forty years before Pfeffer and Sutton named the knowing-doing gap.[23] Meaning is constituted by use, by people and in communities, over time. There is no shortcut and no machine that can do this work, because the referent is not in the world the way a rock is in a field. It is in the practice of a human community, and only humans embedded in that practice can say whether the label fits what they do.
Infrastructure for Action
Let me put this together.
LLMs and ontologies are both formal systems whose reference depends on linguistic scaffolding. Strip that scaffolding from either and you get plausible, probable output that looks coherent but is not actionable. The remediation, in both cases, is to re-couple the system to the human linguistic practices from which its symbols emerge.
For ontologies, that scaffolding is built layer by layer, starting with controlled vocabularies and moving through taxonomies, metadata schemas, thesauri and ontologies to knowledge graphs. Each layer is a deeper, more structured agreement about labels, the surfaces where humans actually meet the model. Without that scaffolding, the formal artifacts are inert. With it, they support the one thing organizations need from their information systems: the ability to act on what they know.
The knowing-doing gap is what happens when that scaffolding is missing. Knowledge accumulates. Decisions stall. People talk past each other and call it disagreement when it is actually a failure of translation. Furnas’s eighty-to-ninety-percent vocabulary mismatch is the daily experience of every cross-functional meeting in which “active user,” “incident,” “compliant,” “ready,” or “complete” turn out to mean five different things. The meeting ends without a decision, because language and meaning became barriers to understanding.
More knowledge is not the answer here. The answer is a shared language from which to act. The Ontology Pipeline™ is the engineered production of that shared language. It is the universal interoperability layer because every other layer routes through it. Technical interoperability is solvable, but semantic interoperability is the work.
This is why the vocabularies and words are so hard, and so expensive to skip. Vocabulary control is what makes informed decisions possible: for humans coordinating with humans, for humans coordinating with machines, and increasingly for machines coordinating with each other through human language as the bridge in between.
And perhaps, just maybe, that is why we call them language models. And it is also why, when we build ontologies, we do the language work first.
It is the only work that lets anyone, human or otherwise, do something from an informed place.
Controlled Vocabularies, Part I
Intentional Arrangement is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber. I appreciate your support ❤️
Footnotes
1. Emily M. Bender and Alexander Koller, “Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data,” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), 5185–5198.
2. For a synthesis of grounding-based remediation strategies, see Lei Huang et al., “A Survey on Hallucination in Large Language Models,” ACM Transactions on Information Systems (2024); Shirui Pan et al., “Unifying Large Language Models and Knowledge Graphs: A Roadmap,” IEEE Transactions on Knowledge and Data Engineering 36, no. 7 (July 2024): 3580–3599; and Darren Edge et al., “From Local to Global: A Graph RAG Approach to Query-Focused Summarization,” Microsoft Research, April 2024 (arXiv:2404.16130).
3. Philipp Cimiano, John P. McCrae, and Paul Buitelaar, eds., Lexicon Model for Ontologies: Community Report, W3C Ontology-Lexicon Community Group Final Community Group Report, 10 May 2016. The cited passages on labels as “lexical anchor” and the inadequacy of simple-string labels are from §1, “Introduction.”
4. Lucie-Aimée Kaffee, Alessandro Piscopo, Pavlos Vougiouklis, Elena Simperl, Leslie Carr, and Lydia Pintscher, “A Glimpse into Babel: An Analysis of Multilinguality in Wikidata,” Proceedings of the 13th International Symposium on Open Collaboration (OpenSym ’17), ACM, 2017. Wikidata corpus statistics drawn from Wikidata’s official statistics page (current as of early 2026).
5. María Poveda-Villalón, Asunción Gómez-Pérez, and Mari Carmen Suárez-Figueroa, “OOPS! (OntOlogy Pitfall Scanner!): An On-line Tool for Ontology Evaluation,” International Journal on Semantic Web and Information Systems 10, no. 2 (2014): 7–34. Pitfall P08 (“Missing annotations”) is documented in the OOPS! online catalogue at oops.linkeddata.es. C. Maria Keet, in “Pitfalls in Ontologies and TIPS to Prevent Them,” documents OBI with 3,771 P08 violations and DMOP with 866.
6. Peter L. Elkin et al., “Evaluation of the Content Coverage of SNOMED CT: Ability of SNOMED Clinical Terms to Represent Clinical Problem Lists,” Mayo Clinic Proceedings 81, no. 6 (2006): 741–748. The 51.4% → 92.3% improvement is the headline finding.
7. Ontology Alignment Evaluation Initiative (OAEI), 2024 Interactive Track results, oaei.ontologymatching.org. The OAEI organizers explicitly write that “fully automatic ontology matching approaches slowly reach an upper bound of the alignment quality they can achieve” and document the lift produced by simulated user input.
8. Birger Hjørland, “What Is Knowledge Organization (KO)?,” Knowledge Organization 35, no. 2/3 (2008): 86–101. See also Hjørland, “Concept Theory,” Journal of the American Society for Information Science and Technology 60, no. 8 (2009): 1519–1536.
9. Eleanor Rosch, “Principles of Categorization,” in Cognition and Categorization, ed. Eleanor Rosch and Barbara B. Lloyd (Hillsdale, NJ: Lawrence Erlbaum, 1978), 27–48; Rosch et al., “Basic Objects in Natural Categories,” Cognitive Psychology 8, no. 3 (1976): 382–439.
10. Alistair Miles and Sean Bechhofer, eds., SKOS Simple Knowledge Organization System Reference, W3C Recommendation, 18 August 2009. For the connection to the ISO 25964 thesaurus standard, see Stella Dextre Clarke and Marcia Lei Zeng, “From ISO 2788 to ISO 25964: The Evolution of Thesaurus Standards towards Interoperability and Data Modeling,” Information Standards Quarterly 24, no. 1 (2012): 20–26.
11. Dan Brickley and Libby Miller, FOAF Vocabulary Specification 0.99 (Namespace Document, “Paddington Edition,” 14 January 2014), https://xmlns.com/foaf/spec/. FOAF (Friend of a Friend) defines core classes including Agent, Person, Organization, Group, and Document, with Person modeled as a subclass of Agent. The foaf:knows property is a simple binary relation over foaf:Person and is deliberately underspecified, a design choice that has enabled wide adoption while limiting the strength of automated inference.
12. Timothy Lebo, Satya Sahoo, and Deborah McGuinness, eds., PROV-O: The PROV Ontology, W3C Recommendation, 30 April 2013, https://www.w3.org/TR/prov-o/. PROV-O models provenance through three core classes (Entity, Activity, and Agent) connected by relations including wasGeneratedBy, used, wasAssociatedWith, wasAttributedTo, and actedOnBehalfOf. The PROV working group deliberately minimized class disjointness (Requirement VI4 permits agents to also be activities or entities), prioritizing interoperability across heterogeneous provenance systems over strong logical separation.
13. EDM Council and Object Management Group, Financial Industry Business Ontology (FIBO), initial release 2014, https://spec.edmcouncil.org/fibo/ and https://github.com/edmcouncil/fibo. FIBO emerged from data governance work prompted by the 2008 financial crisis, was first published in 2014, and is jointly stewarded by the Enterprise Data Management Council and OMG; it supports regulatory reporting, securities master data management, and risk analysis across the financial industry.
14. Aidan Hogan et al., “Knowledge Graphs,” ACM Computing Surveys 54, no. 4, Article 71 (July 2021): 1–37.
15. Jeffrey Pfeffer and Robert I. Sutton, The Knowing-Doing Gap: How Smart Companies Turn Knowledge into Action (Boston: Harvard Business School Press, 1999). Pfeffer and Sutton’s central diagnosis is that the knowing-doing gap is most often a problem of organizational language, framing, and shared understanding, not of information access or individual capability.
16. George W. Furnas, Thomas K. Landauer, Louis M. Gomez, and Susan T. Dumais, “The Vocabulary Problem in Human-System Communication,” Communications of the ACM 30, no. 11 (November 1987): 964–971. The 0.07–0.18 range is from the paper’s Table I across five domains; the “myth” line and the 80-to-90-percent failure-rate finding are at pp. 966–967. The “unlimited aliasing” remedy is at p. 968.
17. European Commission, New European Interoperability Framework: Promoting Seamless Services and Data Flows for European Public Administrations, COM(2017)134, adopted 23 March 2017, §3 (“Interoperability Layers”). The four-layer model (legal, organizational, semantic, technical) and the “what is sent is what is understood” formulation of semantic interoperability are at §3.5.
18. Mark D. Wilkinson et al., “The FAIR Guiding Principles for Scientific Data Management and Stewardship,” Scientific Data 3 (2016): 160018. The interoperability principles I1–I3 specify formal, accessible, shared, and broadly applicable knowledge representation; the human-versus-machine framing of meaning barriers is in the paper’s introduction.
19. Michael Polanyi, The Tacit Dimension (Chicago: University of Chicago Press, 1966), p. 4 (“we can know more than we can tell”).
20. Harry Collins, Tacit and Explicit Knowledge (Chicago: University of Chicago Press, 2010). The “strong tacit knowledge” framing and the discussion of collective tacit knowledge as the form that resists machine substitution recur throughout the book.
21. OAEI 2024 Interactive Track, op. cit. The Anatomy track has hovered near a 94.6% F-measure ceiling for several years; the MultiFarm cross-lingual track shows F-scores collapsing into the 0.04–0.42 range, indicating that automated matching across languages remains unsolved without human input.
22. For LLM-based ontology matching and its dependence on label quality, see Daniel Hertling and Heiko Paulheim, “OLaLa: Ontology Matching with Large Language Models,” Proceedings of K-CAP 2023; and Hamed Babaei Giglou et al., “LLMs4OM: Matching Ontologies with Large Language Models,” arXiv:2404.10317 (2024).
23. Ludwig Wittgenstein, Philosophical Investigations, trans. G. E. M. Anscombe (Oxford: Blackwell, 1953), §43.
About me
I’m a Semantic Engineer, Information Architect, and knowledge infrastructure strategist dedicated to building information systems. With more than 25 years of experience in enterprise architecture, e-commerce content systems, digital libraries, and knowledge management, I specialize in transforming fragmented information into coherent, machine-readable knowledge systems.
I am the founder of the Ontology Pipeline™, a structured framework for building semantic knowledge infrastructures from first principles. The Ontology Pipeline™ emphasizes progressive context-building: moving from controlled vocabularies to taxonomies, thesauri, ontologies, and ultimately fully realized knowledge graphs.
Professionally, I have led semantic architecture initiatives at organizations including Adobe, where I architected an RDF-based knowledge graph to support Adobe’s Digital Experience ecosystem, and Amazon, where I worked in information architecture and taxonomy. I am also the founder of Contextually LLC, providing consulting and coaching services in ontology modelling, NLP integration, knowledge graphs and knowledge infrastructure design.
I am also a curriculum designer, teacher and founder of The Knowledge Graph Academy, a cohort-based educational program designed to train and upskill future semantic engineers and ontologists. The Academy is the perfect balance of ontology and knowledge graph theory and practice, preparing graduates to confidently work as ontologists and semantic engineers.
An educator and thought leader, I publish regularly on my Substack newsletter, Intentional Arrangement, where my writing frequently explores the relationship between semantic systems and AI.
...but is there also a danger that the humans become part of the Machine? :) Thinking of Bradbury here haha.
ZH
Jessica, this is such an exciting piece for me. I struggle so hard from my side of the fence to articulate this—I know it exists but because I don’t work in the computational side of linguistics, I don’t have all the reference points.
I’m focusing on Frame Theory at the moment and you reminded me of Fillmore’s FrameNet, which in and of itself is fascinating as a global database.
As someone who’s needed to read this from someone else, thank you 🙏🏽