This article is part II of a three-part series. Part I can be found here.
Foundation for Shared Understanding
The Simple Knowledge Organization System ontology (SKOS) supports the thoughtful curation of concepts and structuring of primary relations between concepts. This high-level modeling is done by defining what we call candidate concepts. Candidate concepts account for all concepts to be considered for promotion, be it a preferred label, alternative label or hidden label. The chosen concepts are first modeled as a controlled vocabulary or flat(ter) list.
After the curation stage, the refined concepts are structured as a taxonomy or hierarchy. Some practitioners choose to model a taxonomy and a thesaurus simultaneously when working with SKOS. As with any other practice, there are benefits and pitfalls to varying the order of operations. My recommendation is to model each stage individually, as SKOS modeling is iterative anyway. You will continue to revise and refine your SKOS model throughout the lifecycle of the thesaurus. So get used to it.
Most candidate concepts can be acquired by collecting all concepts within a system, such as those derived from a scrape of systems or gathered from lists of metadata terms, data catalogs, disparate taxonomies or a conglomeration of several source vocabularies. The idea is to collect as many of the vocabularies and terms in use as possible, to establish a collection of terms that will represent a shared understanding of a system, a domain or specific subjects.
Not all candidate concepts become classes or official members of a SKOS concept collection. The idea is to collapse and refine candidate concepts to arrive at a reconciled, disambiguated SKOS vocabulary, where all concepts are defined and unique.
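To make the collapse-and-refine step concrete, here is a minimal sketch in plain Python rather than a SKOS toolkit. The `Concept` class, the `reconcile` function, the sample terms and the hand-curated `synonyms` and `misspellings` tables are all assumptions made for illustration; in practice those tables are the output of the curation work itself.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A reconciled concept carrying SKOS-style labels (illustrative only)."""
    pref_label: str                                   # skos:prefLabel
    alt_labels: set = field(default_factory=set)      # skos:altLabel
    hidden_labels: set = field(default_factory=set)   # skos:hiddenLabel


def normalize(term):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.lower().split())


def reconcile(candidates, synonyms, misspellings):
    """Collapse candidate terms into unique concepts.

    synonyms and misspellings map a normalized variant to its preferred
    term. Both tables are hypothetical and hand-curated.
    """
    concepts = {}
    for term in candidates:
        key = normalize(term)
        if key in misspellings:
            target = misspellings[key]
            concepts.setdefault(target, Concept(target)).hidden_labels.add(key)
        elif key in synonyms:
            target = synonyms[key]
            concepts.setdefault(target, Concept(target)).alt_labels.add(key)
        else:
            concepts.setdefault(key, Concept(key))
    return concepts


# Hypothetical candidate terms, as they might come out of a scrape.
candidates = ["Kubernetes", "k8s", "kubernetes ", "Kubernets", "JavaScript", "JS"]
synonyms = {"k8s": "kubernetes", "js": "javascript"}
misspellings = {"kubernets": "kubernetes"}

vocabulary = reconcile(candidates, synonyms, misspellings)
for concept in vocabulary.values():
    print(concept.pref_label, sorted(concept.alt_labels), sorted(concept.hidden_labels))
```

Six candidate terms collapse into two concepts here: acronyms become alternative labels, and the misspelling is kept as a hidden label so that it can still be matched in search without being displayed.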
The work of refining a vocabulary takes rigor and discipline, as it’s easy to turn a blind eye to ambiguous terms, or succumb to stakeholder pressure when colleagues insist on certain concepts such as “Other implementations”. The key here is to resist peer pressure and work against ambiguity, for the sake of knowledge. For a more extended overview of SKOS and ambiguity, I suggest you bookmark and read ANSI/NISO Z39.19-2005 (R2010) Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies.
What’s With a Shared Vocabulary?
The process of modeling system concepts using SKOS is ultimately focused upon establishing a shared vocabulary, bolstered by SKOS’s ontological logic. A SKOS vocabulary reduces ambiguity by guiding the modeler through the thoughtful curation of concepts, establishing natural workflows to support reconciliation of duplicate or near duplicate concepts.
For example, when I was handed a flat list of over 10,000 terms, the result of GitHub and Stack Overflow web scraping, SKOS provided the framework to reconcile synonyms and acronyms, while shaping the long list into a three-level SKOS taxonomy. 10,000 terms swiftly became 2,500 well-defined concepts, with parent-child relations, alternative labels, hidden labels and related concepts. SKOS modeling is a critical foundation for the development of more complex ontologies, as it is impossible to model messy, undefined, unrefined data. A shared vocabulary establishes baseline definitions and agreed-upon vocabularies, and therefore a shared understanding of what is to be further moulded and modeled.
Support Iterative Development
Lightweight ontologies are agile: they can be quickly prototyped, tested, and adapted as understanding evolves. This scaffolding approach helps teams learn together, gradually maturing their knowledge model before locking in the heavy constraints that accompany lower, more complex ontologies. SKOS is a fabulous ontology model for getting your ontology feet wet, a welcome primer for ontology modeling. With all of the head scratching around how to build ontologies, and even around what an ontology is, SKOS helps to teach ontology logic through the ontological structuring of taxonomies and thesauri.
Just as machines appreciate the logical discipline introduced by relationship constraints and definitions, human modelers benefit from the limitations presented by SKOS’s inability to model more complex relationship types. This is the very essence of ontology modeling, as building ontologies requires focus and discipline. SKOS exists as ontology training wheels, a way to model a shared understanding by way of a simplified domain model.
A glossary, data catalog, taxonomy and thesaurus are all excellent candidates for SKOS modeling. Constrain the use cases, start with what exists within any given system, model with SKOS and iterate, until your data asset can stand on its own as a rich, disambiguated SKOS ontology model.
Experiment with your SKOS model: use it for Retrieval Augmented Generation (RAG), train your AI model on SKOS, and measure how AI output improves. Operationalize SKOS early and often, to provide a feedback loop for iterative modeling, until the shared vocabulary model provides enough context, enough meaning, to deliver meaningful results.
Welcome to Collaboration
One of the biggest issues with ontologies is expertise and know-how. Building ontologies is not for the weak and is normally reserved for expert ontologists and well-trained semantic engineers. SKOS is the ideal entry-level ontology, perfectly positioned to support stakeholder engagement and collaboration.
Domain experts and non-technical stakeholders alike can more easily participate in SKOS-based modeling as the primary tasks associated with SKOS involve defining concepts and modeling defined concepts as parent–child relations. SKOS’s simplified logic lowers the barriers for collaboration and validation, ensuring the resulting ontology reflects real-world knowledge and needs. And the best feature of SKOS is its simplicity (it’s in the name, after all).
Why SKOS is Enough
SKOS provides a framework specifically designed to represent controlled vocabularies, classification schemes, and thesauri. At its core, SKOS simplifies complex concepts into meaningful yet easily manageable components. It enables basic semantic relations such as hierarchical (broader/narrower), associative (related), and equivalence (synonyms). For many organizations, these fundamental relationships capture all the contextual information needed to power effective AI solutions.
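The hierarchical relation can be sketched in a few lines of plain Python. The concept names below are hypothetical, and the sketch assumes each concept has a single parent, although SKOS itself allows a concept to have several broader concepts. Note also that SKOS does not declare skos:broader to be transitive; walking the full chain of parents corresponds to skos:broaderTransitive.

```python
# Each entry maps a concept to its direct skos:broader parent (hypothetical data).
broader = {
    "chamomile": "herbs for sleep",
    "valerian": "herbs for sleep",
    "herbs for sleep": "herbs",
    "herbs": None,  # top concept: nothing broader
}


def ancestors(concept, broader):
    """Follow skos:broader links upward (the skos:broaderTransitive reading)."""
    chain = []
    parent = broader.get(concept)
    while parent is not None:
        chain.append(parent)
        parent = broader.get(parent)
    return chain


def narrower(concept, broader):
    """skos:narrower is the inverse of skos:broader, so derive it instead of storing it."""
    return sorted(child for child, parent in broader.items() if parent == concept)


print(ancestors("chamomile", broader))       # ['herbs for sleep', 'herbs']
print(narrower("herbs for sleep", broader))  # ['chamomile', 'valerian']
```

Storing only one direction and deriving the inverse keeps the two views from drifting apart as the vocabulary is revised.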
Take, for instance, a business implementing a recommendation engine, with or without AI. Initially, it may seem essential to construct a deeply expressive ontology. However, as explored in my essay, "From Metadata to Meaning: The Knowledge Connection", precise metadata is often the simplest and most direct way to impart the meaning and context of data. A SKOS thesaurus model efficiently meets this need by clearly defining concepts, categorizing entities, and articulating basic relations. A SKOS model can act as a sophisticated decision tree: because the tree is structured using an ontology, it is a machine-readable, interoperable model.
The same SKOS model used for a recommendation engine can, in turn, be used for concept discovery, with the help of a matching framework. This is where a vector database comes in handy. For new concepts entering a system, vector similarity can be calibrated to discard terms that match existing SKOS concepts, and to map only genuinely new terms to the corresponding SKOS categories. While this is a seemingly simple implementation, SKOS grounds vector databases with a curated and defined domain model, establishing “rules” for concept groupings and attribution. SKOS is a version of human-in-the-loop for vectors.
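A rough sketch of that triage follows, under heavy assumptions. A real implementation would use an embedding model and a vector database; here a bag of character trigrams stands in for the embedding so the example runs with the standard library alone. The `embed`, `cosine` and `triage` functions, the two thresholds and the sample labels are all hypothetical.

```python
import math
from collections import Counter

DUPLICATE = 0.85  # hypothetical threshold: treat as an existing concept
CANDIDATE = 0.35  # hypothetical threshold: close enough to suggest a category


def embed(text, n=3):
    """Stand-in embedding: counts of character n-grams (not a real model)."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def triage(term, labels):
    """labels maps every known label (preferred, alternative, hidden) to its concept."""
    vector = embed(term)
    best = max(labels, key=lambda label: cosine(vector, embed(label)))
    score = cosine(vector, embed(best))
    if score >= DUPLICATE:
        return "duplicate", labels[best]  # already covered: discard the term
    if score >= CANDIDATE:
        return "review", labels[best]     # new term: propose this category to a curator
    return "unmatched", None              # no nearby concept: new candidate concept


labels = {"kubernetes": "kubernetes", "k8s": "kubernetes",
          "javascript": "javascript", "js": "javascript"}

for term in ["Kubernetes", "kubernetes cluster", "zzz"]:
    print(term, triage(term, labels))
```

The curated labels act as the reference set, and the thresholds decide what is discarded automatically and what is routed to a person, which is where the human-in-the-loop character comes from.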
Foundations for Complexity: The Mighty Thesaurus
While SKOS supports controlled vocabularies and taxonomies, a SKOS thesaurus serves as an ideal foundational layer for more complex ontologies. A thesaurus organizes concepts, relationships, and metadata in an approachable and extendable format. It is often argued that a thesaurus comes before a taxonomy, and many of these arguments are valid. A thesaurus can serve as an index, much like the index of a book, providing references to where information about indexed concepts can be found.
We experience this indexing phenomenon when searching in Google: the search engine suggests “see also” or “did you mean” as it works to find the best results for a user query. When modeling in SKOS, we first declare the primary, hierarchical (parent-child) relations before extending them to include the transverse relations that break out of the confines of a taxonomy model.
The “related” element in SKOS can be experienced in a book index, where “see also” is enabled by the simple skos:related property. In this example, herbs for sleep are related to teas, working outside of and extending beyond the logical taxonomic parent-child relation. A thesaurus extends the semantic richness of a taxonomy, preparing the ontology for more complex ontologies and relationship types, when ready.
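The “see also” behaviour can be sketched in plain Python. The herbs-and-teas pair comes from the example above; the second link, the `add_related` helper and the `see_also` helper are hypothetical. The sketch relies on one fact from the SKOS specification: skos:related is symmetric, so asserting the link in one direction implies it in the other.

```python
from collections import defaultdict

# Associative links between concepts, stored in both directions.
related = defaultdict(set)


def add_related(a, b):
    """skos:related is symmetric, so record the link both ways."""
    related[a].add(b)
    related[b].add(a)


def see_also(concept):
    """Return the 'see also' entries for a concept, as a book index would."""
    return sorted(related.get(concept, ()))


add_related("herbs for sleep", "teas")
add_related("herbs for sleep", "sleep hygiene")  # hypothetical extra link

print(see_also("herbs for sleep"))  # ['sleep hygiene', 'teas']
print(see_also("teas"))             # ['herbs for sleep']
```

None of these links say anything about hierarchy: “teas” can sit in an entirely different branch of the taxonomy and still surface as a “see also” entry.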
But What About…
And what if the related “thing” is not in the same SKOS vocabulary? Well, there’s a SKOS solution for that scenario! The property skos:relatedMatch is one of SKOS’s mapping properties, intended to link two SKOS concepts that live in different concept schemes via an associative (non-hierarchical) relationship.
By definition, skos:relatedMatch is a sub-property of both skos:related and skos:mappingRelation, so any triple using skos:relatedMatch also asserts a general skos:related link plus that it’s a mapping relation between schemes. That’s right: you can build and maintain multiple domain SKOS thesauri and extend relations to support a more complex SKOS model.
In practice you use skos:relatedMatch when you want to make clear that two concepts (say, one in your in-house taxonomy and one in an external vocabulary) are related, without implying a broader/narrower hierarchy. If you just need an associative link within a single scheme, you’d use skos:related; if you need to show a cross-scheme mapping that happens to be associative, you use skos:relatedMatch.
Because SKOS places no hard integrity constraints on mapping properties, you can even use skos:relatedMatch between concepts in the same scheme (it will still imply skos:mappingRelation), but the convention is to reserve it for inter-scheme links for clarity. I advise that you determine rules for when mapping relations are utilized, for conformance and consistency.
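The sub-property entailment described in this section can be sketched with plain tuples standing in for RDF triples. The `inhouse:` and `ext:` prefixes, the concept names and the `entail` function are hypothetical; a real project would normally leave this inference to an RDF library or reasoner.

```python
# Sub-property declarations from the SKOS data model that are used here.
SUBPROPERTY_OF = {
    "skos:relatedMatch": {"skos:related", "skos:mappingRelation"},
}


def entail(triples):
    """Add the triples implied by the sub-property declarations above."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        for parent in SUBPROPERTY_OF.get(predicate, ()):
            inferred.add((subject, parent, obj))
    return inferred


# One asserted cross-scheme link (hypothetical concepts in two schemes).
asserted = {("inhouse:herbsForSleep", "skos:relatedMatch", "ext:herbalTea")}

for triple in sorted(entail(asserted)):
    print(triple)
```

A single asserted skos:relatedMatch triple yields three triples after inference: the original, a general skos:related link, and a skos:mappingRelation link.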
Conclusion
SKOS offers an elegant entry point into the world of semantic modeling, providing just the right balance of structure and simplicity to build a shared vocabulary that both humans and machines can understand. By iterating through candidate concepts, taxonomies, and thesauri, you cultivate a disciplined approach to disambiguation and definition that pays dividends as your domain knowledge evolves. Its lightweight logic empowers stakeholders—from subject matter experts to semantic engineers—to collaborate on meaningful hierarchies and thesauri, without the risks associated with getting lost in the complexity of lower ontologies. And when it’s time to graduate to richer ontologies, your SKOS foundation ensures that you’re building atop a rigorously curated, interoperable framework. Embrace SKOS early, experiment often, and let its “training wheels” guide you toward truly powerful, dynamic and scalable knowledge models.
The final installment in this series will dive into the nuts and bolts of SKOS and SKOS-XL. Stay tuned for the finale!
For now, SKOS in action:
NIST Material Data Vocabulary in SKOS