Before the Structure

The Principles of Organizing as First Principles for AI

Jun 01, 2026

This essay draws upon two influential academics and mentors, Robert Glushko and Elaine Svenonius.

Robert J. Glushko is an information scientist and professor at the UC Berkeley School of Information who earned his PhD in cognitive science under David Rumelhart, and who helped pioneer the use of XML for electronic business through Veo Systems before turning to the study of how resources are organized. He is the author of Document Engineering (2005) and the editor of The Discipline of Organizing (first published 2013), the synthesizing work that gives the field its cross-domain vocabulary for organizing systems.

Elaine Svenonius is an American librarian and library scholar, trained in philosophy and library science at the University of Pennsylvania and the University of Chicago, known for bringing a philosophical knowledge-organization approach to the theory of bibliographic control — cataloging, classification, and indexing. Her book The Intellectual Foundation of Information Organization (MIT Press, 2000) synthesizes subject and descriptive cataloging within a common conceptual framework and is so widely assigned in library and information science programs that students nicknamed it "the red devil."

Together, their work forms a synthesis from which the foundations of organization for digital systems emerge, foundations that still hold true today. Glushko and Svenonius have had a great impact on how I approach semantic systems and knowledge infrastructures, so in turn, I am sharing what I have learned from them with you.

Increasingly, every organization building toward AI wants a knowledge graph, a semantic layer, a governed catalog, something that lets machines reason over the company’s resources. Most of them start by buying that solution. The platform arrives, the connectors get wired, and then the question invariably emerges: organized according to what?

The discipline of organizing is centuries old, transcends physical and digital spaces, and is core to information and knowledge systems. Organizing is a set of decisions made before any structure exists, and the quality of those decisions determines whether a structure can work as designed. Those decisions are first principles, from which data and information can be prepared for structure and can then communicate meaning.

What Is an Organizing System

Robert Glushko’s The Discipline of Organizing gives the field its working definition. An organizing system is an intentionally arranged collection of resources and the interactions they support.1 Three things have to be true. There is a collection: resources selected for a purpose rather than accumulated by default. There is intentional arrangement: the arrangement was imposed by an agent following a principle. There are interactions: the system exists to let someone or something act on the resources.

The qualifier intentional draws the boundary that drives how decisions are made and, ultimately, how resources are identified and managed. The strata of the Grand Canyon contain information, and the three aligned stars of Orion’s belt form a pattern of undeniable regularity. Glushko excludes both from the category, because each arrangement was produced by geological force or by the accident of where Earth sits, rather than by an agent following an organizing principle.2 When a geologist decides what to measure across those strata, how to combine the measurements, and which theory to test, the result is an organizing system made from the same rock. The canyon itself, in its raw shape and form, remains unorganized; only the measurements taken across its strata are arranged for a purpose.

Colorful striped mountains under a clear blue sky. — Photo by Shai Pal on Unsplash

The AI economy does not seem to understand the principles of organizing, choosing instead to prioritize the automation of organizing as a means to further automation. The absence of organizing principles is cultural, a pre-existing condition established by technology and big data.

Through the 2000s, the more data a company held, the more successful it was assumed to be. That data sat locked inside relational systems as tables and rows whose dependencies made sense only inside that one ecosystem, if at all.3 The volume was treated as the asset. No one questioned the value of what was stored, on the assumption that it was a private matter unrelated to any principle of organizing. A data lake filled by indiscriminate ingestion is the Grand Canyon, not the geologist’s dataset. There is information in it and no organizing system over it, and a model trained on it inherits the absence of intent as an absence of meaning.

The Resource Has to Be Identifiable First

Before anything can be organized, it has to be identified. Elaine Svenonius notes this dependency: a resource that is hard to identify is hard to describe and therefore hard to organize.4 A resource carries identity, so it can be named and distinguished. It carries description, so information about it can be recorded and attached. It persists over time, so it can change versions, decay, or be superseded without losing its thread. Strip away any one of these and the resource is noise, perhaps with a label.
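To make the three properties concrete, here is a minimal Python sketch under stated assumptions. The class Resource, its fields, and the revise method are hypothetical names chosen for illustration, not a standard model. Identity is a stable identifier, description is a set of recorded attributes, and persistence is a history that grows with each revision instead of being overwritten.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical minimal resource with identity, description, and persistence."""

    identifier: str                # identity: names the resource and distinguishes it
    description: dict[str, str]    # description: recorded facts attached to it
    history: list[dict[str, str]] = field(default_factory=list)  # persistence over time

    def revise(self, **changes: str) -> None:
        """Record a new version while keeping every earlier one."""
        self.history.append(dict(self.description))  # archive a copy of the current version
        self.description = {**self.description, **changes}

    @property
    def version(self) -> int:
        """Current version number: one more than the number of archived versions."""
        return len(self.history) + 1


record = Resource("cust-0001", {"name": "Ada Lovelace", "status": "active"})
record.revise(status="inactive")
print(record.identifier, record.version)  # cust-0001 2
print(record.history[0]["status"])        # active
```

A table that simply replaced the status value would end up with the same current description but no thread back to what the resource was before.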

The same physical thing can be a different resource depending on intent. Glushko’s example is a carved chess piece in a museum, which can be identified as a separate item, as a member of a set, or as one of thirty-three unidentified components of an object catalogued as a chess set with its board.5

brown and white chess board game — Photo by Jeet Dhanoa on Unsplash

A merchant’s SKU might point to a unique item, to a class of items treated as equivalent for billing, or to something intangible like a warranty. Deciding what counts as the unit is the first organizing decision, and it is conceptual before it is technical. When two systems try to merge customer records and one stores a single NAME field while the other splits TITLE, FIRSTNAME, and LASTNAME into separate fields, the integration fails because the two systems answered “what is a resource” differently and the answers were never reconciled.6
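The NAME versus TITLE, FIRSTNAME, LASTNAME mismatch can be sketched in a few lines of Python. The helper names join_name and naive_split, the set of recognized titles, and the sample name are all hypothetical. Collapsing the split fields into one NAME is easy, but it discards the boundaries between the parts, so recovering them requires a guess, and the guess fails on a compound first name.

```python
TITLES = {"Dr.", "Mr.", "Mrs.", "Ms.", "Prof."}  # hypothetical list of recognized titles


def join_name(title: str, first: str, last: str) -> str:
    """Collapse split fields into a single NAME; the field boundaries are discarded."""
    return " ".join(part for part in (title, first, last) if part)


def naive_split(name: str) -> dict[str, str]:
    """Guess TITLE/FIRSTNAME/LASTNAME from whitespace; a heuristic, not a mapping."""
    parts = name.split()
    title = parts.pop(0) if parts and parts[0] in TITLES else ""
    first = parts[0] if parts else ""
    last = " ".join(parts[1:])
    return {"TITLE": title, "FIRSTNAME": first, "LASTNAME": last}


original = {"TITLE": "Dr.", "FIRSTNAME": "Mary Ann", "LASTNAME": "van der Berg"}
name = join_name(original["TITLE"], original["FIRSTNAME"], original["LASTNAME"])
print(name)                           # Dr. Mary Ann van der Berg
print(naive_split(name) == original)  # False
print(naive_split(name)["LASTNAME"])  # Ann van der Berg
```

No cleverer splitting rule removes the problem. Someone has to decide which representation is authoritative and how the other maps onto it, and that decision is conceptual before it is technical.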

This is the level at which most enterprise data has already failed, before LLMs are involved, before any AI touches it. A customer who exists as four records across four systems exists as four resources. A field whose values mean whatever the person typing them intended has no stable description. A table that overwrites yesterday with today has no persistence over time. The atom is broken, and nothing assembled from broken atoms holds together.

The Six Decisions That Come Before Structure

The work that has to happen before you build an organizing structure is a set of design decisions, which Glushko frames as six questions.7 These questions come before any technology. A platform purchased before they are answered simply hard-codes whatever answers its defaults encode.

What is being organized. The unit of organization, decided deliberately: items, classes, or components. Resources alone, or also the surrogates and identifiers that let one resource belong to many collections. Born-digital resources, legacy scans, or both at once.8

Why it is being organized. The purpose, which governs every other choice. Svenonius reduces the central purpose to bringing like things together and telling them apart, yet the priority among purposes still varies from system to system.9 A memory institution organizes to preserve. A business system organizes to transact. A current-awareness feed organizes to surface a few relevant items fast. A scholar’s retrieval system organizes to find every relevant document in a domain.10 The same resources demand different organization depending on which purpose dominates. A system that does not know its own purpose cannot be evaluated against one.

How much it is being organized. The degree of organization, which Glushko measures by how many organizing principles are at work.11 A closet arranged by season and body part is barely organized. An online music store arranged by genre, artist, album, release date, and popularity is heavily organized, because more principles operate over it. Svenonius notes that not all resources deserve the same degree of organization.12 An inherent tradeoff runs between effort spent organizing and effort spent retrieving; at scale the organizing effort earns its keep across every future interaction, and at small scale it may not pay back at all.13
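Glushko’s measure can be illustrated with a short Python sketch that treats each organizing principle as one sort key. The records, field names, and the choice to realize the principles as a prioritized sort are assumptions for illustration; a real store might instead expose the same principles as browsable facets.

```python
from operator import itemgetter

# Hypothetical catalog records; field names and popularity values are illustrative only.
tracks = [
    {"genre": "Jazz", "artist": "Nina Simone", "album": "Pastel Blues", "release_year": 1965, "popularity": 87},
    {"genre": "Jazz", "artist": "Miles Davis", "album": "Kind of Blue", "release_year": 1959, "popularity": 99},
    {"genre": "Folk", "artist": "Joni Mitchell", "album": "Blue", "release_year": 1971, "popularity": 64},
]

# Each organizing principle is one dimension along which a collection is arranged.
CLOSET_PRINCIPLES = ["season", "body_part"]
STORE_PRINCIPLES = ["genre", "artist", "album", "release_year", "popularity"]


def degree(principles: list[str]) -> int:
    """Degree of organization, read as the number of organizing principles at work."""
    return len(principles)


def arrange(items: list[dict], principles: list[str]) -> list[dict]:
    """Apply principles in priority order: the first dominates, later ones break ties."""
    return sorted(items, key=itemgetter(*principles))


print(degree(CLOSET_PRINCIPLES), degree(STORE_PRINCIPLES))  # 2 5
print([t["artist"] for t in arrange(tracks, STORE_PRINCIPLES)])
# ['Joni Mitchell', 'Miles Davis', 'Nina Simone']
```

Each added principle is another decision someone had to make and maintain, which is where the tradeoff between organizing effort and retrieval effort comes from.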

When it is being organized. On the way in, or later on demand. Organization imposed at creation, or arrangement deferred until an interaction requires it. The choice shapes what is possible downstream, because description not captured at the moment of creation is often unrecoverable.14

How, and by whom, it is being organized. By professional catalogers using a controlled vocabulary to enforce what descriptions mean, by algorithms weighting terms by frequency and distribution, or by the crowd leaving traces.15 Automated description is more consistent than human description and can seem more authoritative than it is; a detailed machine-generated description is not automatically a more useful one.16

Who does the work also decides who bears its cost and who reaps its benefit, which is why stakeholders fight over the degree of organization. Physicians want broad classifications and narrative notes; insurers and researchers want fine-grained form-filling that makes the physician’s work heavier.17 The disagreement is about labor and value.

Where it is being organized. The physical or logical environment and its constraints. A single physical resource sits in one place at a time; a digital surrogate can occupy many places at once and support searching and sorting at a scale tangible things cannot reach.18

An organizing system is defined by the composite of these answers. The answers are interrelated, so they resolve together rather than in sequence, and the system that emerges is the integrated shape of the choices.19 Skip them and you have let the tool make the decisions, inside a black box.

The Principles Live Above the Implementation

An organizing principle is a directive for arranging a collection, expressed without assuming any particular implementation. Alphabetical order is a principle. Chronological order is a principle. Grouping resources used together, keeping frequently used resources close, and protecting rare ones are principles thousands of years old, and they govern a kitchen and a knowledge graph by the same logic.20

Shelves of various spice jars are displayed. — Photo by Zoshua Colah on Unsplash

The principle is logically separate from how it is realized, and Glushko maps this onto the three-tier architecture every software designer knows: storage, logic, presentation.21 In the discipline of organizing, the middle tier is the intentional arrangement, the storage tier holds the resources, and the presentation tier supports the interactions. A subject-organized digital library keeps its organizing principle when its files move from a local disk to a network store, because the principle lives in the middle tier and the storage is below it.22

Teams that miss this distinction pour their effort into the presentation tier. Glushko’s diagnosis is that people color-code the folders before deciding what the folders are for.23 The data-work equivalent is a constant: teams stand up the catalog, the governance suite, or the graph database before anyone has articulated the organizing principle the platform is meant to express. The platform supplies only the storage tier, and the principle that should govern it was never defined.

Color coding and labeling make a bad logical organization easier to see, not more correct. A graph built this way — over unarticulated semantics and an undefined principle — is an implementation tier with nothing above it to implement.
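The separation of principle from storage can be sketched in Python, with each tier as its own piece. The store classes, the by_subject principle, and the sample records are hypothetical; the two stores simply return records in different physical orders to stand in for a local disk and a network store. The organizing principle never asks where the records live, so moving them does not change the arrangement.

```python
from typing import Iterable, Protocol


class Storage(Protocol):
    """Storage tier: holds resources and knows nothing about their arrangement."""

    def all(self) -> Iterable[dict]: ...


class LocalDisk:
    """Hypothetical local store; keeps records in the order they were added."""

    def __init__(self, records: list[dict]) -> None:
        self._records = list(records)

    def all(self) -> Iterable[dict]:
        return iter(self._records)


class NetworkStore:
    """Hypothetical remote store; returns records in a different physical order."""

    def __init__(self, records: list[dict]) -> None:
        self._records = list(reversed(records))

    def all(self) -> Iterable[dict]:
        return iter(self._records)


def by_subject(store: Storage) -> dict[str, list[str]]:
    """Logic tier: group by subject, then order titles; storage location is irrelevant."""
    groups: dict[str, list[str]] = {}
    for record in sorted(store.all(), key=lambda r: (r["subject"], r["title"])):
        groups.setdefault(record["subject"], []).append(record["title"])
    return groups


def render(groups: dict[str, list[str]]) -> str:
    """Presentation tier: display whatever arrangement the logic tier produced."""
    return "\n".join(f"{subject}: {', '.join(titles)}" for subject, titles in groups.items())


docs = [
    {"title": "Flora of the Sonoran Desert", "subject": "Botany"},
    {"title": "Canyon Stratigraphy", "subject": "Geology"},
    {"title": "Alpine Mosses", "subject": "Botany"},
]

# Moving the files changes the storage tier but not the arrangement.
assert by_subject(LocalDisk(docs)) == by_subject(NetworkStore(docs))
print(render(by_subject(NetworkStore(docs))))
```

Restyling render would change only what users see; without by_subject there would be nothing for either store or renderer to express.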

A row of different colored laptops sitting next to each other — Photo by Aleksi Partanen on Unsplash

The Layers Are Principles

A knowledge graph cannot be the starting point, because it sits at the top of a stack of organizing systems, each one a principle rendered more explicit and more machine-actionable than the last. Library and information science has been developing principles and solutions for organizing for centuries, well before AI needed to reach for information organization, semantics and knowledge infrastructures.

Gail Hodge’s CLIR survey of knowledge organization systems lays out the stack in order of increasing structural complexity.24 Term lists come first: authority files that control the variant names of an entity, glossaries, dictionaries, and gazetteers. Authority files enforce that a thing has one preferred name and that its variants resolve to it.25 Classifications and categories come next, consisting of subject headings, taxonomies, and classification schemes that sort resources into groups.26 The next level of maturity is relationship lists, where thesauri encode broader, narrower, and related terms. Finally, there are semantic networks, which link concepts as a web, and ontologies, which model complex relationships and carry the rules and axioms the looser schemes leave out.
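A term list and a relationship list can be sketched as plain Python dictionaries. The vocabulary is invented for illustration, and the sketch simplifies by giving each term at most one broader term, whereas real thesauri often allow several. Variants resolve to a preferred term, narrower relations are derived as the inverse of broader ones, and related links are treated as symmetric.

```python
# Term list: a hypothetical authority file mapping variant labels to one preferred term.
PREFERRED = {
    "automobiles": "Automobiles",
    "cars": "Automobiles",
    "autos": "Automobiles",
    "motorcars": "Automobiles",
    "trucks": "Trucks",
    "lorries": "Trucks",
    "electric cars": "Electric cars",
    "motor vehicles": "Motor vehicles",
    "roads": "Roads",
}

# Relationship list: thesaurus-style links between preferred terms only.
BROADER = {  # term -> its single broader term (a simplification)
    "Automobiles": "Motor vehicles",
    "Trucks": "Motor vehicles",
    "Electric cars": "Automobiles",
}
RELATED = {("Automobiles", "Roads")}  # unordered pairs of related terms


def resolve(label: str) -> str:
    """Return the preferred term for any known variant (case-insensitive)."""
    return PREFERRED[label.strip().lower()]


def narrower(term: str) -> list[str]:
    """Narrower terms are derived as the inverse of the broader relation."""
    return sorted(child for child, parent in BROADER.items() if parent == term)


def broader_chain(term: str) -> list[str]:
    """Follow broader links upward until a top term is reached."""
    chain: list[str] = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def related(term: str) -> list[str]:
    """Related links are symmetric, so match the term in either position."""
    return sorted({b for a, b in RELATED if a == term} | {a for a, b in RELATED if b == term})


print(resolve("Autos"))                # Automobiles
print(narrower("Motor vehicles"))      # ['Automobiles', 'Trucks']
print(broader_chain("Electric cars"))  # ['Automobiles', 'Motor vehicles']
print(related("Roads"))                # ['Automobiles']
```

The relationship layer only works because every link points at a preferred term; remove the authority file and "cars" and "Automobiles" become two unrelated nodes.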

graphical user interface, application, shape, arrow — Photo by Growtika on Unsplash

Each of these structures imposes a particular view of the world on the collection, so the same entity can be characterized differently depending on which system is used. No single universal scheme exists, and none is likely to, because what is meaningful to one culture or domain is not necessarily meaningful to another.27

This is reflected in the Ontology Pipeline™: controlled vocabulary, metadata schemas, taxonomy, thesaurus, ontology, and finally, knowledge graph. Each layer depends on the one beneath it. A taxonomy with no controlled vocabulary underneath is sorting terms whose meanings are not fixed. An ontology with no taxonomy beneath it is asserting relationships among categories that were never cleanly drawn. A knowledge graph is the implementation of a semantics that has to exist before the graph can express it. Working iteratively, we define and describe the entities and resources we intend to organize before we organize them. This is foundational to any organizing system.
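The dependency between layers can be expressed as a simple check. This is not the Ontology Pipeline™ or any part of its tooling; it is a hypothetical Python sketch that skips the metadata-schema and thesaurus layers and assumes a single rule: each layer may only use terms that the layer beneath it defines. The terms that break the rule are the unfixed meanings and undrawn categories described above.

```python
# Hypothetical layer contents, invented for illustration.
controlled_vocabulary = {"Motor vehicles", "Automobiles", "Trucks", "Infrastructure", "Roads"}

taxonomy = {  # child category -> parent category
    "Automobiles": "Motor vehicles",
    "Trucks": "Motor vehicles",
    "Roads": "Infrastructure",
    "Hovercraft": "Motor vehicles",  # never entered in the controlled vocabulary
}

ontology = [  # (subject class, relation, object class)
    ("Motor vehicles", "travelsOn", "Roads"),
    ("Trucks", "carries", "Freight"),  # "Freight" was never categorized
]


def terms_in(tax: dict[str, str]) -> set[str]:
    """Every term that appears in the taxonomy, as child or as parent."""
    return {term for edge in tax.items() for term in edge}


def unfixed_taxonomy_terms(tax: dict[str, str], vocab: set[str]) -> list[str]:
    """Taxonomy terms whose meaning the controlled vocabulary does not fix."""
    return sorted(terms_in(tax) - vocab)


def undrawn_ontology_classes(
    onto: list[tuple[str, str, str]], tax: dict[str, str], vocab: set[str]
) -> list[str]:
    """Ontology classes that are not cleanly drawn categories in the taxonomy."""
    categories = terms_in(tax) & vocab
    used = {cls for subject, _, obj in onto for cls in (subject, obj)}
    return sorted(used - categories)


print(unfixed_taxonomy_terms(taxonomy, controlled_vocabulary))  # ['Hovercraft']
print(undrawn_ontology_classes(ontology, taxonomy, controlled_vocabulary))  # ['Freight']
```

A graph loaded from this ontology would happily store the Freight triple; the check only surfaces the problem because the lower layers were written down first.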

Most Knowledge Is Not Written Down

A further layer of prior work sits beneath the structure, because the raw material does not exist in usable form until someone makes it exist. The knowledge an organization most depends on is tacit, held in the experience and judgment of its people rather than codified in any document or database.28

brown wooden blocks on white surface — Photo by Brett Jordan on Unsplash

Knowledge management research has shown for decades that the existence of knowledge, even shared knowledge, does not improve performance unless it is captured and applied, and that organizations consistently rate this work as important while implementing it poorly and inconsistently. A recent study of an international IT firm catalogued dozens of distinct obstacles to knowledge management and found the deepest one to be the gap between how much the work is valued and how little it is actually done.

Codifying tacit knowledge into explicit, retrievable forms is itself an organizing act, and it has to happen before any structure can be applied. When that work is skipped, the structure stores only what was already written down, the shallow remainder, while the deep knowledge walks out the door with the person who held it in context. A graph cannot recover knowledge that was never externalized or recorded. Only once knowledge has been codified can it be described, defined, and encoded according to principles for organizing, after which structure can be applied.

Building With Intention

The order of operations is the whole argument. The resource has to be identifiable before it can be described and has to be described before it can be classified. It has to be classified before relationships amongst classes can be asserted. The relationships have to be asserted before a graph can express them. Underneath all of it, the six design decisions have to be answered deliberately, because they define what the organizing system is.

The AI economy inverted this order. It bought the implementation tier and waited for semantics to rise out of sheer volume, trusting that the machines would figure out meaning. It treated accumulation as collection, syntax as structure, and a purchased platform as an authored principle. When the models then fail to find structure on their own, the failure is often attributed to the LLMs. In reality, it is a failure of organizing: the absence of the organizing principles on which semantic systems and knowledge infrastructures are founded.

Organizing is the discipline that turns terrain into a collection and a collection into a system something else can build upon. The structure everyone wants to build, or dreams of manifesting, is buildable, and the discipline for building it already exists. But first comes the foundation: the principles for organizing.


Footnotes

Robert J. Glushko, ed., The Discipline of Organizing: 4th Professional Edition (Berkeley, CA: Pressbooks, 2022), ch. 2, “The ‘Organizing System’ Concept.” Licensed CC BY-NC 4.0.

Glushko, Discipline of Organizing, ch. 5, “The Concept of ‘Intentional Arrangement.’” The Grand Canyon strata and Orion’s belt are Glushko’s examples of patterns excluded from the category because they arise from deterministic natural forces or a perceptual vantage point rather than from an agent following an organizing principle.

Jessica Talisman, “The Data Hyperbole and AI” (companion essay). The characterization of 2000s big-data practice — volume treated as the asset, value of stored data left unquestioned, dependencies legible only inside a single relational ecosystem — is developed there.

Elaine Svenonius, The Intellectual Foundation of Information Organization (Cambridge, MA: MIT Press, 2000), 13, as cited in Glushko, Discipline of Organizing, ch. 12.

Glushko, Discipline of Organizing, ch. 12, “What Is Being Organized?” The chess-piece and SKU examples are Glushko’s illustrations of how the unit of organization depends on intent rather than on the physical thing.

Glushko, Discipline of Organizing, ch. 14, “How Much Is It Being Organized?” The NAME versus TITLE/FIRSTNAME/LASTNAME integration failure is Glushko’s example of granularity mismatch between systems.

Glushko, Discipline of Organizing, Part II, “Design Decisions in Organizing Systems,” chs. 12–17.

Glushko, Discipline of Organizing, ch. 12. On surrogates, identifiers, and a resource belonging to more than one collection, see also ch. 4, “The Concept of ‘Collection.’”

Svenonius, Intellectual Foundation, xi, as cited in Glushko, Discipline of Organizing, ch. 13, “Why Is It Being Organized?”

Glushko, Discipline of Organizing, ch. 13, contrasting memory institutions, business systems, current-awareness feeds, and comprehensive scholarly retrieval as different dominant purposes.

Glushko, Discipline of Organizing, ch. 14. The reframing of “how much” as “how many organizing principles are at work,” along with the closet and music-store examples, is Glushko’s.

Svenonius, Intellectual Foundation, 24, as cited in Glushko, Discipline of Organizing, ch. 14.

Glushko, Discipline of Organizing, ch. 12, on the organizing-versus-retrieval tradeoff and how it changes with scale and with whether organization and retrieval are performed by the same people.

Glushko, Discipline of Organizing, ch. 15, “When Is It Being Organized?”

Glushko, Discipline of Organizing, ch. 16, “How (or by Whom) Is It Organized?”; on controlled vocabularies enforcing consistent meaning, ch. 14.

Glushko, Discipline of Organizing, ch. 14, on the potential downside of automated resource description appearing more authoritative than a simpler human description.

Glushko, Discipline of Organizing, ch. 14; the physician-versus-insurer example draws on Jonathan Grudin’s work on non-technological barriers to collaboration technology (Grudin 1994), as cited there.

Glushko, Discipline of Organizing, ch. 12, on the difference between physical resources and digital surrogates; ch. 17, “Where Is It Being Organized?”

Glushko, Discipline of Organizing, ch. 5, on the composite of design decisions defining an organizing system, and ch. 18, “Key Points.”

Glushko, Discipline of Organizing, ch. 6, “The Concept of ‘Organizing Principle.’” The design heuristics — group resources used together, keep frequently used resources accessible, protect rare ones — and the long history of such principles are Glushko’s.

Glushko, Discipline of Organizing, ch. 6, sidebar “The Three Tiers of Organizing Systems,” mapping storage, business logic, and presentation onto resources, intentional arrangement, and interactions.

Glushko, Discipline of Organizing, ch. 6, on the logical separation of organizing principle from implementation, illustrated by digital-library storage.

Glushko, Discipline of Organizing, ch. 6. The observation that people waste effort color-coding folders and labeling containers before designing the logical organization is Glushko’s; the application to data-platform procurement is mine.

Gail M. Hodge, Systems of Knowledge Organization for Digital Libraries: Beyond Traditional Authority Files (Washington, DC: Council on Library and Information Resources, 2000), ch. 1, “Knowledge Organization Systems: An Overview.” The grouping of KOS types into term lists, classifications and categories, and relationship lists is Hodge’s.

Hodge, Systems of Knowledge Organization, ch. 1, “Term Lists” (authority files, glossaries, dictionaries, gazetteers).

Hodge, Systems of Knowledge Organization, ch. 1, “Classifications and Categories” (subject headings; classification schemes, taxonomies, and categorization schemes).

Hodge, Systems of Knowledge Organization, ch. 1, “Common Characteristics of Knowledge Organization Systems,” and on cultural constraint and the unlikelihood of a single universal scheme, citing Michael Lesk (1997).

On the tacit/explicit distinction in organizational knowledge, see Kalle Koivisto and Toni Taipalus, “Pitfalls in Effective Knowledge Management: Insights from an International Information Technology Organization,” International Journal of Knowledge Management Studies 16, no. 1 (2025), DOI: 10.1504/IJKMS.2025.146083, §2.1–2.2, and the codification literature reviewed there.

About me. I’m a Semantic Engineer, Information Architect, and knowledge infrastructure strategist dedicated to building information systems. With more than 25 years of experience in enterprise architecture, e-commerce content systems, digital libraries, and knowledge management, I specialize in transforming fragmented information into coherent, machine-readable knowledge systems.

I am the founder of the Ontology Pipeline™, a structured framework for building semantic knowledge infrastructures from first principles. The Ontology Pipeline™ emphasizes progressive context-building: moving from controlled vocabularies to taxonomies, thesauri, ontologies, and ultimately fully realized knowledge graphs.

Professionally, I have led semantic architecture initiatives at organizations including Adobe, where I architected an RDF-based knowledge graph to support Adobe’s Digital Experience ecosystem, and Amazon, where I worked in information architecture and taxonomy. I am also the founder of Contextually LLC, providing consulting and coaching services in ontology modelling, NLP integration, knowledge graphs and knowledge infrastructure design.

I am also a curriculum designer, teacher, and founder of The Knowledge Graph Academy, a cohort-based educational program designed to train and upskill future semantic engineers and ontologists. The Academy strikes the perfect balance between ontology and knowledge graph theory and practice, preparing graduates to work confidently as ontologists and semantic engineers.

An educator and thought leader, I publish regularly on my Substack newsletter, Intentional Arrangement, where my writing frequently explores the relationship between semantic systems and AI.

Connect with me on LinkedIn!

Marc-Henri Hurt


Hello Jessica,

I approve of the approach you describe with the necessary rigor and I support it all the more in the context of the development of AI that you rightly criticize.

But since the concept of « organization » is at the center of it, I cannot fail to think of my last post on LinkedIn in tribute to Edgar Morin, who, it seems to me, has also placed « organization » at the heart of his book « The Nature of Nature », of which I had tried to articulate some concepts. I cannot decently engage in a discussion or an explanation of a book that I read too long ago, so my remarks are approximate.

But it seems to me that his concept of « organization » also covered, and perhaps especially « The raw shape and form of the canyon », elements without conscious « intent », but possibly endowed with a purpose.

What seems especially interesting to me is that both the works you cite and those of Edgar Morin, and more broadly the systemic, have greatly inspired information systems designers, therefore with perhaps divergent principles, until giving birth in France to a design method called "Merise", before the triumph of UML, but which was perhaps not systemic enough.

In case I would be very inspired by systemic principles, it would be a kind of reversal from the perception I had when I listened to US consultants at the beginning of my career, who seemed to me more creative and innovative than the French, but also gave less importance to what made the French proud: « know how to make a plan », which you would then sustain better today :).

Jacek Tomaszczyk

This strongly resonates with my own work on lightweight semantic layers for AI in document-intensive organizations.

Many organizations probably do not need a full ontology or knowledge graph at the beginning. What they often lack is a prior interpretive layer: shared definitions, stabilized terminology, explicit distinctions between local and general meanings, and a few operational relations that make documents interpretable in context.

Before knowledge becomes machine-actionable, it has to become organizationally articulate. Only then can it become graphable.
