Alago R&D 024 22 APR 2026 DE
◆ FEATURED Ontology № 024 · v0.4

A working vocabulary for the Pre-Construction office.

Eighteen months of mapping the implicit ontology of the Projektsteuerer's protocols. Here is v0.4 — and the dozen edge cases that broke v0.3.

Author: Vinzenz Trimborn
Published: 22 Apr 2026
Read: 12 min
Topic: Ontology · KG

There is a habit, when you have read enough Jour-fixe-Protokolle, of reaching for a pencil. Not to write — to underline. The same words keep returning: Prüfgegenstand, Frist, Aufgabe, Gewerk, Projektsteuerer. They look like ordinary German nouns, but they are not. In a Pre-Construction office they are roles in a small, dense grammar — closer to the protocol of an old guild than to anything in a dictionary. The job of an ontology is to take that grammar seriously: to give the words formal types, fix the relations between them, and refuse the polite fiction that a paragraph of prose can stand in for a model.1

This note is the fourth iteration of that work. v0.1 was an afternoon's whiteboard sketch in late 2024. v0.3 — released internally last December — survived eight months of contact with real reports before three particular edge cases told us, plainly, that we had drawn the wrong boundary. v0.4 redraws it.

What we are modelling

The corpus is, on the surface, deceptively boring: the day-to-day work product of a Projektsteuerer's office — Jour-fixe-Protokolle, Vergabevermerke, Terminpläne, Pflichtenhefte, Bemusterungsprotokolle, and the Bauprotokolle that come back from site. About forty thousand documents, each between two and twenty pages, mostly written in a flattened bureaucratic German that has more in common with legal pleadings than with architectural prose.

What makes them interesting is not their language but their structure of obligation. Almost every paragraph either creates, transfers, or discharges a duty. An item under review (Prüfgegenstand) appears, gets assigned to a trade (Gewerk), gains a deadline (Frist), and is later either ticked off or escalated. Tracking the flow of those obligations across documents — from a Pre-Con Jour-fixe through to the Bauprotokoll and back — is the entire game.

[Figure: schema v0.4, 7 nodes, 7 edges. Nodes: Project, Document, Mangel, Aufgabe, Frist, Gewerk, Agent. Edge labels: contains, recorded_in, raised_by, spawns, due, assigned_to, acts_on.]
FIG. 01 The seven-node core of v0.4. Edges are typed and (with one exception, Mangel → Document) directed. Subtypes of Aufgabe (Behebung, Prüfung, Abnahme) collapse into a single node here for readability.

What broke in v0.3

v0.3 was a fifteen-class taxonomy with a clean inheritance tree. It was our pride for two months and our enemy for the next six. Three things broke it.

1. The fiction of a single deadline. A Frist in the wild is rarely a date. It is a small object with three plausible interpretations: the calendar deadline, the contractually binding deadline, and the deadline the Projektsteuerer actually means when she writes "bis nächste Woche" in the Jour-fixe minutes. v0.3 collapsed all three into a single datetime field. v0.4 promotes Frist to a first-class entity with its own provenance.2

2. Composite trades. A Gewerk is not a leaf. Almost one defect in four involves two trades blaming each other across an interface; the classic example is TGA-Planer and Rohbau arguing about a duct penetration. v0.3 forced one assignment per defect. v0.4 lets a Mangel attach to an arbitrary set of Gewerks, with a typed responsibility edge that distinguishes verursacht, beteiligt, and betroffen.

3. Documents are not containers. v0.3 modelled a Jour-fixe-Protokoll as a bag of paragraphs with a date stamp. Useful, until you discover that ten percent of all open items in the corpus are first mentioned in an email or a Vergabevermerk and only later cross-referenced into a protocol. The document graph is a network, not a tree, and the entities float above it.
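
The first two fixes can be sketched as plain data structures. This is a minimal illustration, not the production model: the field names (original_text, calendar_date, contractual_date, intended_date, stated_by, source_document) and the role assignment in the usage lines are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class Responsibility(Enum):
    """The three typed responsibility edges named in the text."""
    VERURSACHT = "verursacht"
    BETEILIGT = "beteiligt"
    BETROFFEN = "betroffen"


@dataclass(frozen=True)
class Frist:
    """A deadline as an entity rather than a single datetime (field names are hypothetical)."""
    original_text: str                        # wording as written in the protocol
    calendar_date: Optional[date] = None      # calendar interpretation, if resolvable
    contractual_date: Optional[date] = None   # contractually binding date, if any
    intended_date: Optional[date] = None      # what the author actually meant, if known
    stated_by: Optional[str] = None           # provenance: who wrote it
    source_document: Optional[str] = None     # provenance: where it was written


@dataclass
class Mangel:
    """A defect attached to a set of (Gewerk, Responsibility) pairs instead of one assignee."""
    label: str
    responsibilities: set = field(default_factory=set)

    def attach(self, gewerk: str, role: Responsibility) -> None:
        self.responsibilities.add((gewerk, role))

    def gewerke(self, role: Responsibility) -> set:
        """All trades attached with the given responsibility type."""
        return {g for g, r in self.responsibilities if r is role}


# Usage: the role assignment below is illustrative, not taken from the corpus.
duct = Mangel("Duct penetration at trade interface")
duct.attach("TGA-Planer", Responsibility.VERURSACHT)
duct.attach("Rohbau", Responsibility.BETEILIGT)
frist = Frist(original_text="bis nächste Woche", stated_by="Projektsteuerer")
```

Keeping the original wording on the Frist, even when no date can be resolved, is what lets a vague deadline stay queryable instead of vanishing into a string field.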

"Most ontology errors are not errors of taxonomy. They are errors of cardinality — of pretending one of something exists where in fact there are several, in flux, only loosely agreed." — Working note, v0.3 retrospective, Feb 2026

The v0.4 contract

The schema is intentionally small. Seven core types, fourteen edge labels, and a handful of literal-typed properties. The decision rule we landed on, after several false starts, is this: if a thing is ever the subject or object of a sentence in a Projektsteuerer's protocol, it gets a node; if it is only ever a modifier, it stays a property. That rule sounds obvious in writing. It is not — it took six months and three ontologists arguing in a Munich basement to find it.
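
The decision rule is simple enough to state as a function. The sketch below assumes that some upstream parser or annotator has already labelled each mention of a term with a grammatical role; the role names and the sample observations are hypothetical.

```python
from typing import Iterable

# Roles that earn a term its own node under the rule above.
NODE_ROLES = {"subject", "object"}


def node_or_property(observed_roles: Iterable[str]) -> str:
    """Return "node" if the term is ever a subject or object, else "property"."""
    return "node" if NODE_ROLES & set(observed_roles) else "property"


# Hypothetical observations; real ones would come from annotated protocols.
observations = {
    "Frist": ["modifier", "object"],   # sometimes the object of a sentence
    "dringend": ["modifier"],          # only ever modifies something else
}
decisions = {term: node_or_property(roles) for term, roles in observations.items()}
```

Note the word "ever" in the rule: a single subject or object occurrence is enough to promote a term, which is why the function checks for any overlap rather than a majority.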

What follows is the canonical rendering of the schema. We use a stripped-down Turtle-like notation; the production schema is in OWL with a SHACL validation layer.

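# Namespace prefixes; the ":" namespace below is a placeholder, not the production IRI.
@prefix :     <http://example.org/precon#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

:Project        a owl:Class ;
                rdfs:label "Bauprojekt" .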
:Document       a owl:Class ;
                rdfs:subClassOf :Artefact .
:Mangel         a owl:Class ;
                rdfs:label "Defect / Beanstandung" .
:Aufgabe        a owl:Class ;
                rdfs:label "Task / Pflicht" ;
                owl:disjointWith :Mangel .
:Frist          a owl:Class ;
                rdfs:label "Deadline (typed)" .
:Gewerk         a owl:Class ;
                rdfs:label "Trade / Discipline" .
:Agent          a owl:Class ;
                rdfs:subClassOf foaf:Agent .

:contains       a owl:ObjectProperty ;
                rdfs:domain :Project ; rdfs:range :Document .
:recorded_in    a owl:ObjectProperty ;
                rdfs:domain :Mangel  ; rdfs:range :Document .
:assigned_to    a owl:ObjectProperty ;
                rdfs:domain :Aufgabe ; rdfs:range :Gewerk .
:due            a owl:ObjectProperty ;
                rdfs:domain :Aufgabe ; rdfs:range :Frist .

Two things to note. First, :Mangel and :Aufgabe are disjoint. A defect is not a task; a task is the duty to remedy one. Conflating the two — as most off-the-shelf "construction" ontologies do — is the single most expensive modelling error we have made.3 Second, :Frist is a class, not a literal. It carries provenance: who wrote the date, when, and against which calendar.
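
To make the disjointness constraint concrete, here is a plain-Python stand-in for the check a reasoner or SHACL layer would perform. It is a sketch under simple assumptions: triples are tuples of prefixed-name strings, and the individuals (:m1, :a1, :x9) are invented for the example.

```python
RDF_TYPE = "rdf:type"

# Pairs of classes declared disjoint in the schema above.
DISJOINT = [(":Mangel", ":Aufgabe")]


def disjointness_violations(triples):
    """Return (individual, class_a, class_b) for every individual typed with both classes of a disjoint pair."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return sorted(
        (ind, a, b)
        for ind, classes in types.items()
        for a, b in DISJOINT
        if a in classes and b in classes
    )


triples = [
    (":m1", RDF_TYPE, ":Mangel"),
    (":a1", RDF_TYPE, ":Aufgabe"),
    (":m1", ":spawns", ":a1"),         # a defect spawns the duty to remedy it
    (":x9", RDF_TYPE, ":Mangel"),
    (":x9", RDF_TYPE, ":Aufgabe"),     # the conflation the schema forbids
]
violations = disjointness_violations(triples)
# violations == [(":x9", ":Mangel", ":Aufgabe")]
```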

A small, real example

Consider one paragraph from a real Jour-fixe-Protokoll of a Projektsteuerer, lightly edited:

"BV Riemerschmidt, JF-14, 14.04.2026 — TOP 4.2: Im 3. OG ist die Brandschottung im Bereich der Lüftungstrasse Achse C/4 in der Ausführungsplanung nicht eindeutig festgelegt. Klärung durch TGA-Planer in Abstimmung mit Rohbau, Vorlage bis KW 18."

v0.3 would render this as a single Mangel with a string-typed deadline = "bis KW 18" and an assignee = "TGA-Planer". v0.4 renders it as a small subgraph: one Mangel node for the Prüfgegenstand ("Brandschottung Achse C/4"), recorded in the Jour-fixe-Protokoll JF-14 of 14 April 2026, attached to two Gewerks with different responsibility edges (verursacht: TGA-Planer, beteiligt: Rohbau), spawning a single Aufgabe ("Klärung Brandschottung, Vorlage Detail"), with a Frist entity carrying both the calendar interpretation (KW 18 of 2026, ending 03.05.2026) and the textual original ("bis KW 18").
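
The subgraph can be written out as triples. The sketch below makes several assumptions: the item under review is typed as a Mangel, KW follows ISO 8601 week numbering, and the identifiers (:jf14, :m1, :a1, :f1) and the literal properties :original_text and :calendar_end are invented for the example.

```python
from datetime import date


def iso_week_bounds(year: int, week: int):
    """Monday and Sunday of an ISO 8601 calendar week (Kalenderwoche)."""
    return date.fromisocalendar(year, week, 1), date.fromisocalendar(year, week, 7)


kw_start, kw_end = iso_week_bounds(2026, 18)  # 2026-04-27 and 2026-05-03

triples = [
    (":jf14", "rdf:type", ":Document"),
    (":m1", "rdf:type", ":Mangel"),
    (":m1", "rdfs:label", "Brandschottung Achse C/4"),
    (":m1", ":recorded_in", ":jf14"),
    (":m1", ":verursacht", ":TGA-Planer"),   # typed responsibility edge
    (":m1", ":beteiligt", ":Rohbau"),        # typed responsibility edge
    (":m1", ":spawns", ":a1"),
    (":a1", "rdf:type", ":Aufgabe"),
    (":a1", "rdfs:label", "Klärung Brandschottung, Vorlage Detail"),
    (":a1", ":due", ":f1"),
    (":f1", "rdf:type", ":Frist"),
    (":f1", ":original_text", "bis KW 18"),            # textual original
    (":f1", ":calendar_end", kw_end.isoformat()),      # calendar interpretation
]


def due_in_week(triples, year: int, week: int):
    """Tasks whose Frist has a calendar end date inside the given ISO week."""
    start, end = iso_week_bounds(year, week)
    fristen = {
        s for s, p, o in triples
        if p == ":calendar_end" and start <= date.fromisoformat(o) <= end
    }
    return sorted(s for s, p, o in triples if p == ":due" and o in fristen)


tasks = due_in_week(triples, 2026, 18)  # [":a1"]
```

Because the Frist carries a resolved date alongside the original wording, a week-based query matches on the date and no longer depends on how the deadline happened to be phrased.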

This is more nodes than v0.3. It is also, by every retrieval metric we care about, dramatically more useful.4

What we did not solve

Three known unknowns, in order of how much they keep us up at night:

v0.5 is already on the whiteboard. It will be smaller, again, in the parts that matter, and larger in the parts we keep hoping not to need.

— V. T., Munich, 22 April 2026.

Notes

  1. The classical reference here is Gruber's "A translation approach to portable ontology specifications" (1993), but the closer and less-cited model is Smith & Welty's work on practical ontology engineering, particularly the warning that taxonomies trained on documents tend to recapitulate the document's politeness rather than its content.
  2. This is the bug that prompted the rewrite. Given a corpus query "all defects with deadlines in KW 18", v0.3 returned five hits; v0.4 returned thirty-eight. The other thirty-three were not missing — they were misfiled under string fragments like "Mitte Mai" and "vor Pfingsten".
  3. We inherited the conflation from one of the larger BIM ontologies (which we will not name) and spent four months unwinding it.
  4. Multi-hop retrieval F1 climbs from 0.41 to 0.69 on our internal benchmark. We will publish the benchmark and the eval harness in a separate note.
Author · Vinzenz Trimborn

Co-founder of Alago. Writes about ontologies, retrieval, and the messy business of modelling Pre-Construction work.
