Consider:
Max fell. John pushed him.
vs.:
John pushed Max. He fell.
Both formulations convey the same causal and temporal information: the pushing caused the falling, and the pushing preceded the falling. Yet the first presents events in counter-chronological order while the second follows chronological sequence. Neither sentence explicitly states causation or temporal ordering; these meanings arise from how we interpret the discourse structure connecting the sentences.
This example, from Asher and Lascarides's Logics of Conversation, grounds the entire enterprise of [discourse parsing][discp]. Documents are organized beyond the level of individual sentences, through relations like CAUSE, CONCESSION, ELABORATION, and SEQUENCE, and recovering that organization is essential for a host of tasks ranging from summarization to dialogue planning.
Rhetorical Structure Theory
Mann and Thompson's Rhetorical Structure Theory, proposed in 1988, was the first computationally implemented framework for discourse parsing. RST represents documents as hierarchical trees covering entire texts down to Elementary Discourse Units (EDUs), the minimal propositions that form the tree's leaves. Each node in the tree carries a relation label (CAUSE, CONTRAST, ELABORATION, etc.) and a nuclearity classification: nuclei are the more prominent units, satellites the less essential ones.
This tree structure proved immediately useful. Because satellites elaborate or support nuclei, removing satellites and their descendants yields extractive summaries at any granularity. The recursive branching tracks how topics evolve and bifurcate in extended discourse. Early research on automatic summarization and dialogue planning built directly on these properties.
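The pruning operation is simple enough to sketch directly. The snippet below models an RST subtree with a minimal Python class and derives an extractive summary by discarding satellites; the `Node` shape and `nuclei_only` helper are illustrative, not any standard library's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """One node of an RST tree: an internal span or a leaf EDU."""
    relation: Optional[str] = None        # label of the relation this node enters
    nuclearity: str = "nucleus"           # "nucleus" or "satellite"
    text: Optional[str] = None            # EDU text; set on leaves only
    children: List["Node"] = field(default_factory=list)

def nuclei_only(node: Node) -> List[str]:
    """Extractive summary: recursively drop satellites, keep nucleus EDUs."""
    if node.text is not None:             # a leaf EDU survives if we reached it
        return [node.text]
    out: List[str] = []
    for child in node.children:
        if child.nuclearity == "nucleus":
            out.extend(nuclei_only(child))
    return out

# The example from the introduction, with "John pushed him." analyzed as an
# EXPLANATION satellite of the nucleus "Max fell.":
tree = Node(children=[
    Node(nuclearity="nucleus", text="Max fell."),
    Node(relation="EXPLANATION", nuclearity="satellite", text="John pushed him."),
])
print(nuclei_only(tree))  # ['Max fell.']
```

Pruning satellites only below a chosen depth, rather than everywhere, would yield summaries at the coarser or finer granularities the paragraph above describes.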
The RST Discourse Treebank (RST-DT), annotating Wall Street Journal text, demonstrated the framework's viability at scale: over 200,000 tokens across 385 documents, with one of the largest relation inventories ever implemented (78 types). RST remains the most widely used framework, covering the most languages and datasets in recent cross-formalism benchmarks.
The Tree Constraint
RST's defining assumption is that discourse structure forms a tree. Every unit participates in exactly one relation at each level of the hierarchy; there are no crossing branches, no cycles, no concurrent connections. This constraint is both RST's greatest strength and its most significant limitation.
Consider a fragment from the RST-DT corpus where an EDU contains two discourse markers: "then" signaling temporal sequence and "but" signaling concession. The tree constraint forces a choice. One relation gets annotated (TEMPORAL-AFTER); the other disappears from the representation entirely. Yet both relations clearly hold. The "but" is not semantically vacuous—it signals a CONCESSION the tree cannot express.
Moore and Pollack identified this concurrent-relations problem as early as 1992. A single span of text may simultaneously serve multiple discourse functions: elaborating a previous claim while also contrasting with it, or providing evidence that also advances a temporal sequence. The tree constraint collapses this multiplicity into a single label.
A second limitation: RST makes no distinction between explicitly signaled relations and implicit ones. The temporal relation marked by "then" and an inferred causal relation with no overt marker receive identical treatment. There is no way to record that one relation has textual evidence you can point to while the other depends entirely on pragmatic inference.
Alternative Frameworks
Subsequent frameworks each addressed part of the problem while introducing new tradeoffs.
Segmented Discourse Representation Theory (SDRT), developed by Asher and Lascarides, permits graph structures rather than trees. Two units can participate in multiple relations simultaneously—both NARRATION and CONTRAST, for instance. Edges may cross; non-projective configurations are allowed. SDRT also permits units to nest within each other, further complicating the data model. The cost: SDRT distinguishes coordinating from subordinating relations rather than nuclei from satellites, losing RST's clear prominence hierarchy that made recursive summarization straightforward.
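The structural difference is easy to see in code. In this sketch (the `DiscourseGraph` class is invented for illustration), attachments are a set of labeled edges, so one pair of units can carry several relations at once, which a tree's single parent edge cannot record.

```python
class DiscourseGraph:
    """SDRT-style discourse structure as a set of (source, relation, target) edges."""
    def __init__(self):
        self.edges = set()

    def attach(self, src: str, rel: str, tgt: str) -> None:
        self.edges.add((src, rel, tgt))

    def relations(self, src: str, tgt: str) -> set:
        """All relations holding between a given pair of units."""
        return {r for (s, r, t) in self.edges if s == src and t == tgt}

g = DiscourseGraph()
g.attach("e1", "NARRATION", "e2")  # the same pair of units participates
g.attach("e1", "CONTRAST", "e2")   # in two relations simultaneously
print(sorted(g.relations("e1", "e2")))  # ['CONTRAST', 'NARRATION']
```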
The Penn Discourse Treebank (PDTB) takes a fundamentally different approach. Relations are annotated as senses of their associated discourse connectives—words like "but," "because," "then." If a connective is present, the relation is explicit; if a connective could be inserted, the relation is implicit. PDTB also recognizes Alternative Lexicalizations (expressions like "that is" that mark relations without being canonical connectives) and other specialized signal types.
This connective-anchored approach enables reliable identification: annotators can point to the textual evidence licensing each relation. PDTB also employs a hierarchical label taxonomy (COMPARISON.CONCESSION.ARG2-AS-DENIER rather than just CONCESSION) and allows relations to connect discontinuous or overlapping segments. The cost: no document-level structure whatsoever. PDTB produces only local pairwise relations, not an overarching hierarchy. There is no way to represent that a paragraph elaborates a section, or that a sequence of sentences forms a coherent narrative block.
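As a sketch, a PDTB-style relation can be stored as a flat record whose sense is a path through the taxonomy. Field names are illustrative rather than the corpus's actual file format, and the argument texts extend the running example for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class PDTBRelation:
    rel_type: str                # "Explicit", "Implicit", "AltLex", ...
    sense: Tuple[str, ...]       # path through the hierarchical sense taxonomy
    connective: Optional[str]    # attested connective (Explicit) or inserted one (Implicit)
    arg1: str
    arg2: str

rel = PDTBRelation(
    rel_type="Explicit",
    sense=("COMPARISON", "CONCESSION", "ARG2-AS-DENIER"),
    connective="but",
    arg1="John pushed Max",
    arg2="he kept his balance",
)
# Coarse-grained evaluation can back off to the top of the sense path:
print(rel.sense[0])  # COMPARISON
```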
The Cognitive Approach to Coherence Relations (CCR) offers a psycholinguistic account, characterizing relations through five binary dimensions: polarity (positive/negative), basic operation (causal/additive), source of coherence (objective/subjective), implication order (basic/non-basic), and temporality (temporal/non-temporal). CCR treats connectives as processing instructions that guide readers toward the intended interpretation, with their absence requiring additional cognitive effort. The framework has empirical support from reading-time studies and corpus analyses. Like PDTB, however, CCR targets local relations without hierarchical document structure.
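The decomposition can be sketched as feature bundles compared dimension by dimension. The specific values below are illustrative guesses for the running example, not authoritative CCR annotations.

```python
# CCR's five binary dimensions:
CCR_DIMS = ("polarity", "basic_operation", "source", "implication_order", "temporality")

# Two readings of the pushing/falling example (values are illustrative):
pushing_caused_falling = {
    "polarity": "positive", "basic_operation": "causal", "source": "objective",
    "implication_order": "basic", "temporality": "temporal",
}
falling_despite_pushing = {
    "polarity": "negative", "basic_operation": "causal", "source": "objective",
    "implication_order": "basic", "temporality": "temporal",
}

def differing_dims(a: dict, b: dict) -> list:
    """Dimensions on which two relation characterizations disagree."""
    return [d for d in CCR_DIMS if a[d] != b[d]]

print(differing_dims(pushing_caused_falling, falling_despite_pushing))  # ['polarity']
```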
Discourse Dependency Structure translates RST's constituency trees into binary asymmetric dependencies between EDUs, aligning with syntactic dependency formalisms like Universal Dependencies. The conversion simplifies parsing and produces transparent structures, but most DDS data is derived from RST or other corpora rather than natively annotated.
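One common conversion, sketched here with invented helper names and a simplified head rule (real corpora use more elaborate ones), promotes each subtree's nucleus head EDU and attaches every other child's head to it, carrying the child's relation label onto the dependency.

```python
def head(node: dict) -> int:
    """Head EDU of a subtree: follow the leftmost nucleus child down to a leaf."""
    if "edu" in node:
        return node["edu"]
    nucleus = next(c for c in node["children"] if c["nuc"] == "nucleus")
    return head(nucleus)

def to_dependencies(node: dict, deps=None) -> list:
    """Collect (dependent_edu, relation, head_edu) triples for the whole tree."""
    if deps is None:
        deps = []
    if "edu" in node:                     # leaves attach nothing below them
        return deps
    h = head(node)
    for child in node["children"]:
        ch = head(child)
        if ch != h:                       # non-head children attach to the head
            deps.append((ch, child["rel"], h))
        to_dependencies(child, deps)
    return deps

# EDU 1 elaborated by EDU 2, with EDU 3 giving a cause for the pair:
tree = {"children": [
    {"nuc": "nucleus", "children": [
        {"nuc": "nucleus", "edu": 1, "rel": None},
        {"nuc": "satellite", "edu": 2, "rel": "ELABORATION"},
    ]},
    {"nuc": "satellite", "edu": 3, "rel": "CAUSE"},
]}
print(to_dependencies(tree))  # [(2, 'ELABORATION', 1), (3, 'CAUSE', 1)]
```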
Toward Synthesis
By the mid-2020s, the discourse parsing landscape comprised multiple frameworks, each with genuine advantages and genuine limitations. RST provided hierarchical structure with nuclearity but enforced a tree constraint that obscured concurrent relations. SDRT allowed graph structures and multiple relations but lost clear prominence marking. PDTB anchored relations to explicit signals but had no document-level organization. CCR offered psycholinguistic grounding with feature decomposition but targeted only local relations.
Some multilayer datasets enabled comparison—RST-DT and PDTB annotate overlapping Wall Street Journal material; the Potsdam Commentary Corpus added connective annotations to German RST trees—but no one had attempted a unified theory incorporating the advantages of multiple frameworks.
Enhanced Rhetorical Structure Theory is that attempt. The name signals backwards compatibility with RST, analogous to how Enhanced Dependencies extend Universal Dependencies in syntax. The framework retains RST's recursive tree structure and nucleus-satellite distinctions while incorporating:
- Multiple concurrent relations between the same nodes
- Non-projective, tree-breaking structures
- Categorized discourse relation signals anchored to specific tokens
- A hierarchical relation taxonomy
- A distinction between explicitly signaled and implicit relations
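Put together, the additions above might look like the record sketched below; field names, signal subtypes, and token indices are illustrative, not the official eRST serialization. The tree-compatible relation stays primary, so the "then ... but" EDU from the RST-DT fragment discussed earlier keeps TEMPORAL-AFTER in the tree while the CONCESSION signaled by "but" survives as a secondary relation, each anchored to its evidence.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Signal:
    category: str              # e.g. "dm" for discourse marker
    subtype: str               # finer signal class, e.g. "adversative"
    tokens: Tuple[int, ...]    # token indices anchoring the signal in the text

@dataclass
class Relation:
    source: int                # source discourse unit id
    target: int                # target discourse unit id
    label: str                 # relation from the hierarchical taxonomy
    primary: bool              # primary relations form the backward-compatible tree
    signals: List[Signal] = field(default_factory=list)

edge = [
    Relation(1, 2, "TEMPORAL-AFTER", primary=True,
             signals=[Signal("dm", "temporal", (5,))]),      # "then"
    Relation(1, 2, "CONCESSION", primary=False,
             signals=[Signal("dm", "adversative", (9,))]),   # "but"
]
explicit = [r for r in edge if r.signals]  # relations with pointable evidence
print(len(explicit))  # 2
```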
The primary motivation is not to improve scores on any particular NLP task, but to provide a more comprehensive representation of discourse relations—recovering relations that exist but that RST analyses cannot express, along with the textual rationale for identifying them.