An eRST graph—with its colour-coded signals, hierarchical tree structure, and blue secondary edges—looks complex at first glance. It decomposes, however, into three well-defined components, each with clear constraints and annotation criteria. Understanding these components is the key to understanding what eRST represents and why.
The Primary Tree
The primary tree is the backbone of an eRST analysis. It is a directed, single-rooted, projective, labelled constituent tree over non-overlapping Elementary Discourse Units—essentially an RST tree with certain constraints made explicit.
Every token in the document belongs to exactly one EDU. EDUs coalesce into complex discourse units, which in turn join with others to form larger units, recursively up to a single root spanning the entire document. Each node carries exactly one relation label from a closed inventory (the GUM corpus implementation uses 32 labels, including familiar types like CAUSE, CONCESSION, ELABORATION, ATTRIBUTION, and the technical SAME-UNIT for discontinuous EDUs).
Nodes are classified as satellites or nuclei. At every level of the tree, at least one child must be a nucleus—the more prominent or necessary unit—while satellites elaborate, support, or contextualize the nucleus. This distinction is what enables recursive summarization: traverse the tree, keep nuclei, discard satellites, and the result is a coherent summary at any desired granularity.
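The summarization procedure can be made concrete with a minimal sketch. The `Node` class, its field names, the `depth` parameter, and the toy sentences are all assumptions made for illustration; none of them come from eRST tooling or the GUM corpus.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node type; field names are illustrative, not from eRST tooling."""
    nuclearity: str                       # "nucleus", "satellite", or "root"
    text: Optional[str] = None            # set only on EDUs (leaves)
    children: List["Node"] = field(default_factory=list)


def summarize(node: Node, depth: int) -> List[str]:
    """Return EDU texts in order, dropping satellites in the top `depth` levels."""
    if node.text is not None:
        return [node.text]
    kept: List[str] = []
    for child in node.children:
        if depth > 0 and child.nuclearity == "satellite":
            continue  # prune the less prominent unit
        kept.extend(summarize(child, depth - 1))
    return kept


# Invented toy document: a nuclear unit with an inner satellite, plus an outer satellite.
tree = Node("root", children=[
    Node("nucleus", children=[
        Node("nucleus", text="The hike was enjoyable"),
        Node("satellite", text="although it rained."),
    ]),
    Node("satellite", text="We plan to go again."),
])

print(summarize(tree, 0))  # full text: all three EDUs
print(summarize(tree, 1))  # outer satellite dropped
print(summarize(tree, 2))  # only the most prominent EDU remains
```

Raising `depth` prunes satellites further down the hierarchy, which is one simple way to model "any desired granularity"; other pruning policies are equally possible.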
eRST makes several constraints explicit that previous RST implementations left implicit:
- No empty hierarchy: every non-terminal node has at least two children. Unary derivations are forbidden.
- Strictly ordered hierarchy: no two satellite children for the same nucleus can be unordered. If a nucleus has two satellites, one must scope over the other.
- Explicit tokenization: EDU contents are tokenized to enable signal anchoring at the word level.
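The structural constraints lend themselves to a mechanical check. The sketch below assumes a hypothetical nested-dict encoding in which each child carries a `nuclearity` key and satellites of the same nucleus appear as siblings, so more than one satellite sibling is treated as an unordered pair. That encoding and the function name `check_tree` are assumptions, not part of any eRST format.

```python
from typing import Dict, List


def check_tree(node: Dict, path: str = "root") -> List[str]:
    """Return human-readable violations found in a nested-dict tree."""
    children = node.get("children", [])
    if not children:  # an EDU (leaf): nothing to check
        return []
    problems: List[str] = []
    if len(children) < 2:
        problems.append(f"{path}: unary node")          # no empty hierarchy
    kinds = [child["nuclearity"] for child in children]
    if "nucleus" not in kinds:
        problems.append(f"{path}: no nucleus child")    # every level needs a nucleus
    if kinds.count("satellite") > 1:
        problems.append(f"{path}: unordered satellites")  # one must scope over the other
    for i, child in enumerate(children):
        problems.extend(check_tree(child, f"{path}/{i}"))
    return problems


flat = {"children": [
    {"nuclearity": "nucleus"},
    {"nuclearity": "satellite"},
    {"nuclearity": "satellite"},
]}
print(check_tree(flat))
```

To repair the flat node, one satellite would be attached to a new intermediate span containing the nucleus and the other satellite, which makes the scope order explicit.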
Building the primary tree follows standard RST criteria. Propositions are grouped by the function they serve, with more prominent units assigned nucleus status. Relation labels are defined based on their intended effect on the reader—a group of EDUs supplying evidence for a claim might be labelled EXPLANATION-EVIDENCE, while a conceded counterpoint would receive ADVERSATIVE-CONCESSION.
The critical point: the primary tree in eRST should be the same tree that plain RST would produce. No compromises are made to accommodate the additional machinery. Nuclearity and structure are determined by discourse function, full stop.
Secondary Edges
The tree constraint that defines RST cannot represent certain relations that genuinely hold in text. Secondary edges are the mechanism for recovering these relations without abandoning the tree's advantages.
A secondary edge may connect any two nodes already present in the primary tree. Unlike primary edges, secondary edges are not constrained by projectivity or acyclicity: they can cross branches, create cycles, and link nodes at arbitrary distances in the hierarchy. They represent relations the tree structure cannot express.
Four constraints govern secondary edges:
- Signal requirement: A secondary edge may only be added if it is supported by a sufficient signal. This is the core licensing condition.
- No duplicate paths: Any two nodes can have at most one secondary edge in each direction. Combined with the primary edge, this means at most three edges can connect a given node pair.
- No self-loops: A node cannot have a secondary edge to itself.
- No new nodes: Secondary edges connect existing nodes. They cannot require defining additional spans not already in the tree.
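The four constraints above can be expressed as a small validator. This is a sketch under stated assumptions: node identifiers are plain strings, an edge is a `(source, target, relation)` tuple, and the set `signalled` stands in for whatever process decides that a sufficient signal exists. All of these names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source node id, target node id, relation label)


def check_secondary_edges(
    tree_nodes: Set[str],
    edges: Iterable[Edge],
    signalled: Set[Edge],
) -> List[str]:
    """Return one message per violated constraint."""
    problems: List[str] = []
    seen_directions: Set[Tuple[str, str]] = set()
    for edge in edges:
        src, tgt, _relation = edge
        if edge not in signalled:
            problems.append(f"{edge}: no licensing signal")
        if (src, tgt) in seen_directions:
            problems.append(f"{edge}: duplicate edge in this direction")
        seen_directions.add((src, tgt))
        if src == tgt:
            problems.append(f"{edge}: self-loop")
        if src not in tree_nodes or tgt not in tree_nodes:
            problems.append(f"{edge}: node not in primary tree")
    return problems


nodes = {"n1", "n2", "n3"}
edges = [("n1", "n2", "CONCESSION"), ("n2", "n1", "SEQUENCE")]
print(check_secondary_edges(nodes, edges, signalled=set(edges)))  # no violations
```

Note that the two edges in the usage example connect the same pair in opposite directions, which the constraints permit.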
The first constraint is the most important. Secondary edges are restricted to clearly signaled cases because agreement on discourse relations is already difficult. Two types of signals license secondary edges:
Orphan discourse markers: A DM like "but" or "then" for which no corresponding relation exists in the primary tree. The annotator determines the primary tree structure based on discourse function; if a DM remains without a corresponding edge, it becomes an orphan, and the relation it signals can be expressed via a secondary edge.
Unambiguous morphosyntax: Reported speech not already captured in a primary ATTRIBUTION relation, or adnominal clauses not already captured by an ELABORATION or PURPOSE relation. If a sentence contains a cognition verb like "know" with a complement clause, but the primary tree attaches the units involved through a different relation, the syntactic pattern can license a secondary ATTRIBUTION.
Consider a concrete example. A passage is analyzed with units [23–27] forming an EVALUATION of how a rainy day is tolerable. The annotator perceives this evaluative function as the most prominent relation. But unit [27] contains the word "but," signaling a CONCESSION that the primary tree cannot represent—the EVALUATION and CONCESSION would require the same units to participate in different relations with different nuclearity directions. The orphan "but" licenses a secondary CONCESSION edge.
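Orphan detection of the kind described here reduces to a set difference once the primary tree is complete. The sketch below assumes discourse markers have already been located by token index and that the indices used as signals for primary edges are known; the function name and the index values are invented for illustration.

```python
from typing import Dict, List, Set, Tuple


def orphan_markers(
    dm_positions: Dict[int, str],
    primary_signal_tokens: Set[int],
) -> List[Tuple[int, str]]:
    """Return (token index, form) for every DM that signals no primary edge."""
    return [
        (index, form)
        for index, form in sorted(dm_positions.items())
        if index not in primary_signal_tokens
    ]


# Hypothetical token indices: "because" already supports a primary edge, "but" does not.
dms = {4: "because", 12: "but"}
print(orphan_markers(dms, primary_signal_tokens={4}))  # [(12, 'but')]
```

Each orphan returned this way is only a candidate: an annotator still has to decide which nodes the secondary edge connects and which relation it expresses.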
Notice that secondary edges carry direction but not nuclearity. A secondary CONCESSION still has a conceded part and a claim it contrasts with, but it does not override the prominence structure established by the primary tree. This is the key design decision: eRST allows concurrent relations while preserving unambiguous nuclearity. The primary tree still supports recursive summarization; secondary edges capture additional relational information without disturbing it.
Annotation practice follows from this design. Annotators complete the entire primary tree before adding secondary edges. If both were annotated simultaneously, annotators might be tempted to select the unmarked relation as primary specifically to make an orphan DM available for the second relation. By requiring the primary tree first, eRST ensures that nuclearity is determined by discourse function alone.
The relation inventory for secondary edges is the same as for the primary tree, though multinuclear and satellite variants collapse, since secondary edges indicate only direction, not prominence. The result is a partial remedy to the criticism that RST's single nuclearity choice cannot express everything relevant about discourse structure.
The Signal Taxonomy
PDTB anchors relations to connectives; eRST generalizes this anchoring to a comprehensive signal taxonomy. Signals provide explainable rationales for relation identification, linking each edge to the textual evidence that motivates it.
Discourse markers are the most familiar signal type: coordinating conjunctions ("but," "and"), subordinating conjunctions ("because," "although"), adverbials ("however," "then"), and prepositional phrases ("in addition," "as a result"). Discontinuous markers like "if...then" are included. DMs are always anchored to specific tokens.
Beyond DMs, eRST recognizes seven additional signal categories, following and extending the taxonomy from the RST Signalling Corpus:
Graphical signals include colons, dashes, semicolons, parentheses, quotation marks, question marks, layout features like headings (identified by font and positioning), and items in sequence (numbered lists, bullet points). Some are token-anchored; others (like layout) are not.
Lexical signals encompass alternate expressions ("that is," "in other words") and indicative words or phrases. An evaluative adjective like "remarkable" appearing in an EVALUATION relation is a lexical signal—not a connective, but a word that indicates the relation type.
Morphological signals include mood (imperatives in MOTIVATION relations) and tense sequences (past followed by present marking SEQUENCE).
Numerical signals involve matching counts, as when "Two reasons" prepares the reader for a two-item list.
Reference signals capture cohesion through anaphora: personal pronouns linking to antecedents, demonstratives ("this person," "that approach"), comparative expressions ("another," "other"), and propositional reference ("this encounter" referring back to a described event).
Semantic signals include antonymy ("cheap" vs. "expensive" in CONTRAST), attribution sources (the entity performing speech or cognition acts), lexical chains (related but non-coreferring words like "power" and "influence"), meronymy (part-whole relations), negation (a predicate appearing positive in one unit and negated in another), and repetition or synonymy.
Syntactic signals encompass infinitival and relative clauses, reported speech constructions (a verb plus complement clause), participial clauses, parallel syntactic constructions, subject-auxiliary inversion (in conditions), and modified heads (a noun whose modifier conveys the relation, as in "a plan to win" where "plan" is the modified head of a PURPOSE relation).
A single token can serve multiple signals for different relations. In a passage about "Kiara Perkins," that name might function simultaneously as a semantic attribution source (for an ATTRIBUTION relation) and as a personal reference signal (for an ELABORATION relation). Signals are typed and anchored, allowing precise extraction.
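One way to picture typed, anchored signals is as records that carry a type, a subtype, the relation they support, and the token indices they are anchored to. The sketch below is an assumed representation, not the eRST file format; the class name, field names, subtype strings, and token indices are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Signal:
    sig_type: str             # e.g. "semantic", "reference", "graphical"
    subtype: str              # e.g. "attribution_source" (illustrative naming)
    relation: str             # relation of the edge this signal supports
    tokens: Tuple[int, ...]   # empty for signals with no token anchor, e.g. layout


def signals_by_token(signals: Iterable[Signal]) -> Dict[int, List[Signal]]:
    """Index signals by the tokens they are anchored to."""
    index: Dict[int, List[Signal]] = defaultdict(list)
    for signal in signals:
        for token in signal.tokens:
            index[token].append(signal)
    return dict(index)


# Assume "Kiara Perkins" occupies token indices 0 and 1 (invented positions).
signals = [
    Signal("semantic", "attribution_source", "ATTRIBUTION", (0, 1)),
    Signal("reference", "personal_reference", "ELABORATION", (0, 1)),
    Signal("graphical", "layout", "ORGANIZATION", ()),  # hypothetical unanchored signal
]
for signal in signals_by_token(signals)[0]:
    print(signal.sig_type, signal.relation)
```

Because each record names its relation, the same token can appear under several signals without ambiguity about which edge each one supports.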
A Complete Example
The Zeldes et al. paper includes an annotated fragment that illustrates the full eRST machinery. Units [151–154] provide BACKGROUND to a question in [155]: "Why did she so badly want to attend?" The BACKGROUND describes Kiara Perkins's situation; the QUESTION asks about her motivation.
The QUESTION relation carries three signals:
- A lexical signal: "Why"
- A graphical signal: the question mark
- A syntactic signal: subject-auxiliary inversion, anchored to "did"
The BACKGROUND relation has a personal reference signal: "Kiara Perkins...she," indicating the background relates to this person through anaphoric reference.
Within the BACKGROUND, unit [151] contains an ATTRIBUTION signaled by three elements:
- A semantic attribution source: "Kiara Perkins"
- A lexical signal: the verb "admitted"
- A syntactic signal: the complement clause headed by "willing" (reported speech construction)
A PURPOSE-GOAL clause in [153] is marked by a syntactic signal: a to-infinitive. A CONTRAST relation is marked by the DM "but" plus a semantic lexical chain: "attend...attend" (repetition indicating the contrast operates over the same topic).
Finally, an orphan "then" appears in the text—a DM with no corresponding primary relation. It licenses a secondary SEQUENCE edge, capturing temporal ordering that the primary tree's structure does not represent.
This single example demonstrates how eRST integrates tree structure, secondary edges, and multi-typed signals into a unified representation. The primary tree encodes hierarchical organization and prominence. Secondary edges capture concurrent relations licensed by orphan markers. Signals provide token-level rationales for every edge, distinguishing explicit from implicit relations and recording the textual evidence that motivates each analysis.
Complexity and Effort
The additional machinery adds only modest computational overhead. Secondary edge search, in the worst case, is quadratic in the number of nodes—for each node pair, check whether a secondary edge is licensed and, if so, which relation it expresses. Signal detection reduces to token-wise multilabel classification: each token may carry zero or more signal types, each associated with a specific edge.
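Both cost claims can be illustrated in a few lines. The sketch below is an assumed formulation: `license_fn` is a hypothetical callback standing in for whatever model or rule decides whether an ordered node pair is licensed, and `to_multilabel` shows one possible encoding of token-wise multilabel targets.

```python
from itertools import permutations
from typing import Callable, Iterable, List, Optional, Set, Tuple


def search_secondary_edges(
    nodes: List[str],
    license_fn: Callable[[str, str], Optional[str]],
) -> List[Tuple[str, str, str]]:
    """Try every ordered pair of distinct nodes: n * (n - 1) checks for n nodes."""
    edges = []
    for src, tgt in permutations(nodes, 2):  # ordered pairs, self-loops excluded
        relation = license_fn(src, tgt)
        if relation is not None:
            edges.append((src, tgt, relation))
    return edges


def to_multilabel(n_tokens: int, signals: Iterable[Tuple[int, str]]) -> List[Set[str]]:
    """Turn (token index, signal type) pairs into one label set per token."""
    labels: List[Set[str]] = [set() for _ in range(n_tokens)]
    for index, sig_type in signals:
        labels[index].add(sig_type)
    return labels


# Toy licensing rule (invented): only the pair ("a", "c") is licensed.
toy_rule = lambda s, t: "SEQUENCE" if (s, t) == ("a", "c") else None
print(search_secondary_edges(["a", "b", "c", "d"], toy_rule))
print(to_multilabel(3, [(0, "dm"), (0, "reference"), (2, "graphical")]))
```

In practice the signal requirement would shrink the candidate set considerably, since only pairs with an unaligned signal need to be considered at all.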
Annotation effort is higher than for plain RST, though the cost of primary trees is inherited rather than introduced. The additional eRST annotation can be partially automated: NLP tools for connective detection achieve high performance for English, and signals unalignable to any primary edge serve as indicators that secondary edges may be needed. The framework was designed to be extended incrementally, leveraging existing RST corpora and NLP pipelines.