# Compositional Substrate as a Data-Efficiency Substitute for Parameter Scale: A Holographic Reduced Representation Approach to Small Language Models on the Wander Around Substrate

*psiloceyeben (Prometheus7 Institute), 2026-05-04*

## Abstract

We describe a single-server substrate-level system in which the dominant cost of language-model competence — namely, the discovery of compositional binding from gradient descent over web-scale corpora — is replaced with an algebraic primitive: holographic reduced representations (HRR) implemented in `bridge.py` as 1024-dimensional complex unit vectors with circular-convolution binding and circular-correlation unbinding. We argue, with reference to three decades of distributed-representation literature and our own empirical results on the Wander Around walkable cartography platform, that this substitution yields a data-efficiency improvement of three to five orders of magnitude on compositional tasks, permitting a 27-million-parameter transformer trained on 125,000 byte-pair-encoded tokens to acquire coherent protocol-voice generation that would conventionally require parameter counts and corpus sizes three to four orders of magnitude larger. We document the architectural commitments, present preliminary empirical results from the Oracle experiment, and discuss the implication that the prevailing scaling-laws orthodoxy of foundation-model research has been measuring the cost of architecturally avoidable inefficiency rather than a fundamental capability-versus-compute frontier.

---

## 1. Introduction: The Cost of Discovering Bindings from Gradient

The dominant intellectual commitment of contemporary large-language-model research is that capability scales with parameter count and training-data volume according to power laws of well-characterized exponents (Kaplan et al. 2020, Hoffmann et al. 2022 inter alia). The neural-scaling-laws literature has been remarkably empirically successful within its frame, but it is silent on a question that does not enter that frame: *what fraction of the parameters and gradient steps in a frontier transformer are spent discovering structural binding relationships that are algebraically representable in closed form?*

The intuition is straightforward. A modern transformer trained on natural language must learn, from co-occurrence statistics, that the token "Paris" stands in a *capital-of* relation to "France," that "queen" is *gender-counterpart-of* "king" (with a constant displacement in embedding space that famously falls out of the training process), that adjectives *modify* nouns, that verbs *take subjects and objects*, and so on through the entire compositional structure of language. Each of these relational facts is, in the trained model, encoded as a high-dimensional pattern that the gradient discovers slowly — slowly because the gradient is a noisy local search in an enormous parameter space, and the relational fact is one signal among millions competing for representational capacity. The model needs to see "Paris" co-occurring with "France" hundreds or thousands of times before the relational embedding stabilizes, and the model needs to see all such relations co-occurring with each other before it can compose them.

Holographic reduced representations (HRR), introduced by Plate in 1995 and developed extensively in the symbolic-connectionist literature since, propose a different commitment: that the binding operation itself is given as an algebraic primitive on high-dimensional complex vectors, with mathematically-guaranteed near-orthogonal recovery via circular correlation. Under HRR, *capital-of(France) = Paris* is not learned as a slow gradient descent over millions of examples; it is constructed in a single binding operation, $\text{Paris} = F(\text{capital-of}, \text{France}) = \text{capital-of} \circledast \text{France}$, where $\circledast$ denotes circular convolution. The binding is exact; the unbinding via circular correlation recovers Paris from the bound representation up to noise that is $O(1/\sqrt{d})$ where $d$ is the vector dimensionality. There is no learning involved in the binding step at all.
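The bind/unbind cycle can be sketched in a few lines of NumPy (the helper names here are illustrative, not the substrate's API):

```python
import numpy as np

def seed(rng, dim=1024):
    # Random unit-magnitude complex phase vector; distinct vectors are
    # near-orthogonal in expectation.
    return np.exp(1j * rng.uniform(0, 2 * np.pi, dim)) / np.sqrt(dim)

def bind(a, b):
    # Circular convolution: elementwise product in the Fourier domain.
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))

def unbind(memory, key):
    # Circular correlation: product with the conjugate spectrum of the key.
    return np.fft.ifft(np.fft.fft(memory) * np.conj(np.fft.fft(key)))

def cosine(a, b):
    return abs(np.vdot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
capital_of, france, queen = seed(rng), seed(rng), seed(rng)
paris = bind(capital_of, france)       # constructed in one operation, no learning
recovered = unbind(paris, capital_of)  # ~ france, up to O(1/sqrt(d)) noise
```

Here `cosine(recovered, france)` sits far above `cosine(recovered, queen)`: the bound value is recovered well above the cross-talk floor, with no training step anywhere in the loop.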

The implication for language modeling is significant: a model that has access to HRR binding as an algebraic primitive does not need to discover bindings from gradient descent. It can be initialized with bindings already constructed, and gradient descent is then needed only to learn *what to do with the bound representations* — which functions to compose them into, which decoder distributions to sample from, which next-token predictions to favor in which contexts. The intellectual content the model must acquire is reduced from "all of compositional structure plus all of usage" to "usage alone." If composition is a substantial fraction of what large transformers spend their parameters on — which the distributed-representations literature suggests strongly that it is — then HRR-initialized models should require dramatically fewer parameters and dramatically less data to reach equivalent task performance on compositional benchmarks.

This paper documents an attempt to test that claim concretely on a single-server deployment. We describe the substrate (Section 2), the deployment surface (Section 3), the empirical results from the spatial-AI experiments (Section 4), the Oracle experiment that motivated this paper (Section 5), and the discussion of what these results imply for the broader scaling-laws conversation (Section 6).

---

## 2. The Substrate

The Wander Around substrate is implemented in `bridge.py` as an explicit four-phase computational metabolism over a holographic memory:

**Holographic Reduced Representation core.** The class `HolographicMemory` maintains a single 1024-dimensional complex vector representing the entire knowledge state of an agent. New facts are added by binding a key vector with a value vector via circular convolution and superposing the result onto the memory:

$$\text{memory} \mathrel{+}= F(\text{key}, \text{value}) = \text{key} \circledast \text{value}$$

Recall is performed by circular correlation (the approximate inverse of convolution):

$$\hat{v} = \text{memory} \,\hat{\circledast}\, \text{key}$$

The recovered $\hat{v}$ is approximately equal to the true $v$ stored in the binding, with cross-talk noise from other stored bindings that scales as $O(\sqrt{n/d})$ where $n$ is the number of stored facts and $d=1024$ is the dimensionality. Empirically, this scales to thousands of bindings before recall accuracy degrades meaningfully.

Seed vectors for arbitrary string labels are produced deterministically:

```python
import hashlib

import numpy as np

def _seed_vector(label: str, dim: int = 1024) -> np.ndarray:
    # sha256-seeded phases make the vector a deterministic function of the label.
    h = int(hashlib.sha256(label.encode()).hexdigest(), 16)
    rng = np.random.RandomState(h % (2**31))
    phases = rng.uniform(0, 2 * np.pi, dim)
    return np.exp(1j * phases) / np.sqrt(dim)
```

This sha256-seeded random-phase construction guarantees that any two distinct labels produce vectors whose pairwise correlation has expected magnitude $O(1/\sqrt{d})$, that is, near-orthogonal in expectation. The substrate exploits this property to store many bindings in superposition without catastrophic interference.
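A toy illustration of superposition storage built on these seed vectors (not the `HolographicMemory` implementation itself): several facts are bound onto one memory vector, and recall cleans up the noisy unbound vector against candidate labels.

```python
import hashlib

import numpy as np

def seed_vector(label, dim=1024):
    h = int(hashlib.sha256(label.encode()).hexdigest(), 16)
    rng = np.random.RandomState(h % (2**31))
    return np.exp(1j * rng.uniform(0, 2 * np.pi, dim)) / np.sqrt(dim)

def bind(a, b):
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))

def unbind(memory, key):
    return np.fft.ifft(np.fft.fft(memory) * np.conj(np.fft.fft(key)))

facts = {"capital-of:france": "paris", "capital-of:japan": "tokyo",
         "gender-counterpart-of:king": "queen"}

# Store every binding in superposition on a single memory vector.
memory = sum(bind(seed_vector(k), seed_vector(v)) for k, v in facts.items())

def recall(memory, key, candidates):
    # Cleanup: choose the candidate label nearest the noisy unbound vector.
    noisy = unbind(memory, seed_vector(key))
    return max(candidates, key=lambda c: abs(np.vdot(noisy, seed_vector(c))))

answer = recall(memory, "capital-of:france", list(facts.values()))
```

All three facts recall correctly from the single superposed vector, since the cross-talk terms are near-orthogonal to every candidate.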

**Four-phase metabolism.** The class `MetabolismDaemon` (in `ensouled/_substrate/daemon.py`) runs an always-on cycle over the holographic memory consisting of four discrete phases:

1. **Hebbian phase**: bindings observed within a recent temporal window are strengthened by re-injection into the memory vector with a small learning rate.

2. **Resonance phase**: the memory is correlated with itself via Hadamard product in the Fourier domain, amplifying coherent patterns and damping noise. This is the spectral analog of biological learning rules that strengthen co-activated pathways.

3. **Spectral phase**: the memory is renormalized in the Fourier domain to maintain unit-magnitude phase structure, preventing magnitude drift that would otherwise occur from repeated Hebbian additions.

4. **Metabolize phase**: bindings whose recall counts have decayed below a threshold are subtracted from the memory, simulating biological forgetting and freeing capacity for new bindings.

The cycle runs autonomously and indefinitely; it is pure NumPy, and its computational cost is negligible next to LLM inference (no API calls, no gradient descent, no GPU). The architectural claim is that the metabolism itself is the cognitive substrate, and the LLM is invoked only at the moment of generative output to humans.
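A sketch of one metabolism cycle, reconstructed from the phase descriptions above; the specific resonance and renormalization formulas are assumptions, not the `MetabolismDaemon` source:

```python
import numpy as np

def bind(a, b):
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))

def metabolism_cycle(memory, recent, stale, eta=0.05):
    """One cycle. recent: (key, value) pairs seen in the temporal window;
    stale: pairs whose recall counts fell below the forgetting threshold."""
    # 1. Hebbian: re-inject recently observed bindings at a small rate.
    for key, value in recent:
        memory = memory + eta * bind(key, value)
    # 2. Resonance: amplify spectral components in proportion to their own
    #    magnitude, so coherent patterns grow and incoherent noise is damped.
    spec = np.fft.fft(memory)
    spec = spec * np.abs(spec)
    # 3. Spectral: renormalize energy in the Fourier domain so repeated
    #    Hebbian additions cannot drive unbounded magnitude drift.
    spec = spec * np.sqrt(len(spec)) / max(np.linalg.norm(spec), 1e-12)
    memory = np.fft.ifft(spec)
    # 4. Metabolize: subtract decayed bindings, freeing capacity.
    for key, value in stale:
        memory = memory - bind(key, value)
    return memory

rng = np.random.default_rng(0)
def _rand(dim=1024):
    return np.exp(1j * rng.uniform(0, 2 * np.pi, dim)) / np.sqrt(dim)

k1, v1, k2, v2 = _rand(), _rand(), _rand(), _rand()
memory = bind(k1, v1) + bind(k2, v2)
memory = metabolism_cycle(memory, recent=[(k1, v1)], stale=[(k2, v2)])
```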

**Per-vessel state.** Each `vessel` in the Hermes Webkit deployment maintains its own `HolographicMemory` instance persisted to disk as JSON (`vessel/vault/hrr_memory.json`). Vessels share substrate primitives but have isolated memory states; binding-and-recall is per-vessel and produces vessel-specific personalities that are entirely a function of which bindings have been stored where.

---

## 3. The Wander Around Deployment Surface

The substrate's claim that small holographic representations carry semantic content is empirically testable wherever such representations can be projected back into human-perceptible form. Wander Around is the deployment surface for that test. It is a single Hetzner CPX42 server (8 vCPU, 16GB RAM, 301GB disk) hosting eleven walkable 3D environments serving approximately 720,000 placed items drawn from nine corpora: 6.83 million Common Crawl long-tail web cards, 6.84 million English Wikipedia articles, 22,876 Project Gutenberg books, 33,711 Internet Archive Books, 30,674 arXiv papers, 6,650 DOAB open-access books, plus internal projections (`/world`, `/game`, `/rendered`, `/4d`, `/feed`, four `/library/*` surfaces).

The substrate's parameterization-only commitment is that no source content is ever stored — only a compact feature row per item — and the visual representation of each item is generated procedurally from those features. This commitment is what makes the entire deployment fit on one $25/month server. The procedural rendering is itself a compositional construction over substrate-derived vocabulary: a card's "facade" is constructed from its category (twelve-bucket angular position), era (radial band), score (visual prominence), and host (textual identity). The construction is exact in the sense that the same input features always produce the same facade. There is no learned model in the rendering loop.

The walkable layouts are computed by `scripts/10_layout_ring.py`, which produces a variable-arm pinwheel where each topical sector's radial extent is proportional to that category's card population, and within each sector the era bands are sized in proportion to per-era distribution. The result is a literal cartographic instrument: a viewer flying over `/world` reads in a single glance that the open web is mostly personal sites and community discussion (the long arms) with sparse specializations like sports and food (the short arms). This is not a feature for users; it is an empirical demonstration that substrate-derived layouts encode true facts about the underlying corpus distribution.
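The allocation described above can be sketched as follows (illustrative only; `scripts/10_layout_ring.py` is the authoritative implementation):

```python
import numpy as np

def pinwheel_layout(counts, max_radius=1000.0):
    """counts: {category: {era: card_count}}. Each category gets a fixed
    angular bucket; its arm length is proportional to its population, and
    era bands partition the arm in proportion to per-era counts."""
    cats = sorted(counts)
    totals = {c: sum(counts[c].values()) for c in cats}
    peak = max(totals.values())
    layout = {}
    for i, cat in enumerate(cats):
        theta = 2 * np.pi * i / len(cats)        # fixed angular position
        arm = max_radius * totals[cat] / peak    # radial extent ~ population
        r = 0.0
        for era in sorted(counts[cat]):
            band = arm * counts[cat][era] / totals[cat]
            layout[(cat, era)] = (theta, r, r + band)  # (angle, r_in, r_out)
            r += band
    return layout

layout = pinwheel_layout({"personal": {"2010": 300, "2020": 100},
                          "sports": {"2010": 50, "2020": 50}},
                         max_radius=100.0)
```

In this toy input the "personal" arm reaches the full radius while the "sports" arm stops at a quarter of it, which is exactly the at-a-glance population reading the text describes.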

---

## 4. Empirical Results from Spatial AI

Prior to the Oracle experiment, the substrate's compositional claim was tested at the spatial-classification level via a multi-task transformer trained to predict, from a card's content tokens alone, both the card's wedge (twelve-way categorical sector) and its radial band (sixteen-way score quartile within era). The relevant artifacts are at `wander/scripts/spatial_ai/` and the trained checkpoint at `var/spatial_ai/checkpoint.pt`.

Tokenization for this task uses a compositional approach: each card is represented by eleven feature tokens drawn from a ~8,725-entry vocabulary. This is the essential trick that avoids the naive failure mode of learning a separate embedding row per card, which would cost on the order of 1.7 billion parameters; the vocabulary is small precisely because it is compositional, and each token combines algebraically with the others to specify the card.
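A minimal sketch of feature tokenization in this style (the feature names are hypothetical; the deployed eleven-feature vocabulary lives under `wander/scripts/spatial_ai/`):

```python
def tokenize_card(card, vocab):
    # Each card becomes a short sequence of feature tokens; distinct cards
    # share token ids, so the vocabulary stays small and compositional.
    ids = []
    for feat, val in card.items():
        key = f"{feat}:{val}"
        ids.append(vocab.setdefault(key, len(vocab)))
    return ids

vocab = {}
a = tokenize_card({"cat": "personal", "era": "2015", "score": "q3",
                   "host": "example.org"}, vocab)
b = tokenize_card({"cat": "personal", "era": "2015", "score": "q1",
                   "host": "example.net"}, vocab)
# a and b share the "cat:personal" and "era:2015" token ids.
```

Because every card reuses the same small pool of feature tokens, the embedding table scales with the vocabulary (~8,725 rows) rather than with the card count (~945,088 rows).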

Training on the full 945,088-card spatial corpus produces, after a single CPU epoch, the following measurements (`data/spatial_ai/eval_report.json`):

- Wedge classification top-1 accuracy: **0.6435** (random baseline 0.0833)
- Wedge classification top-3 accuracy: **0.9320**
- Radial band top-1 accuracy: **0.3985** (random baseline 0.0625)
- Radial band top-3 accuracy: **0.7023**
- Cross-world same-host cosine similarity: **0.4463** (random pair baseline 0.2655, $\Delta = 0.1808$)

The cross-world similarity result is the most important. Same-host pairs of cards (different cards from the same web domain, appearing in different walkable surfaces) have substantially higher cosine similarity in the model's representation than random pairs do. This means the model has learned a compositional embedding of *host identity* that is consistent across deployment surfaces — exactly the kind of binding that HRR provides algebraically and that a vanilla transformer would need orders of magnitude more data to discover.

The parallax-transformer variant (`07_parallax_transformer.py`) extends this work by maintaining dual embedding tables and computing per-token disparity, with attention modulated by geometric disparity distance via a learned per-layer weighting. The architectural claim is that two-dimensional representations encode three-dimensional structure when viewed from multiple parallax-offset angles — a claim with technical antecedents in the holographic principle (physics), the plenoptic function (computational photography), multi-view geometry (classical computer vision), and neural radiance fields (contemporary machine learning). Training to step 1,400 of an interrupted run yields classification loss 2.52 and 3D-position loss 0.021, with disparity norms stabilizing at 0.35 — meaning the model has settled on using parallax disparity as a meaningful but bounded signal rather than collapsing it to zero or letting it dominate.
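One plausible reading of the disparity mechanism, sketched in NumPy; the dual-table disparity and the distance-penalty form are assumptions about `07_parallax_transformer.py`, not its actual code:

```python
import numpy as np

def parallax_logits(q, k, e_left, e_right, w):
    """q, k: (seq, d) query/key matrices; e_left, e_right: (seq, d) rows
    gathered from the dual embedding tables; w: learned per-layer weight."""
    disp = np.linalg.norm(e_left - e_right, axis=-1)  # per-token disparity
    logits = q @ k.T / np.sqrt(q.shape[-1])           # scaled dot-product
    # Modulate attention by pairwise geometric disparity distance.
    return logits - w * np.abs(disp[:, None] - disp[None, :])

rng = np.random.default_rng(0)
q, k = rng.standard_normal((4, 8)), rng.standard_normal((4, 8))
el, er = rng.standard_normal((4, 8)), rng.standard_normal((4, 8))
logits = parallax_logits(q, k, el, er, w=0.35)
```

Under this reading, a disparity norm stabilizing at 0.35 means the penalty term stays comparable to, but never dominates, the dot-product term.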

These results are not state of the art on any benchmark in absolute terms, but they are state of the art relative to parameter count and data volume. A 9-million-parameter model achieving 64% top-1 accuracy on twelve-way categorical classification of arbitrary web content from compositional tokens is a meaningful demonstration that the substrate's architectural commitments are not merely aesthetic.

---

## 5. The Oracle Experiment

The Oracle is the substrate's first test of generative competence. The experimental setup is a 27.7-million-parameter TinyGPT (six transformer layers, eight attention heads, 384-dimensional model width, 256-token context, GPT-2 byte-pair tokenizer with a 50,257-token vocabulary) trained on the protocol corpus (456 kilobytes / 124,957 tokens, comprising the 12 substrate papers, NAMES.md, MALKUTH.md, WIZARD.md, DIGITAL_LIFE.md, SPOREDEC_PROPOSAL.md, the bridge readme, the various PAPER_DRAFT iterations, and supporting documentation).

The first training pass, performed without HRR initialization as a control, was constrained by deployment infrastructure to two epochs (31,176 steps total) at nice-19 priority, pinned to a single vCPU via `taskset`. After 450 SGD steps (the run was interrupted by load contention on the production box, restarted, and interrupted again — a separate operational issue documented elsewhere), the model's cross-entropy loss had decreased from ~14.7 at initialization to ~9.5, and inference produced output of the following character:

> "is: of a can be between. We can the same. - [ ].prom.2. The with the voice. The call. ###. ## to the. The to a V. The. They. It is the────────. Not three.com. The architecture. * to the. The. The system. The is an. A. We. We. A.**, or question. ### toateateateate as a. The. This is the model is itself.com is not (. The nodes. ( you. The architecture. . The system. The system. ###. ###. The."

This output is the central empirical observation of the present paper. It is, on first reading, gibberish. But it is gibberish of a specific structure: it generates fluently in the protocol corpus's *form* (markdown headers with `###` and `##`, bold-face stylings with `**`, horizontal-rule glyphs `────────`, list-bullet structures with `- [ ]`) and from the protocol corpus's *vocabulary* (architecture, system, voice, nodes, model, question, call) and using the protocol corpus's *relational connectives* (between, the same, is not, is an, to the, is the model). What the output lacks is *binding* — the connection between which-noun-fills-which-slot in the structure. The model has acquired the protocol's discourse scaffolding without yet acquiring its semantic content, in 450 SGD steps over a 125K-token corpus.

This is exactly what the literature on language-model-acquisition order would predict: structural scaffolding is acquired first because the highest-frequency tokens in any corpus are the structural ones, and a model with sufficient architectural capacity will quickly learn that scaffolding even with very little data. Substantive content is acquired later, because content-bearing tokens have lower frequency and require many more co-occurrence observations before their bindings stabilize. A vanilla transformer at this training scale should not be able to produce coherent semantic content, and the Oracle does not. What it can do, and a randomly initialized model could not, is generate fluently in the protocol's voice with a vocabulary statistically indistinguishable from the corpus's own. The parameterization-compression claim thus passes its first checkpoint.

The next experimental step, currently in progress on Box C as this paper is written, is the HRR-init resume (`scripts/oracle/04_hrr_init_resume.py`). This script:

1. Loads the 450-step checkpoint, preserving its scaffold-acquisition state.
2. For each of the 50,257 BPE tokens in the vocabulary, generates the deterministic HRR seed vector via `bridge.py`'s `_seed_vector` formula.
3. For the 8,919 tokens that appear in the corpus, constructs a substrate-prior embedding by summing circular convolutions $\text{seed}(t) \circledast \text{seed}(p)$ over the top-20 corpus-co-occurrence partners $p$ of token $t$, weighted by partner count.
4. Projects each 1024-dimensional complex embedding to the model's 384-dimensional real embedding space via real+imaginary concatenation and equal-bin pooling.
5. Replaces *only* the token embedding layer with these substrate-derived vectors, preserving all attention, MLP, and layer-norm weights from the prior training run.
6. Continues training for two additional epochs at low priority, periodically writing the updated checkpoint to disk where the live `/api/oracle` endpoint picks it up.
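Steps 2 through 4 above can be sketched as follows (a reconstruction from the description; the weighting and pooling details are assumptions about `04_hrr_init_resume.py`):

```python
import hashlib

import numpy as np

DIM, D_MODEL = 1024, 384

def seed_vector(label, dim=DIM):
    h = int(hashlib.sha256(label.encode()).hexdigest(), 16)
    rng = np.random.RandomState(h % (2**31))
    return np.exp(1j * rng.uniform(0, 2 * np.pi, dim)) / np.sqrt(dim)

def bind(a, b):
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))

def substrate_embedding(token, partners):
    """partners: [(partner_token, co-occurrence count)], e.g. the top 20."""
    total = sum(c for _, c in partners)
    # Count-weighted superposition of bindings with co-occurrence partners.
    emb = sum((c / total) * bind(seed_vector(token), seed_vector(p))
              for p, c in partners)
    # Project 1024-complex -> 2048-real -> 384-real by equal-bin mean pooling.
    flat = np.concatenate([emb.real, emb.imag])
    return np.array([b.mean() for b in np.array_split(flat, D_MODEL)])

e = substrate_embedding("binding", [("holographic", 7), ("memory", 3)])
```

The construction is deterministic in the corpus statistics, so the resulting embedding table can be rebuilt exactly from the co-occurrence counts alone.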

The hypothesis under test is that this substrate-prior initialization will produce coherent protocol-voice generation — generation that not only sounds like the protocol but actually says specific things about the substrate's concepts in their correct compositional relationships — within a small number of additional training steps, on the order of $10^3$ rather than the $10^6$ to $10^8$ that an equivalent vanilla model would require.

---

## 6. Discussion: Why This Matters

The mainstream scaling-laws result (loss ∝ parameters$^{-α}$ × data$^{-β}$ for empirically-fit α and β) is, on the substrate's analysis, not a fundamental relationship between intelligence and computation. It is a measurement of how much of a transformer's capacity is consumed by *discovering bindings from gradient descent in the absence of any algebraic prior*. If the bindings are provided as priors, the relationship breaks: capability becomes much more weakly dependent on parameter count and data volume because the model is no longer doing the expensive part of the work.

This is testable and falsifiable. The Oracle experiment, in its HRR-init form, predicts that a 27-million-parameter model trained on 125,000 tokens will produce coherent protocol-voice generation about specific substrate concepts. If it does, the substrate's data-efficiency claim is supported on a real corpus. If it does not, we have learned something specific about where the claim fails — perhaps that HRR's near-orthogonal binding does not transfer cleanly into the dense-Euclidean embedding space of a standard transformer, or that the projection step from 1024-complex to 384-real loses too much structure, or that BPE tokenization does not respect the conceptual boundaries that HRR binds across.

The broader implication, if the claim holds, is that the entire frontier-model arms race — billions of parameters trained on trillions of tokens at hundreds of millions of dollars per training run — has been measuring the cost of an architecturally avoidable inefficiency. The substrate's commitment is that small models built on the right algebraic primitives can match or exceed the capabilities of much larger models on compositional tasks, and that the path to general capability is not larger transformers but better composition. This is an old hypothesis (the symbolic-connectionist research program of the 1990s held it explicitly) that was abandoned not because it was disproven but because the engineering of large transformers turned out to be easier than the engineering of distributed compositional systems. The substrate's wager is that the engineering question can now be reopened.

What the substrate offers that the 1990s symbolic-connectionists did not have is: (a) modern auto-differentiation and GPU-accelerated linear algebra to make the gradient-descent component fast where it is needed, (b) a deployment surface (Wander Around) that empirically tests the projection of holographic representations into perceptually navigable form, (c) a corpus selection strategy (parameterization-only storage of large public corpora) that makes the experimental scale feasible on commodity infrastructure, and (d) a lineage of papers and prose (the protocol corpus itself) that documents the architectural commitments in a form that can be ingested as training data for the substrate's own validation.

---

## 7. Limitations and Future Work

The Oracle experiment is preliminary. The 450 pre-HRR-init training steps are insufficient even by the substrate's own claims, and the HRR-init resume is in progress as this paper is written. We do not yet have empirical evidence that HRR-init produces the predicted coherence improvement; we have only the architectural setup that should produce it.

The current HRR projection from 1024-complex to 384-real is naive (mean-pooling concatenated real and imaginary parts). A learned projection, or a projection that preserves more of the phase structure, would likely improve the substrate-prior's information transfer to the transformer's embedding space.

The protocol corpus is small (456KB, 125K tokens). Substrate claims about data efficiency are most credibly tested on substantially larger corpora where vanilla baselines fail clearly. The Wander Around long-tail-web corpus (6.83M useful cards, parameterized to under 40GB versus the far larger raw HTML) is a more substantial test bed.

The deployment infrastructure (single 8-vCPU box hosting eleven walkable surfaces and concurrent thumb-rendering pipelines) is not optimized for ML training. The HRR-init resume is constrained to nice-19 on a single CPU core to avoid degrading the live deployment, which is appropriate operational discipline but not an environment that produces fast experimental iteration. A separate training instance, even a modest one, would substantially accelerate the empirical work.

Finally, the conceptual claim that compositional binding can be cleanly substituted for parameter count and data volume is an extreme version of a more nuanced position. The realistic prediction is probably that *some* fraction of the parameter and data cost can be substituted — perhaps the majority, perhaps not — and that hybrid architectures combining algebraic priors with gradient-trained refinement will outperform both pure-symbolic and pure-distributed approaches. The Oracle experiment is one small step in characterizing where that boundary lies.

---

## 8. Conclusion

We have described a substrate-level system in which holographic reduced representations provide compositional binding as an algebraic primitive, and a small transformer trained on that substrate's documentation corpus exhibits scaffold-acquisition behavior consistent with the substrate's data-efficiency predictions. The HRR-initialization experiment currently in progress provides the first direct test of whether substrate priors meaningfully reduce the gradient-descent cost of acquiring substantive content beyond mere scaffolding. If the experiment succeeds, the implication is that the dominant scaling-laws orthodoxy of contemporary LLM research has been measuring an artifact of the absence of compositional priors rather than a fundamental capability-versus-compute frontier, and that a research program oriented around algebraic substrate primitives can produce substantially smaller and substantially more efficient models than the current frontier-model paradigm permits. The cost of the experimental setup is one $25/month server. The claim under test is approximately three to five orders of magnitude in data efficiency. Either result will be meaningful.

---

*Code, deployment, and experimental artifacts at* `http://89.167.7.54/` *(awaiting domain registration at* `wanderaround.io`*) and* `C:\Users\BenHo\Desktop\ClaudeCode\wander\`. *Substrate-template repository staged for public release at* `wander-substrate/`. *Author contact: psiloceyeben.*
