T26: Model Instability Leading to Inconsistent Blockchain Interactions

Definition

Large Language Model (LLM) instability causes an agent to interact with external systems in unpredictable ways: submitting invalid transactions, failing to execute expected smart-contract calls, or sending erratic requests to a Model Context Protocol (MCP) server. The threat applies to the Solana blockchain integration in ElizaOS (an open-source multi-agent operating system) and to MCP server interactions in the Anthropic MCP context. The root cause is non-deterministic model behaviour, not memory poisoning (T1).

What it looks like in practice

An ElizaOS agent designed to trade tokens on Solana receives a price signal and must decide whether to buy or sell. Due to model instability, it inconsistently interprets the same signal across invocations: on one call it submits a buy order; on the next it submits a sell order for the same signal; on a third it fails to submit any transaction. The resulting position is incoherent, incurring losses through erratic execution rather than through any adversarial manipulation.

In the MCP context, a client LLM sends tool call parameters to an MCP server. Model instability causes the parameter values to vary across identical requests (different field encodings, unexpected null values, or conflicting instruction fields), producing server-side errors or silently incorrect results depending on how the server handles the malformed inputs.

Why it’s dangerous in multi-agent context

In blockchain-integrated and protocol-bound agents, each LLM output maps to a transaction or protocol request with real-world consequence. Output variability that is inconsequential in a text-generation context becomes a correctness and financial-loss risk: an erratic on-chain transaction is irreversible, and a malformed MCP tool call may trigger unintended side effects on the connected system. In multi-agent deployments, the instability is compounded by T5 (Cascading Hallucination Attacks): a fabricated or inconsistent output from one agent propagates through agent-to-agent communication before the instability is detected.

Detection signals

Model instability produces observable output variance: conflicting decisions for identical inputs, schema-validation failures, and on-chain transaction patterns that no rational strategy would generate.

Conflicting tool call parameters logged across two invocations within the same session that shared an identical input signal. Log the (input_hash, output_params) pair for every invocation and alert when the same input_hash maps to two distinct output_params values within a rolling window.
A schema-validation rejection rate for LLM-generated transaction objects exceeding 1 % of submissions over a 10-minute window. Instrument the validation layer to emit a counter metric (llm_output_schema_reject_total) and alert on breach.
Alternating buy and sell orders for the same token pair within a single trading session, where no price signal crossed the decision threshold between orders. Detect by correlating order direction with the price signal log and flagging any direction reversal without a threshold crossing.
An MCP tool call where a required parameter field is present but null, or where the same field is populated with different types across consecutive calls (e.g. integer then string). Log the full parameter payload for every MCP call and alert on type-inconsistency for a given field name.
A temperature setting above 0.0 on any agent that issues on-chain transactions or MCP tool calls with financial side-effects. This should be caught by a configuration audit check that fires on deployment and on any model parameter change.
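
Two of the signals above (conflicting outputs for the same input_hash, and null or type-shifting MCP fields) can be sketched as follows. This is a minimal illustration, not a production detector: the names ConflictDetector, stable_hash and type_inconsistencies, the 600-second window, and the in-memory storage are all assumptions introduced here, and a real deployment would emit metrics or alerts instead of returning booleans.

```python
import hashlib
import json
import time
from collections import defaultdict, deque
from typing import List, Optional


def stable_hash(obj) -> str:
    """Hash a JSON-serialisable object with sorted keys so equal inputs hash equally."""
    encoded = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class ConflictDetector:
    """Flags an input that maps to more than one distinct output within a rolling window."""

    def __init__(self, window_seconds: float = 600.0):  # window length is an assumption
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # input_hash -> deque of (timestamp, output_hash)

    def record(self, input_signal, output_params, now: Optional[float] = None) -> bool:
        """Record one invocation; return True if it conflicts with a recent one."""
        now = time.time() if now is None else now
        input_hash = stable_hash(input_signal)
        output_hash = stable_hash(output_params)
        history = self._seen[input_hash]
        # Drop entries that have aged out of the rolling window.
        while history and now - history[0][0] > self.window_seconds:
            history.popleft()
        conflict = any(prev != output_hash for _, prev in history)
        history.append((now, output_hash))
        return conflict


def type_inconsistencies(previous: dict, current: dict, required=frozenset()) -> List[str]:
    """Return fields that are required but null, or whose type changed since the last call."""
    flagged = []
    for name, value in current.items():
        if value is None:
            if name in required:
                flagged.append(name)
        elif previous.get(name) is not None and type(previous[name]) is not type(value):
            flagged.append(name)
    return flagged
```

A caller would invoke record() once per model invocation and type_inconsistencies() on each pair of consecutive MCP payloads, alerting whenever either reports a finding.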

Mitigations

Validate all LLM outputs against a strict schema before submitting them as transactions or MCP tool calls; reject outputs that do not conform rather than attempting to normalise them.
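Set temperature to 0 for decision-making agents that drive irreversible actions; reserve stochastic settings for creative tasks only. Temperature 0 sharply reduces sampling variance but may not guarantee identical outputs across calls, so the remaining controls still apply.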
Implement an idempotency check before each transaction submission: verify that the intended action has not already been committed before sending a new request.
Cap retry counts and implement a circuit breaker: after N consecutive conflicting outputs for the same signal, halt execution and alert a human reviewer.
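
The first and last mitigations above can be sketched together: a strict parser that rejects rather than normalises, and a circuit breaker that opens after N consecutive conflicting outputs for the same signal. The Order fields, the SchemaError type and the default of three conflicts are hypothetical choices made for this sketch; they are not taken from ElizaOS or from any MCP SDK.

```python
from dataclasses import dataclass

ALLOWED_SIDES = {"buy", "sell"}  # hypothetical order schema, for illustration only
EXPECTED_FIELDS = {"side", "token_pair", "amount"}


class SchemaError(ValueError):
    """Raised when an LLM output does not match the expected shape exactly."""


@dataclass(frozen=True)
class Order:
    side: str
    token_pair: str
    amount: int  # smallest token unit, to avoid float rounding


def parse_order(raw) -> Order:
    """Strictly validate a decoded LLM output; never coerce values or fill in defaults."""
    if not isinstance(raw, dict) or set(raw) != EXPECTED_FIELDS:
        raise SchemaError("expected exactly the fields " + ", ".join(sorted(EXPECTED_FIELDS)))
    if not isinstance(raw["side"], str) or raw["side"] not in ALLOWED_SIDES:
        raise SchemaError("side must be 'buy' or 'sell'")
    if not isinstance(raw["token_pair"], str) or not raw["token_pair"]:
        raise SchemaError("token_pair must be a non-empty string")
    # bool is a subclass of int in Python, so compare the exact type.
    if type(raw["amount"]) is not int or raw["amount"] <= 0:
        raise SchemaError("amount must be a positive integer")
    return Order(raw["side"], raw["token_pair"], raw["amount"])


class CircuitBreaker:
    """Opens after max_conflicts consecutive conflicting outputs for one signal."""

    def __init__(self, max_conflicts: int = 3):  # N is a deployment decision
        self.max_conflicts = max_conflicts
        self._last = {}    # signal_id -> last Order seen
        self._streak = {}  # signal_id -> consecutive conflicts so far
        self.open = False

    def observe(self, signal_id: str, order: Order) -> bool:
        """Return True if execution may proceed, False once the breaker is open."""
        if self.open:
            return False
        previous = self._last.get(signal_id)
        if previous is not None and previous != order:
            self._streak[signal_id] = self._streak.get(signal_id, 0) + 1
        else:
            self._streak[signal_id] = 0
        self._last[signal_id] = order
        if self._streak[signal_id] >= self.max_conflicts:
            self.open = True  # stays open until a human reviewer resets it
            return False
        return True
```

The breaker deliberately has no automatic reset: once open, it refuses everything until someone inspects the conflicting outputs.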

Relation to base threat (T1–T17)

T26 extends T5 (Cascading Hallucination Attacks). Where T5 focuses on the propagation of fabricated content through memory and agent-to-agent channels, T26 focuses on the non-determinism of model outputs mapping to real-world irreversible actions. T48 (Model Inconsistency Leading to Variable Approvals) is the RPA-workflow analogue: the same non-determinism applied to approval decisions rather than blockchain transactions.

OWASP Top 10 for Agentic Applications 2026

The Agentic Top 10 (ASI01 through ASI10) is a separate practitioner-facing publication that maps onto the master Threats & Mitigations threat numbering. T26 is covered by the following Top 10 entries:

ASI08 Cascading Failures (contributing)

A single low-severity fault (a hallucinated value, a corrupted tool output, a poisoned memory entry) propagates across a network of agents that each build on the last agent's output, compounding into system-wide harm that is disproportionate to the original defect. ASI08 is about propagation and amplification, not the fault's origin; the initial trigger may itself be innocuous.

OWASP LLM Top 10: LLM01:2025, LLM04:2025, LLM06:2025

Source: OWASP Top 10 for Agentic Applications 2026 (Dec 2025) · the Top 10 is a compass into the master Threats & Mitigations taxonomy, not a replacement for it.

Design principles at stake

When T26 is present, these security design principles are the ones being violated or tested; the mitigations below are how you restore them.

Defence-in-Depth: Each LLM output in an ElizaOS trading context maps to an irreversible on-chain transaction, so non-deterministic model behaviour (buying on one call, selling on the next for the same signal) produces real financial loss with no attacker intervention required. Depth means the model's output is never submitted directly. A strict schema validation gate rejects any non-conforming output before it reaches the transaction layer. Temperature is set to zero for decision-making agents, which minimises sampling variation at the source. An idempotency check verifies that the intended action has not already been committed before a new request is sent. A circuit breaker halts execution after N consecutive conflicting outputs for the same signal and escalates to a human reviewer. Each control is independent: a schema-valid but logically inconsistent output is caught by the idempotency check; an idempotency-passing runaway is caught by the circuit breaker.
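
One way to picture the independence of these controls is as a chain of gates, each of which can reject an action without knowing about the others. The sketch below shows that composition together with a simple idempotency ledger. The names run_gates, IdempotencyLedger and action_key are hypothetical, the ledger is in-memory, and a real system would more plausibly confirm committed state against the chain itself.

```python
import hashlib
import json
from typing import Callable, List, Optional

# A check returns a rejection reason, or None to let the action pass.
Check = Callable[[dict], Optional[str]]


def action_key(action: dict) -> str:
    """Derive an idempotency key from the action's content (sorted-key JSON)."""
    blob = json.dumps(action, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


class IdempotencyLedger:
    """Remembers committed actions so the same one is not submitted twice."""

    def __init__(self):
        self._committed = set()

    def check(self, action: dict) -> Optional[str]:
        return "already committed" if action_key(action) in self._committed else None

    def commit(self, action: dict) -> None:
        self._committed.add(action_key(action))


def run_gates(action: dict, checks: List[Check]) -> Optional[str]:
    """Run every check in order; the first rejection reason wins, None means all passed."""
    for check in checks:
        reason = check(action)
        if reason is not None:
            return reason
    return None


def has_valid_side(action: dict) -> Optional[str]:
    """Stand-in for a full schema gate; only the 'side' field is checked here."""
    return None if action.get("side") in ("buy", "sell") else "schema: bad side"


ledger = IdempotencyLedger()
gates = [has_valid_side, ledger.check]
action = {"signal_id": "sig-1", "side": "buy", "amount": 10}

first = run_gates(action, gates)   # passes both gates
ledger.commit(action)              # record it once it has been submitted
second = run_gates(action, gates)  # the identical action is now refused
```

Including a signal identifier in the action keeps the key specific to one decision, so a legitimate later trade on a new signal is not mistaken for a duplicate.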

Recommended mitigations

Auto-generated from the mitigation catalog: every mitigation whose coverage map includes T26, sorted by maturity tier (Tier 1 production-canonical first, then Tier 2, then Tier 3 research-stage).

Tier 2 Blockchain tx guard (Blockchain transaction guard: pre-commit safety checks for every agent-initiated transaction)

A blockchain transaction, once committed, cannot be undone. An agent that signs and broadcasts transactions with no enforcement layer in front of it can exceed its authorised value, call a contract it was never provisioned to reach, or drain a wallet in a runaway loop, and by the time anyone notices, the funds are gone. A transaction guard intercepts each proposed transaction before signing, checks it against value bounds, a contract allowlist, a gas or compute-unit limit, and a replay-protection nonce, and refuses to sign anything that falls outside declared policy.
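
The four checks can be sketched as a single pre-signing review. The Policy and ProposedTx shapes below are hypothetical and do not model Solana's actual transaction format; the nonce in particular is treated as an opaque string tracked in memory, purely to illustrate replay refusal.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class Policy:
    max_value: int                     # per-transaction cap, in the smallest unit
    allowed_contracts: FrozenSet[str]  # contracts the agent is provisioned to call
    max_compute_units: int


@dataclass(frozen=True)
class ProposedTx:
    contract: str
    value: int
    compute_units: int
    nonce: str


class TransactionGuard:
    """Reviews each proposed transaction before it reaches the signer."""

    def __init__(self, policy: Policy):
        self.policy = policy
        self._used_nonces: Set[str] = set()

    def review(self, tx: ProposedTx) -> Optional[str]:
        """Return a refusal reason, or None if the transaction may be signed."""
        if tx.contract not in self.policy.allowed_contracts:
            return "contract not on allowlist"
        if tx.value < 0 or tx.value > self.policy.max_value:
            return "value outside bounds"
        if tx.compute_units > self.policy.max_compute_units:
            return "compute-unit limit exceeded"
        if tx.nonce in self._used_nonces:
            return "nonce already used"
        self._used_nonces.add(tx.nonce)  # consumed only once every check has passed
        return None


guard = TransactionGuard(Policy(1000, frozenset({"dex-program"}), 200_000))
approved = guard.review(ProposedTx("dex-program", 500, 100_000, "n1"))
replayed = guard.review(ProposedTx("dex-program", 500, 100_000, "n1"))
```

The signer should be reachable only through review(), so that a refusal cannot be bypassed by the agent calling the signing key directly.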

Why it helps: On-chain model instability is the risk that an agent whose model has drifted or degraded begins issuing erratic transactions. The transaction guard's value bounds and contract allowlist constrain that erratic behaviour structurally: a drifted agent cannot commit a transaction that exceeds its value limit or targets a contract outside its approved set.

Tier 2 OOB verify (Out-of-band verification: independent-channel confirmation for irreversible agent actions)

An agent that can propose payments, update banking details, or modify production configuration is, by construction, a manipulation surface. If the only thing standing between a proposed change and its execution is the agent's own UI, a successful prompt injection or RAG poisoning attack requires no additional steps. Out-of-band verification breaks that dependency by routing a one-use confirmation code through a channel that is structurally separate from the agent's primary interaction channel, so an attacker who controls the agent's context cannot complete the approval without also compromising the user's registered secondary device.
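
A one-use confirmation code of this kind might look like the sketch below. The OutOfBandVerifier class, the six-digit code, and the five-minute expiry are assumptions made for illustration; delivery over the secondary channel is out of scope here and is represented only by the value that issue() returns.

```python
import hashlib
import hmac
import secrets
import time
from typing import Dict, Optional, Tuple


class OutOfBandVerifier:
    """Issues and checks one-use confirmation codes for pending actions."""

    def __init__(self, ttl_seconds: float = 300.0):  # expiry is an assumption
        self.ttl_seconds = ttl_seconds
        # action_id -> (SHA-256 digest of the code, expiry timestamp)
        self._pending: Dict[str, Tuple[bytes, float]] = {}

    def issue(self, action_id: str, now: Optional[float] = None) -> str:
        """Create a code; the caller must send it over the secondary channel only."""
        now = time.time() if now is None else now
        code = f"{secrets.randbelow(10**6):06d}"
        digest = hashlib.sha256(code.encode("utf-8")).digest()
        self._pending[action_id] = (digest, now + self.ttl_seconds)
        return code

    def confirm(self, action_id: str, code: str, now: Optional[float] = None) -> bool:
        """Check a submitted code; any attempt, right or wrong, consumes the pending entry."""
        now = time.time() if now is None else now
        entry = self._pending.pop(action_id, None)
        if entry is None:
            return False
        digest, expiry = entry
        if now > expiry:
            return False
        submitted = hashlib.sha256(code.encode("utf-8")).digest()
        return hmac.compare_digest(digest, submitted)
```

Consuming the entry on the first attempt means a wrong guess forces a fresh code to be issued, which trades some user convenience for resistance to brute-force guessing.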

Why it helps: Model instability leading to erratic high-stakes proposals is bounded by OOB verification: an unstable model that proposes an anomalous financial action must still obtain independent-channel confirmation before that action commits.

Multi-agent variants: OWASP MAS Guide

The OWASP MAS Threat Modelling Guide v1.0 catalogues one named multi-agent variant of T26, anchored to specific MAESTRO layers. It is a concrete attack pattern that emerges when this threat compounds across agents.

CL Blockchain Reorganisation and Audit Integrity Collapse (extends T33, T26, T8)

A chain reorganisation attack (T33) rewrites the on-chain audit trail; agents relying on the reorged state make irrecoverable decisions (T26: resource misallocation); the post-hoc audit (T8) is unreliable because the canonical record itself changed.

Source: OWASP MAS Threat Modelling Guide v1.0, §2 Overview of MAESTRO Framework — Extended Threat Scenarios + Cross-Layer table.

Red-team pivot: MITRE ATLAS techniques

MITRE ATLAS catalogues adversary techniques against AI systems. Where this OWASP threat has an attacker-perspective counterpart, the ATLAS technique is shown below. That is what a red team would actually be doing on the wire. Use this for detection-signal anchoring, threat-hunting hypotheses, and IR runbooks. Source: mitre-atlas/atlas-data v5.6.0.

AML.T0031 Erode AI Model Integrity

Adversary degrades model output quality over time so users lose confidence or downstream consumers act on incorrect predictions.

AML.T0067 LLM Trusted Output Components Manipulation

Adversary manipulates the structured parts of an LLM response (citations, tool-call arguments, approved-action markup) that downstream systems treat as trusted.

Agentic angle: Structured outputs are exactly what agent frameworks parse to decide what to execute. Undermining the structure undermines every safety check downstream.

References

OWASP MAS Threat Modelling Guide v1.0 (April 2025) §4 ElizaOS — Layer 1 Foundation Models; §5 Anthropic MCP — Layer 1 Foundation Models.
