L1 · MAESTRO
Foundation Models
L1 governs the Model box: foundation weights, alignment tuning, and the inference runtime every other layer depends on.
The Foundation Model layer is the substrate on which every other layer depends. It encompasses the pretrained weights, alignment tuning, and any runtime inference process that turns an input token sequence into a probability distribution over outputs. In a multi-agent system (MAS), every agent that calls a language model touches this layer, whether the model is self-hosted, accessed through an API, or shared across multiple agents as a common inference endpoint.
What lives here
- Pretrained model weights and their provenance (origin, training corpus, signing status)
- Fine-tuned or instruction-tuned variants derived from a base model
- Model artifacts stored in object storage or a model registry (MLflow, Hugging Face Hub, SageMaker Model Registry)
- RLHF and RLAIF alignment policies baked into the model
- Embedding models used for RAG retrieval (separate weights, same layer)
- Shared inference endpoints where multiple agents call a common model host
- Model cards and associated metadata that declare training data lineage
- Quantised or distilled model variants (GGUF, GPTQ, AWQ) that differ in behaviour from the canonical weights
In a multi-agent deployment, a single foundation model may serve dozens of agents. A compromise at this layer has a blast radius proportional to the number of consumers. This is why the Cloud Security Alliance’s MAESTRO guide (Ken Huang, 2025) places model integrity at the base of the stack before any framework or orchestration concern is addressed.
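The blast-radius idea can be made concrete with a small inventory query. The following is a minimal sketch, assuming the operator keeps a mapping from agents to the model endpoints they call; the agent names, endpoint names, and the `AGENT_ENDPOINTS` structure are hypothetical and not part of MAESTRO.

```python
from collections import defaultdict

# Hypothetical inventory: agent name -> model endpoint it calls.
AGENT_ENDPOINTS = {
    "triage-agent": "llama3-shared",
    "refund-agent": "llama3-shared",
    "faq-agent": "llama3-shared",
    "search-agent": "embedder-v2",
}


def blast_radius(agent_endpoints):
    """Return {endpoint: sorted list of agents that depend on it}."""
    consumers = defaultdict(list)
    for agent, endpoint in agent_endpoints.items():
        consumers[endpoint].append(agent)
    return {endpoint: sorted(agents) for endpoint, agents in consumers.items()}


def rank_by_exposure(agent_endpoints):
    """List endpoints from most to fewest consumers (ties broken by name)."""
    radius = blast_radius(agent_endpoints)
    return sorted(radius, key=lambda endpoint: (-len(radius[endpoint]), endpoint))


if __name__ == "__main__":
    for endpoint in rank_by_exposure(AGENT_ENDPOINTS):
        agents = blast_radius(AGENT_ENDPOINTS)[endpoint]
        print(f"{endpoint}: {len(agents)} consumer(s) -> {', '.join(agents)}")
```

Ranking endpoints this way gives one possible ordering for where L1 controls matter most: the endpoint with the most consumers is the one whose compromise reaches the most agents.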
Concrete example: A customer-support platform runs ten LangChain agents that all call a shared self-hosted Llama 3 endpoint. The operator periodically fine-tunes that model on accumulated conversation logs stored in a shared vector store. If an attacker seeds the vector store with adversarially crafted support tickets before a scheduled fine-tune run, the resulting weight update embeds the attacker’s behaviour into every one of the ten agents simultaneously. This is an L1 compromise with L2 as the entry path.
Threats that target this layer
- T1 Memory Poisoning: when long-term memory is periodically used to fine-tune or update the model, poisoned memory entries corrupt the weight update, affecting every subsequent inference. The model cannot distinguish benign from adversarial training signal.
- T7 Misaligned and Deceptive Behaviors: alignment policies baked into the model weights can be bypassed, eroded, or never fully instilled. An agent whose foundation model has misaligned objectives will exhibit those misalignments regardless of what the framework layer adds on top.
- T17 Supply Chain Compromise: the model artifact itself is part of the AI supply chain. A tampered weight file, a poisoned Hugging Face checkpoint, or an unofficial quantised variant introduces adversary-controlled behaviour at the deepest possible layer. Because the model is downstream of training and upstream of everything else, supply-chain compromise at L1 is nearly impossible to detect by monitoring alone.
- T5 Cascading Hallucination Attacks: systematic inaccuracies or planted associations in training data surface as confident hallucinations that propagate through every agent using the model.
Mitigations anchored here
- model registry: mandatory version pinning, artifact signing, canary rollout, and rollback paths for every model artifact. The registry gates which weights reach production and records a verifiable chain of custody. Applies to embedding models as well as generative models.
- signed AIBOM / agent SBOM: a Software Bill of Materials extended to AI artifacts captures training data provenance, base model identity, and fine-tuning lineage. Without this record, T17 supply-chain attacks are invisible until after harm occurs.
- behavioural red-teaming: structured adversarial evaluation against the model weights before deployment. Identifies alignment gaps (T7) and planted behaviours (T17) that static analysis of weights cannot surface.
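The version-pinning part of the registry mitigation can be sketched as a digest check. This is an illustrative sketch only, assuming artifacts are available as bytes and that a SHA-256 digest recorded at registration time is the pin; the `PinnedRegistry` class and its method names are hypothetical. A bare hash pin is not artifact signing: a production registry would add an asymmetric signature over the digest and would hash large weight files in chunks.

```python
import hashlib
import hmac


def sha256_digest(artifact: bytes) -> str:
    """Hex SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(artifact).hexdigest()


class PinnedRegistry:
    """Hypothetical in-memory registry mapping (name, version) to a pinned digest."""

    def __init__(self):
        self._pins = {}

    def register(self, name: str, version: str, artifact: bytes) -> str:
        key = (name, version)
        if key in self._pins:
            # Pins are immutable: changed weights must get a new version.
            raise ValueError(f"{name}:{version} is already registered")
        self._pins[key] = sha256_digest(artifact)
        return self._pins[key]

    def verify(self, name: str, version: str, artifact: bytes) -> bool:
        """True only if the artifact matches the digest pinned for this version."""
        expected = self._pins.get((name, version))
        if expected is None:
            return False  # unknown versions never reach production
        return hmac.compare_digest(expected, sha256_digest(artifact))
```

The same check applies unchanged to embedding models, since the registry only sees bytes and a version label.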
How L1 relates to its neighbours
L1 has no layer below it in the MAESTRO stack: it is the base. Its immediate neighbour above is L2 Data Operations, which manages the runtime data surfaces (vector stores, prompt corpora, retrieval pipelines) that the model reads during inference. A threat that corrupts training data (L1) is distinct from one that corrupts retrieval data (L2), even though both ultimately affect model output. L1 governs what the model knows; L2 governs what the model reads at runtime.
L1 also has a direct relationship with L6 Security and Compliance (the vertical band). The governance questions of which model versions are approved for production, which training data sources are permissible, and which audit records document the model lifecycle all belong to L6 policy applied to L1 artifacts.
The Foundation Model layer is MAESTRO’s recognition that AI system security begins before the first line of application code. Every architectural control above L1 is contingent on the model weights being what the operator believes them to be. Artifact integrity, supply-chain verification, and alignment evaluation are non-optional baseline controls.
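One way to treat these baseline controls as non-optional is a pre-deployment gate that returns a list of blockers. The sketch below is an assumption-laden illustration, not a MAESTRO-defined interface: the AIBOM field names (`base_model`, `training_data_sources`, `fine_tune_lineage`), the `release_blockers` function, and the use of a shared-key HMAC in place of a real asymmetric signature are all hypothetical choices made to keep the example self-contained.

```python
import hashlib
import hmac
import json

# Hypothetical AIBOM fields, mirroring the lineage items named in the text.
REQUIRED_FIELDS = ("base_model", "training_data_sources", "fine_tune_lineage")


def canonical_bytes(record: dict) -> bytes:
    """Serialise the record deterministically so signatures are reproducible."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_aibom(record: dict, key: bytes) -> str:
    """HMAC-SHA256 over the canonical record (stand-in for a real signature)."""
    return hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()


def verify_aibom(record: dict, signature: str, key: bytes) -> bool:
    return hmac.compare_digest(sign_aibom(record, key), signature)


def release_blockers(record: dict, signature: str, key: bytes,
                     red_team_failures: int) -> list:
    """Return the reasons a model release must not proceed; empty means clear."""
    blockers = []
    if not verify_aibom(record, signature, key):
        blockers.append("AIBOM signature invalid")
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            blockers.append(f"AIBOM missing {field}")
    if red_team_failures > 0:
        blockers.append(f"{red_team_failures} red-team findings unresolved")
    return blockers
```

Returning every blocker rather than failing on the first makes the gate's output usable as an audit record for the L6 policy layer.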
All threats tagged to this layer
Every threat whose maestroLayers list includes L1. The prose above may discuss a subset; this list is the complete index.