Tagged: agent-sprawl
Governing Agent Boundaries in .NET. Not Agents.
Post 9 of the AgenticOps series argued that agent sprawl governance starts at the boundary, not the agent. This post implements that claim in a .NET stack: C#, Microsoft Agent Framework, ML.NET, PostgreSQL, and Vue.js.
—
The Problem
A .NET platform grows agents organically. A triage agent classifies inbound work items. A ranking agent scores them by priority. A summarization agent compresses context for the daily control task. An extraction agent pulls candidate work items from external signals. Each agent is individually reasonable. Same team, same framework, same infrastructure. And none of them have governed boundaries between them.
The triage agent writes a classification. The ranking agent reads it. What validates the handoff? Nothing. Ranking trusts whatever triage wrote. If triage hallucinates a category that does not exist in the scoring model, ranking silently produces garbage scores. The failure is invisible because both agents completed successfully. The boundary between them had no ring.
This compounds with every agent added. Five internally governed agents with four ungoverned boundaries between them are not five governed agents working as a system. They are an ungoverned system that happens to have five well-scoped components.
—
Why It Breaks
Microsoft Agent Framework makes it easy to define agents with clean internal governance. The AgentChat orchestration model handles turn-taking, tool invocation, and termination conditions. The framework governs the interior of each agent. It says nothing about what happens between agents when one agent’s output becomes another agent’s input outside of a chat.
In a typical .NET implementation, the handoff looks like this.
// Triage agent writes result to the database
var classification = await triageAgent.InvokeAsync(workItem);
await db.WorkItems.UpdateClassification(workItem.Id, classification);

// Ranking agent reads the result and scores
var ranked = await rankingAgent.InvokeAsync(workItem);
await db.WorkItems.UpdateScore(workItem.Id, ranked.Score);
Both agents are governed internally. Triage has scoped tools and a defined prompt. Ranking has its own tool set and scoring model. But the handoff, the moment classification leaves triage and enters ranking, is raw. No schema validation. No ring. No gate. If the triage agent returns an unexpected classification, the ranking agent consumes it without complaint.
Agent frameworks govern agent interiors. Boundary governance is the developer’s responsibility. When nobody builds it, the boundaries are open by default. Each new agent adds new boundaries. The governance gap grows with the agent count.
ML.NET adds a second dimension. A trained model that scores work item priority is deterministic given its inputs. But when those inputs come from an upstream stochastic agent, the deterministic model inherits the upstream variance. Garbage classification in, confidently wrong score out. The ML.NET model cannot tell you its inputs were hallucinated. It will score them with the same confidence as valid inputs.
This Looks Like RPA Orchestration. It Is Not.
The pattern of contract, validate, route is decades old. ESBs enforced message schemas between services. RPA orchestration platforms validated handoffs between bots. API gateways check request payloads against OpenAPI specs. If the fix is “validate data at the boundary,” enterprise middleware solved this twenty years ago. So what is different?
The difference is what crosses the boundary.
An RPA bot is deterministic. Bot A always returns the same shape with the same value space. If the schema passes, the content is correct. The bot does not invent new categories. It does not return a structurally valid payload containing a value it fabricated. Schema validation is sufficient because the output space is closed. Every possible output is known at design time.
An AI agent is stochastic. The triage agent can return a structurally valid JSON object with a category field that contains a value no one anticipated. The schema passes. The JSON is well-formed. The category is a string. But the string is “enhancement” and the downstream scoring model has never seen that value. Schema validation caught nothing because the violation is semantic, not structural.
This is why the boundary contract checks three things instead of one. Structure: does the payload match the schema? Domain: is the content within the known value space? Confidence: does the source agent trust its own output enough to skip human review? RPA boundaries only needed the first check. Agent boundaries need all three because the output space is open.
The confidence check is the sharpest difference. RPA bots do not have confidence scores because they do not make probabilistic decisions. They execute scripts. An AI agent that classifies a work item with 0.52 confidence is telling you it is nearly guessing. That signal exists at the boundary and nowhere else. If you do not check it there, the downstream system consumes a guess as a fact.
The infrastructure pattern is old. The failure mode it defends against is new. Deterministic boundaries protect against malformed data. Stochastic boundaries protect against plausible hallucinations. The plumbing looks the same. The threat model is fundamentally different.
—
The Fix
The fix is a boundary contract enforced at every handoff between agents. The contract checks structure, domain, and confidence. In .NET, this is an interface and a middleware pattern.
The Boundary Contract
Every agent-to-agent handoff passes through a boundary. A boundary has a schema, a validator, and a log entry.
public interface IBoundaryContract<T>
{
    string SourceAgent { get; }
    string TargetAgent { get; }
    JsonSchema Schema { get; }
    BoundaryResult<T> Validate(T payload);
}

public record BoundaryResult<T>(
    bool IsValid,
    T Payload,
    string[] Violations,
    DateTimeOffset Timestamp,
    string SourceAgent,
    string TargetAgent);
The schema is not optional and not advisory. It is a JSON Schema definition that the payload must satisfy before the target agent receives it. The validator checks structural compliance, domain constraints, and known invalid states.
public class TriageToRankingContract : IBoundaryContract<WorkItemClassification>
{
    public string SourceAgent => "triage-agent";
    public string TargetAgent => "ranking-agent";
    public JsonSchema Schema => WorkItemClassification.JsonSchema;

    public BoundaryResult<WorkItemClassification> Validate(
        WorkItemClassification payload)
    {
        var violations = new List<string>();

        if (!KnownCategories.Contains(payload.Category))
            violations.Add(
                $"Unknown category '{payload.Category}'. " +
                $"Valid: {string.Join(", ", KnownCategories)}");

        if (payload.Confidence < 0.0 || payload.Confidence > 1.0)
            violations.Add(
                $"Confidence {payload.Confidence} outside [0.0, 1.0]");

        if (payload.Confidence < MinConfidenceThreshold)
            violations.Add(
                $"Confidence {payload.Confidence} below threshold " +
                $"{MinConfidenceThreshold}. Requires human review.");

        return new BoundaryResult<WorkItemClassification>(
            IsValid: violations.Count == 0,
            Payload: payload,
            Violations: violations.ToArray(),
            Timestamp: DateTimeOffset.UtcNow,
            SourceAgent: SourceAgent,
            TargetAgent: TargetAgent
        );
    }

    private static readonly HashSet<string> KnownCategories =
        new() { "bug", "feature", "chore", "spike", "incident" };

    private const double MinConfidenceThreshold = 0.6;
}
A HashSet of strings is debatable next to an enum, but that is beside the point. The validator catches two failure modes. First, domain violations, where the triage agent returns a category the scoring model does not recognize. Second, confidence violations, where the agent classified the work item but with low confidence. Low confidence means the classification should route to a human instead of flowing automatically to ranking.
The Boundary Middleware
The handoff code changes from a direct call to a governed crossing.
public class BoundaryGate<T>
{
    private readonly IBoundaryContract<T> _contract;
    private readonly IBoundaryLog _log;

    public BoundaryGate(IBoundaryContract<T> contract, IBoundaryLog log)
    {
        _contract = contract;
        _log = log;
    }

    public async Task<BoundaryResult<T>> CrossAsync(T payload)
    {
        var result = _contract.Validate(payload);

        await _log.RecordCrossingAsync(new BoundaryCrossing
        {
            SourceAgent = result.SourceAgent,
            TargetAgent = result.TargetAgent,
            Timestamp = result.Timestamp,
            IsValid = result.IsValid,
            Violations = result.Violations,
            PayloadHash = ComputeHash(payload)
        });

        return result;
    }

    // Hash of the serialized payload: traceability without storing raw payloads.
    private static string ComputeHash(T payload) =>
        Convert.ToHexString(
            System.Security.Cryptography.SHA256.HashData(
                System.Text.Json.JsonSerializer.SerializeToUtf8Bytes(payload)));
}
The calling code now looks like this.
var classification = await triageAgent.InvokeAsync(workItem);

var gate = new BoundaryGate<WorkItemClassification>(
    new TriageToRankingContract(),
    boundaryLog);

var crossing = await gate.CrossAsync(classification);

if (!crossing.IsValid)
{
    await humanReviewQueue.EnqueueAsync(workItem, crossing.Violations);
    return;
}

await db.WorkItems.UpdateClassification(workItem.Id, crossing.Payload);
var ranked = await rankingAgent.InvokeAsync(workItem);
Invalid crossings route to a human review queue instead of silently propagating. The ranking agent never sees input that failed the boundary contract. Ring 1, constrain inputs, is now structural at the handoff.
Using the payload's hash as its identity gave me pause at first because I worried about uniqueness, but that is beside the point: the hash exists for traceability, not identity.
There is so much to think about here, but even this is better than nothing.
The Boundary Log in PostgreSQL
Every crossing is recorded. Valid and invalid. The log serves two purposes: operational debugging and governance audit.
CREATE TABLE boundary_crossings (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    source_agent TEXT NOT NULL,
    target_agent TEXT NOT NULL,
    crossed_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    is_valid BOOLEAN NOT NULL,
    violations TEXT[],
    payload_hash TEXT NOT NULL,
    session_id UUID
);

CREATE INDEX ix_crossings_agents
    ON boundary_crossings (source_agent, target_agent, crossed_at);

CREATE INDEX ix_crossings_invalid
    ON boundary_crossings (crossed_at)
    WHERE NOT is_valid;
The payload_hash avoids storing raw payloads in the log while preserving traceability. The partial index on invalid crossings makes it cheap to query failure patterns. A retention policy keeps six months of data, which aligns with the audit log requirements from The EU Says You Need a Kill Switch by August.
The Boundary Dashboard in Vue.js
A governance system that only engineers can read is not governance. It is logging. The Vue.js dashboard surfaces boundary health to anyone who needs to see it.
+---------------------------------------------------+
| Agent Boundary Health                             |
+------------------+----------+---------+-----------+
| Boundary         | Last 24h | Invalid | Rate      |
+------------------+----------+---------+-----------+
| triage > ranking |      847 |      12 |      1.4% |
| ranking > summary|      835 |       3 |      0.4% |
| extract > triage |      214 |      31 |     14.5% |
| summary > daily  |      412 |       0 |      0.0% |
+------------------+----------+---------+-----------+
Look at that extract-to-triage boundary. 14.5% failure rate. That means the extraction agent is producing work items that triage cannot classify within its known categories. Without boundary governance, those items would flow silently through the system and produce meaningless scores. With boundary governance, they route to human review and the failure rate is visible.
The dashboard queries the boundary_crossings table through a simple API endpoint. No new infrastructure. PostgreSQL, a .NET API, and a Vue.js component.
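The aggregation behind that table is a grouped count over the crossing rows. The sketch below assumes the API endpoint has already fetched the last 24 hours of rows from PostgreSQL into memory; `BoundaryCrossingRow` and `BoundaryHealth` are illustrative types I introduce here, not part of the schema or contracts above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One row from boundary_crossings, trimmed to what the dashboard needs.
public record BoundaryCrossingRow(string SourceAgent, string TargetAgent, bool IsValid);

// One row of the "Agent Boundary Health" table.
public record BoundaryHealth(string Boundary, int Total, int Invalid, double FailureRate);

public static class BoundaryHealthReport
{
    // Groups crossings by boundary and computes the invalid rate, worst first.
    public static IReadOnlyList<BoundaryHealth> Build(IEnumerable<BoundaryCrossingRow> last24h) =>
        last24h
            .GroupBy(c => $"{c.SourceAgent} > {c.TargetAgent}")
            .Select(g => new BoundaryHealth(
                Boundary: g.Key,
                Total: g.Count(),
                Invalid: g.Count(c => !c.IsValid),
                FailureRate: g.Count(c => !c.IsValid) / (double)g.Count()))
            .OrderByDescending(h => h.FailureRate)
            .ToList();
}
```

The same grouping could run as SQL in the endpoint instead; doing it in memory keeps the sketch self-contained.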
The Boundary Map
The system knows its own topology because every boundary contract declares its source and target agent. The map is derived from the contracts, not maintained separately.

When a new agent is added to the system, it connects through boundary contracts. The map updates automatically because the contract declares the relationship. The topology is a property of the code, not a diagram someone maintains.
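One way to derive that map is reflection over the contract types themselves. The sketch below assumes a small refactor of the interface shown earlier: a non-generic `IBoundaryContract` carrying only the two endpoint names, which the generic contracts would extend. `TriageToRankingEdge` is a stand-in contract for demonstration, not the full class from the post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Non-generic slice of the contract interface: just the declared endpoints.
public interface IBoundaryContract
{
    string SourceAgent { get; }
    string TargetAgent { get; }
}

public record BoundaryEdge(string Source, string Target);

// Minimal stand-in contract so the map has something to discover.
public sealed class TriageToRankingEdge : IBoundaryContract
{
    public string SourceAgent => "triage-agent";
    public string TargetAgent => "ranking-agent";
}

public static class BoundaryMap
{
    // Finds every concrete contract in the assembly and reads its declared edge.
    public static IReadOnlyList<BoundaryEdge> FromAssembly(Assembly assembly) =>
        assembly.GetTypes()
            .Where(t => !t.IsAbstract && !t.IsInterface
                        && typeof(IBoundaryContract).IsAssignableFrom(t)
                        && t.GetConstructor(Type.EmptyTypes) is not null)
            .Select(t => (IBoundaryContract)Activator.CreateInstance(t)!)
            .Select(c => new BoundaryEdge(c.SourceAgent, c.TargetAgent))
            .Distinct()
            .OrderBy(e => e.Source).ThenBy(e => e.Target)
            .ToList();
}
```

Because the edges come from compiled types, a contract that exists but was never registered anywhere still shows up on the map.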
ML.NET at the Boundary
The ranking agent uses an ML.NET model to score work items. The model is deterministic. Its inputs are not. The boundary contract between triage and ranking protects the model from stochastic drift by rejecting inputs the model was not trained to handle.
public class RankingModelBoundary : IBoundaryContract<ScoringInput>
{
    public string SourceAgent => "ranking-agent";
    public string TargetAgent => "ml-scoring-model";
    public JsonSchema Schema => ScoringInput.JsonSchema;

    public BoundaryResult<ScoringInput> Validate(ScoringInput payload)
    {
        var violations = new List<string>();

        if (!TrainedCategories.Contains(payload.Category))
            violations.Add(
                $"Category '{payload.Category}' not in training set. " +
                "Model output will be unreliable.");

        if (payload.FeatureVector.Any(float.IsNaN))
            violations.Add("Feature vector contains NaN values.");

        return new BoundaryResult<ScoringInput>(
            violations.Count == 0,
            payload,
            violations.ToArray(),
            DateTimeOffset.UtcNow,
            SourceAgent,
            TargetAgent);
    }

    // Mirrors the categories the scoring model was trained on.
    private static readonly HashSet<string> TrainedCategories =
        new() { "bug", "feature", "chore", "spike", "incident" };
}
This is Ring 1 applied to a deterministic component. The ML.NET model does not need containment in the stochastic sense. It needs input validation that accounts for the stochastic source of its inputs. The boundary contract is where that validation lives.
—
Stories from Production
Five Agents, Four Boundaries, Zero Rings (Framework Vision)
A .NET platform runs five agents built with Microsoft Agent Framework. Triage, ranking, summarization, extraction, and a daily control task orchestrator. Each agent was built with scoped tools, clear prompts, and individual test coverage. The team follows agentic engineering practices. Every agent is well-governed internally.
After three months, the team notices the daily control task occasionally surfaces work items with nonsensical priority scores. The ranking model scored a work item at 0.97 priority, but the item was a routine documentation update. Investigation reveals that the triage agent classified it as an “incident” with 0.52 confidence. The classification was wrong but above the implicit threshold of “the model returned something.” The ranking model scored it as a high-priority incident because that is what the classification said.
The fix takes ten minutes. A boundary contract between triage and ranking that rejects classifications below 0.6 confidence and routes them to human review. The investigation to find the root cause took three days because nothing in the system flagged the boundary as the failure point. Every agent completed successfully. The logs showed normal operations. The failure was invisible because it lived between agents, not inside them.
The team adds boundary contracts to all four handoffs in one sprint. The extract-to-triage boundary immediately reveals that 14% of extracted work items cannot be classified. That failure rate was invisible before. Those items had been silently flowing through the system producing low-confidence classifications that the ranking model consumed without question. (Framework Vision)
The Boundary That Caught a Model Drift (Framework Vision)
Six months after deploying boundary contracts, the triage-to-ranking boundary failure rate increases from 1.4% to 8.2% over two weeks. The dashboard surfaces the trend. The violations are all the same: “Unknown category ‘enhancement.'”
The triage agent’s upstream model was updated. The new version learned a category the scoring model was never trained on. Without the boundary contract, every “enhancement” classification would flow to the ranking model and receive a meaningless score. With the contract, every “enhancement” routes to human review and the failure rate spike is visible on the dashboard the day it starts.
The fix is straightforward. Retrain the ML.NET scoring model to include the “enhancement” category, then update the boundary contract’s known categories list. The boundary caught a model drift that would have silently degraded output quality for weeks. (Framework Vision)
When the Boundary Itself Is Wrong
Boundaries will have bugs. A contract that rejects a valid category or sets a confidence threshold too high will route good work items to human review. At low volume that is a nuisance. At hundreds of crossings per hour it is a bottleneck that looks like a system failure.
Recovery speed matters more than prevention here. The crossing log records every rejection with the violation reason and payload hash. When you discover a contract was wrong, you query the log for every item that hit that specific violation, fix the contract, and replay them through the updated gate. The human review queue still holds the items because they were routed, not dropped.
The pattern this post describes does not include an automated replay mechanism. That is deliberate. Replay is a recovery operation that should be explicit, auditable, and triggered by a human who understands what the contract change means. But the log makes it possible. Without the log, a bad boundary contract means lost work. With the log, it means delayed work. Time to resolve is the metric that separates a governance system from a governance obstacle.
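The human-triggered recovery step can still lean on code. The sketch below is not an automated replay mechanism; it is a helper a person runs after fixing a contract, partitioning the review queue into items that now pass the updated validator and items that stay queued. The names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class BoundaryReplay
{
    // Re-runs queued rejections through an updated validator. Items that now
    // pass are candidates for explicit replay; the rest remain in human review.
    public static (List<T> Replay, List<T> StillBlocked) Partition<T>(
        IEnumerable<T> reviewQueue,
        Func<T, bool> updatedValidator)
    {
        var replay = new List<T>();
        var blocked = new List<T>();

        foreach (var item in reviewQueue)
            (updatedValidator(item) ? replay : blocked).Add(item);

        return (replay, blocked);
    }
}
```

The actual replay of the first list through the gate stays a deliberate, logged, human-initiated action, consistent with the point above.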
—
The pattern is old. Contract, validate, route, log. Enterprise middleware has done this for decades. What is new is the threat model. Deterministic systems needed schema validation. Stochastic systems need domain validation and confidence gating because the agent can produce structurally perfect output that is semantically wrong. The plumbing is familiar. The reason you need it is not.
Agent sprawl governance in .NET is not a framework feature. It is the same boundary pattern, extended for stochastic handoffs. The code is C#. The storage is PostgreSQL. The visibility is Vue.js. The principle is the same one from the main series: the unit of governance is the boundary, not the agent.
Let’s talk about it.
Agent Sprawl Is the New Shadow IT.
A friend sent me a post about AI sprawl across enterprise tooling. The argument was that organizations are paying for the same value many times over. Across an organization, some people summarize emails in Outlook, the sales team summarizes the same email in the CRM, and the PMO summarizes it again in the project management tool. Three subscriptions, three vendors, three different models delivering the same value to the same organization on the same input. The potential for waste is real.
I want to be straight with you. I think the sprawl is a maturity problem, not a design problem. Everyone is trying to find leverage with AI right now. Teams are experimenting. Departments are buying tools that solve immediate pain. Nobody coordinated because nobody knew what would work six months ago. That is not negligence. That is what early adoption looks like in every technology wave. As things settle and consolidate, the duplicate subscriptions will compress. The market is already moving that way.
But here is the part that will not consolidate on its own. Even after the vendor landscape settles, even after the organization standardizes on fewer tools, the duplicate capability problem persists at the agent level. Three different workflows that each classify a work item using three different prompts with three different confidence thresholds. Five agents that each summarize context in slightly different ways because five operators made five independent decisions about what “summarize” means. The tools consolidate. The capabilities inside them do not, because nobody governed the boundary where one agent’s output becomes another agent’s input.
Gartner predicts 40% of enterprise applications will feature task-specific AI agents by the end of 2026. For the average organization, that translates to 50 or more specialized agents. Customer service agents. Code generation agents. Data pipeline agents. Document processing agents. Scheduling agents. Each one deployed by a different team, with different tooling, different containment posture, and different governance assumptions. They Can Watch. They Cannot Stop. showed what happens when organizations skip the containment rings for a single class of agent. Now multiply the problem.
69% of organizations suspect their employees already use prohibited AI tools. The agents are not waiting for an enterprise rollout. They are arriving through the same channel that every previous wave of unauthorized technology used: individual teams solving immediate problems without waiting for centralized approval.
History Repeats
The enterprise technology industry has seen this before. In 2018, Robotic Process Automation promised to automate repetitive tasks without changing underlying systems. Adoption was fast. Individual departments built bots to handle invoice processing, data entry, report generation. The bots worked. The ROI was immediate and visible. Within two years, large organizations had hundreds of RPA bots running across dozens of departments with no central inventory, no shared governance, and no unified monitoring.
Unframe AI drew the comparison directly: decentralized adoption, quick wins, proliferation, fragmentation, expensive consolidation. The RPA consolidation crisis cost organizations millions and took years. Bots broke when underlying systems changed. Nobody knew which bots existed, what they accessed, or who was responsible for maintaining them. The technical debt was invisible until it was not, and by then the cleanup was more expensive than the original implementation.
Agent sprawl follows the same trajectory but compresses the timeline. RPA bots were deterministic. They did exactly what they were scripted to do, which made them fragile but predictable. AI agents are stochastic. They interpret instructions, make decisions, and adapt to context. A misbehaving RPA bot runs the wrong script. A misbehaving AI agent improvises. The blast radius per agent is larger, the number of agents is growing faster, and the governance infrastructure is thinner.
Organizational Cognitive Debt
Your Code Works. Nobody Knows Why. described cognitive debt as the gap between a system’s structure and a team’s understanding of that structure. Agent sprawl creates cognitive debt at the organizational level. When fifty agents operate across an enterprise, the question is not whether any individual agent is governed. The question is whether anyone can describe the complete system of agents, their interactions, their data flows, and their combined effect on the business.
Most organizations cannot. The agents were deployed independently. The customer service team chose one vendor. The engineering team built their own. The finance team embedded agents into existing SaaS tools. Each team can describe their own agents. Nobody can describe the whole. And nobody knows what happens when these agents interact. When the customer service agent updates a record, and the data pipeline agent processes that record, and the reporting agent summarizes the result, the combined behavior is an emergent property of three independent systems that were never designed to work together.
This is shadow IT at the capability level. Traditional shadow IT was about unauthorized applications. Agent sprawl is about unauthorized capabilities. An employee does not install a new application. They enable an AI feature inside an application the organization already approved. The application is sanctioned. The agent capability within it is not. The IT asset inventory shows zero unauthorized tools. The actual environment contains agents that nobody is tracking.
The Unit of Governance Is Not the Agent
CIO magazine identified three pillars for taming agent sprawl: orchestration, governance, and observability. These are correct as categories. The implementation question is where those pillars sit. If orchestration, governance, and observability are built per-agent, the organization has a governed collection of individual agents. If they are built per-boundary, the organization has a governed system.
The distinction matters because agent-level governance does not compose. Ten individually governed agents are not a governed system. They are ten systems that happen to share an organization. The interactions between agents, the data that flows from one to another, the cumulative effect of their combined actions on business processes, none of this is captured by governing each agent in isolation.
How Agents Stay in Bounds defined containment at the boundary, not the agent. Ring 1 scopes the inputs an agent receives. Ring 2 isolates the environment it runs in. Ring 3 validates its outputs. Ring 4 gates its promotion. These rings apply at every boundary in a multi-agent system, not just the boundary around each individual agent. The handoff from one agent to another is a boundary. The data flow between agent-enabled applications is a boundary. The integration point where an agent’s output becomes another agent’s input is a boundary.

Governing the boundaries means that even when a new agent appears in the ecosystem, its interactions with existing agents are already constrained. The new agent’s output passes through a boundary ring before it becomes input for another agent. The organization does not need to re-govern the entire system every time someone deploys a new agent. The boundaries hold.
The Inventory Problem
Before you can govern agents at the boundary, you need to know the boundaries exist. This is the discovery problem, and it is harder than it sounds. A 2026 Deloitte analysis of the multi-agent market estimated the agentic AI orchestration market will reach $35 billion by 2030. That money is not going to centralized platforms. It is being distributed across vendors, internal tools, and embedded capabilities in existing software.
The first step is an inventory. Not an inventory of agents, because agents are embedded in applications and invisible to traditional asset management. An inventory of capabilities. Which applications in the environment have AI agent features enabled? Which of those features can take autonomous action? Which can access data from other systems? Which can modify records, send communications, or trigger workflows?
This is the same audit structure from They Can Watch. They Cannot Stop., extended from individual agents to the organizational ecosystem. The four questions are the same. Can you define and enforce what each agent receives as input? Can you isolate each agent’s execution environment? Can you validate each agent’s output against measurable criteria? Can you prevent each agent from promoting its actions without approval? Apply those questions at the boundary between agents and you have a multi-agent governance audit.
- Enumerate every application in the environment with AI agent capabilities, including embedded copilots and SaaS features.
- For each, identify whether the agent can take autonomous action, access data beyond its primary function, or trigger downstream processes.
- Map the data flows between agent-enabled applications. Every flow is a boundary.
- Apply the four-ring audit to each boundary. Can you scope the input at the handoff? Can you isolate the execution? Can you validate the output? Can you gate the promotion?
- Score each boundary as governed, partial, or ungoverned. The ungoverned boundaries are your risk surface.
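The scoring step in that last bullet can be mechanical. A minimal sketch, where the enum names and the rule (all four rings present means governed, none means ungoverned, anything else partial) are my reading of the checklist rather than a standard:

```csharp
using System;
using System.Linq;

public enum BoundaryStatus { Governed, Partial, Ungoverned }

// One boundary's answers to the four-ring audit questions.
public record RingAudit(
    bool InputScoped,
    bool ExecutionIsolated,
    bool OutputValidated,
    bool PromotionGated)
{
    // All four rings: governed. No rings: ungoverned. Anything else: partial.
    public BoundaryStatus Score()
    {
        int rings = new[] { InputScoped, ExecutionIsolated, OutputValidated, PromotionGated }
            .Count(r => r);

        return rings switch
        {
            4 => BoundaryStatus.Governed,
            0 => BoundaryStatus.Ungoverned,
            _ => BoundaryStatus.Partial
        };
    }
}
```

Run it over every boundary in the inventory and the ungoverned list falls out directly, which is the risk surface the audit is after.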
Organizations that run this audit typically discover more boundaries than they expected. The agent count may be manageable. The boundary count is where the governance gap hides.
RPA’s Lesson
Unframe AI’s comparison to RPA includes an observation that applies directly: “Agents aren’t the unit of value. Outcomes are.” The organizations that survived the RPA consolidation crisis were the ones that shifted from managing individual bots to managing business outcomes that bots contributed to. They built centralized orchestration, unified governance, and shared monitoring. They treated the collection of bots as a system rather than a portfolio of independent tools.
The agent version of this lesson is the same. The organization that governs fifty agents as fifty individual tools will face the same consolidation crisis RPA created. The organization that governs fifty agents as a system with defined boundaries, scoped handoffs, and unified observation will not. The difference is not the number of agents. It is whether the governance model scales with the agent count or collapses under it.
Agentic Engineering Is a Practice. AgenticOps Is the Infrastructure. made this argument for individual developers: practice degrades under pressure, infrastructure holds. The argument is identical at organizational scale. An organization that relies on each team to govern their own agents will see governance degrade as deployment velocity increases. An organization that builds governance into the boundaries between agents will see governance hold regardless of how many agents individual teams deploy.
The agents are already here. The sprawl has already started. The RPA playbook says the consolidation crisis arrives in 18 to 24 months. The question is whether organizations build the boundaries now or pay for the cleanup later.
Let’s talk about it.