Tagged: AI

Autonomy Without Infrastructure Is Just a Demo

The AgenticOps series defines six layers, four containment rings, and a maturity model. All of it was framework vision. The AgenticOps Applied series tells the stories of how that vision is realized through experiments and production case studies. This post is a case study that tests the framework against a production system built without it.

What Stripe Published

Stripe released two blog posts in early 2026 describing their internal coding agents, called Minions (Part 1 and Part 2). The numbers are striking. Over 1,300 merged pull requests per week. Every PR is human-reviewed. None contains human-written code.

Stripe didn’t build Minions from a governance framework. They built them from engineering first principles to solve a production problem: autonomous coding agents at scale inside a system that processes payments.

The architecture they arrived at is worth examining. Not because it validates AgenticOps by name, but because independent convergence on the same structural patterns is stronger evidence than any single implementation built from the framework itself.

What They Built

Five components define the Minions architecture.

Devboxes. Every agent run executes in a disposable AWS EC2 instance. These environments arrive pre-warmed with the full codebase, built dependencies, and running services in about ten seconds. No internet access. No production connectivity. Destroyed after each run. Stripe already used devboxes for human engineers. The same infrastructure worked for agents.

Blueprints. Minion runs are not pure agent loops. They are hybrid state machines that interleave deterministic nodes with stochastic agent nodes. Deterministic steps handle linting, pushing branches, and triggering CI. Agent steps handle implementation and failure resolution. The agent gets freedom where reasoning helps. The system enforces what must always happen.
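
A blueprint of this shape can be sketched as a pipeline of typed nodes, some deterministic and some stochastic. This is a hypothetical illustration, not Stripe's actual API; `Node`, `run_blueprint`, and the lambda steps are all invented for the sketch.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of a blueprint: deterministic nodes always run the
# same logic; stochastic nodes delegate to an agent. Names are invented.

@dataclass
class Node:
    name: str
    run: Callable[[dict], dict]   # takes run state, returns updated state
    deterministic: bool           # True: lint/push/CI; False: agent reasoning

def run_blueprint(nodes: list[Node], state: dict) -> dict:
    for node in nodes:
        state = node.run(state)
        if state.get("halt"):     # any node can stop the run early
            break
    return state

# Stochastic node: an agent produces a diff (stubbed with a lambda here).
implement = Node("implement", lambda s: {**s, "diff": "..."}, deterministic=False)
# Deterministic node: linting always happens, no model involved.
lint = Node("lint", lambda s: {**s, "lint_ok": True}, deterministic=True)

final = run_blueprint([implement, lint], {"task": "fix flaky test"})
```

The point of the shape is the guarantee: the agent can vary what the `implement` node produces, but it cannot skip the `lint` node.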

Toolshed. An internal MCP server with nearly 500 tools for internal systems and SaaS platforms. Agents receive curated subsets, not the full set. Security controls prevent destructive actions. Before a run begins, the system fetches context from tickets and documentation so agents start informed rather than searching blind.

Rule files. Static guidance scoped to directories. As the agent traverses the codebase, relevant rules load automatically. Stripe standardized on Cursor’s format and syncs rules to support Claude Code as well. Global rules would flood the context window; scoped rules provide signal where the agent is actually working.
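
Directory-scoped loading can be approximated with a walk from the working file up to the repo root. This is a guess at the mechanism, not Stripe's implementation; the filename `rules.md` is invented for the sketch.

```python
from pathlib import Path

# As the agent works in a file, load only the rule files on the path from
# the repo root down to that file. "rules.md" is an invented filename.

def scoped_rules(repo_root: Path, working_file: Path) -> list[Path]:
    rules = []
    current = working_file.parent
    while True:
        candidate = current / "rules.md"
        if candidate.exists():
            rules.append(candidate)
        if current == repo_root or current == current.parent:
            break  # reached the repo root (or filesystem root)
        current = current.parent
    return list(reversed(rules))  # root-most rules first
```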

Verification pipeline. Local lint runs in under five seconds after generation. Only after that passes does the system target CI against a suite of over three million tests (WTF). If CI fails, the agent gets one retry. Not infinite retries. One. Then the PR goes to a human. Stripe caps iterations because compute, tokens, and time cost money.
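
The cap is easy to express in code. Below is a minimal sketch of the layered gate with a hard retry limit; `run_agent`, `run_lint`, and `run_ci` are hypothetical stand-ins, not Stripe's pipeline.

```python
# Layered verification with a capped retry: fast lint gate first, slow CI
# gate second, and at most one regeneration before escalating to a human.

def verify_with_cap(run_agent, run_lint, run_ci, max_retries=1):
    attempt = 0
    while True:
        diff = run_agent()                        # stochastic: regenerate on retry
        passed = run_lint(diff) and run_ci(diff)  # CI only runs if lint passes
        if passed:
            return "send-to-human-review"         # Ring 4: a human still gates the merge
        if attempt >= max_retries:
            return "escalate-to-human"            # cap reached: stop burning compute
        attempt += 1
```

The economics live in `max_retries=1`: a second failure routes the task to a person instead of letting the agent loop.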

Alignment to the Containment Rings

Post 4 of the main series introduced four rings. Here is where Stripe’s architecture maps.

| Ring | What It Requires | What Stripe Built |
| --- | --- | --- |
| 1: Constrain Inputs | Curated tool access, scoped context | Toolshed (curated MCP subsets), directory-scoped rule files, pre-hydrated context |
| 2: Constrain Environment | Isolated, disposable execution | Devboxes (pre-warmed EC2, no internet, destroyed after use) |
| 3: Validate Outputs | Layered verification | Local lint (seconds) + selective CI (minutes) + capped retry (one attempt) |
| 4: Gate Promotion | Human review as structural gate | Every PR goes to a human reviewer; agents never self-merge |

All four rings are present.

Ring 2 is the strongest. Devboxes provide binary isolation. The agent either cannot reach production, or the ring does not exist. There is no partial isolation. Stripe chose infrastructure over policy.

Ring 1 is more sophisticated than most implementations. Toolshed is not just tool access. It is curated, scoped, and security-controlled tool access. The distinction matters. Giving an agent 500 tools is not Ring 1. Giving it the 12 tools relevant to its task is.

Ring 3 includes a design decision that reveals operational maturity. Capping retries at one is an economic constraint, not a technical one. Infinite retries would burn tokens and compute chasing diminishing returns. The cap forces failed tasks back to humans rather than letting agents loop.

Ring 4 is non-negotiable at Stripe. Agent-generated code never merges itself. This is the same principle from the main series: governance sits outside the agent loop, not inside it.

Alignment to the Six Layers

The six layers tell a different story. Stripe covers some well and skips others entirely.

| Layer | Stripe Coverage | Evidence |
| --- | --- | --- |
| Intent | Partial | Tasks arrive from Slack, CLI, web UIs. No formal contract space, invariants, or state machines. |
| Agent Generation | Strong | Blueprints, devboxes, Toolshed. Agents generate inside explicit boundaries. |
| Evaluation | Strong | Lint + CI + capped iteration. Layered and cost-aware. |
| Promotion | Strong | Human PR review. No self-promotion. |
| Runtime Governance | Not described | Blog posts focus on agent infrastructure, not post-deployment observability of generated code. |
| Knowledge Compression | Not described | Minions produce PRs. No mention of compressed artifacts, invariant updates, or system documentation as output. |

The middle layers (Agent Generation through Promotion) are well-built. The layers at the edges, Intent and Knowledge Compression, are absent or informal, and Runtime Governance is not described at all.

This is not a criticism. Stripe solved the problem they had. But the gap is structurally interesting. Maybe intent stays informal because tasks are small and well-scoped. Maybe knowledge compression is absent because Stripe’s existing engineering culture handles documentation through other channels.

The AgenticOps model predicts that these layers become necessary at higher maturity levels. Stripe may not need them yet. Or they may have them and the blog posts simply didn’t cover them.

Maturity Assessment

Post 3 of the main series defined six maturity levels. Here is where Stripe sits.

Level 0, manual coding. Humans write and review everything. Stripe is past this.

Level 1, AI-assisted coding. AI generates, humans review line by line. Stripe is past this. Minions are not copilots. They are autonomous agents that produce complete pull requests.

Level 2, contract-first generation. Humans define contracts. AI implements against them. Tests gate promotion. Stripe partially meets this. Tests gate promotion, and rule files define constraints. But there is no formal contract space in the AgenticOps sense. No versioned invariants, no state machine definitions, no explicit risk tolerance declarations. The contracts are implicit in the test suite and rule files rather than formalized as a separate layer.

Level 3, governed agent loops. Slice queues, evaluation services, approval gates, containment enforced structurally. This is where Stripe lives. Blueprints are governed loops. Devboxes are structural containment. Human review is an approval gate. The governance is built into the system, not a process someone follows.

Level 4, observational governance. Runtime telemetry feeds back into planning and constraint refinement. Stripe tracks metrics on Minion performance, success rates, and merge rates. They iterate on blueprints and rules based on results. But the blog posts do not describe an automated feedback loop from runtime telemetry to constraint refinement. There are indicators of L4 thinking without the closed loop.

Level 5, adaptive governance. The system proposes constraint improvements within defined boundaries. Not described.

Stripe is solid Level 3 with early Level 4 signals. I bet that places them ahead of most organizations. Post 3 noted that most teams are between Level 1 and Level 2. Stripe jumped past the painful middle by investing in infrastructure rather than trying to scale human review.

What’s Not There

Three things the AgenticOps model calls for that Stripe’s published architecture does not describe.

Formalized intent. Tasks arrive as natural language requests through Slack or internal tools. There is no versioned contract space, no invariant classification, no explicit risk tolerance. In the next post I argue that intent rots without versioning. Stripe’s tasks are small enough that intent rot may not be a factor. At 1,300 PRs per week, the blast radius of any single task is small by design.

Knowledge compression. Minions produce code changes. The blog posts do not describe any system for producing compressed artifacts, updated documentation, invariant lists, or system summaries as a byproduct of agent work. In a future post I will argue that compression without tiers is spam. Stripe may have solved this through other channels, or they may not need it at the task granularity Minions operate at.

The feedback loop. I have argued that the six-layer diagram should be a cycle, not a waterfall. Knowledge compression feeds back into intent refinement. Stripe’s system appears linear: task in, PR out. The blog posts do not describe runtime signals feeding back into blueprint design or rule file updates, though Stripe almost certainly does this manually through engineering iteration.

None of these are failures. They are observations about where the model extends beyond what Stripe published. The interesting question is whether these gaps constrain Stripe’s ability to reach Level 4 and Level 5, or whether their task granularity makes the gaps irrelevant. Maybe they are past 4 and 5 and found gear 6.

What Convergence Means

Stripe did not read the AgenticOps posts. They did not reference containment rings. They solved an engineering problem and arrived at a structurally similar architecture.

The mapping nomenclature is mine, not theirs.

When independent teams approach the same class of problem from different starting points and still land on the same structural solutions, it usually means the problem space itself is constraining the design. The architecture isn’t ideology. It’s physics.

In this case the physics is stochastic software generation.

This is the first post in the Applied series, and it shows the framework applied rather than the framework as vision. The underlying principles are real, published, and operating at scale. The alignment to the containment model is analytical, not claimed by Stripe.

The containment rings hold. The maturity model places Stripe where the evidence suggests. The layers that Stripe skips are the ones the model predicts become necessary later.

Will it hold? Is it wrong?

Let’s talk about it.

Next: [Intent Drifts. Then Everything Drifts.]

I Was a 1x Coder at Best. AI Made Me a 0x Coder.

Over four posts I built an argument. Total understanding is a myth. Cheap generation without governance creates invisible debt. AgenticOps is the discipline layer. Containment is the mechanism.

All of that was structural. This one is personal.

I Taught Myself to Code

I don’t have a computer science degree. I don’t have a software engineering degree. I have no formal training in the thing I’ve done for a living for well over two decades.

I learned from books. Then from Google. Then from StackOverflow. I learned from copying patterns I saw in codebases I didn’t fully understand. Eventually from building things that broke and figuring out why.

The learning never felt complete. It still doesn’t, and now it feels like I have so much more to learn.

I have OCD, ADD, depression, and imposter syndrome. The OCD means I fixate on problems until they resolve. The ADD means I struggle to focus long enough to resolve them efficiently. The depression and imposter syndrome make me doubt everything I do. Those forces fight each other constantly. Sometimes that tension produces good work. Sometimes it produces hours lost chasing details that didn’t matter.

On top of that I never felt like an engineer. The people I admired seemed to hold so much of the systems we worked on in their heads, reason about concurrency without breaking a sweat, debug memory and network issues by reading traces. They seemed to operate in a different register, a different dimension.

I watched conference talks and understood maybe half of what was said. I read papers and got the gist but not the math. I built mental models that were close enough to be useful but never precise enough to feel confident.

The 10x developer myth lived in my head. Not because I believed it literally, but because I measured myself against it. If they were 10x, I was 1x. Maybe. On a good day.

Yet I ended up as a top producer or leader on every team I worked on, so I had some value, even if my brain doesn’t believe it.

I Spent Years Closing a Gap That Didn’t Matter

I tried to get faster. Better tooling, better shortcuts, better frameworks. I optimized my workflow to muscle memory. Split terminals, keyboard shortcuts, IDE configurations I’d tuned over years.

I got good enough. I shipped systems that handled real traffic, real money, real consequences. Payment services processing billions of dollars per month where a bug meant many people didn’t get paid. Multi-tenant platforms where a data leak meant one company could see another company’s information.

But I never shook the feeling that the real engineers were operating at a level I’d never reach. That the gap between us was fundamental, not experiential.

So I kept grinding. More books. More side projects. More late nights trying to understand things other people seemed to just know.

The gap I was trying to close was implementation speed. How fast can I translate intent into working code? How quickly can I go from “this is what we need” to “this is what exists”?

I was optimizing for the wrong variable the entire time.

AI Made Me a 0x Coder

Then, around October and November of 2025, it felt like AI arrived. Not the theoretical AGI kind. The real kind that writes code.

I started using AI agents to build systems. Not as a helper. Not as autocomplete. As the implementation layer.

Today I write zero lines of code by hand. Zero.

AI scaffolds services. AI implements business logic. AI writes tests. AI refactors modules. AI generates migrations. I define what needs to exist, what constraints it must satisfy, what acceptance criteria must be met, and I evaluate that they are met. The agent does the rest.

I code 0x.

The skill I spent twenty years building, the ability to translate intent into syntax, is fully delegated. The keystrokes I optimized. The frameworks I memorized. The patterns I drilled into muscle memory. All bypassed.

It still feels like a loss. A waste. The thing I’d spent my career trying to master was now something a machine does better and faster. A 1x coder didn’t become 10x. I became 0x.

0x Is Not a Deficit

Here’s what I didn’t expect. Letting go of implementation didn’t reduce my output. It multiplied it.

AI doesn’t just write code faster and better than me. It writes at a scale I could never match. Full service scaffolding in minutes. Test suites covering edge cases I would have missed. Rewrites and refactors across modules that would have taken me days.

I was never going to write at 100x. But I can govern at 100x.

In the first post I said I scale containment, not understanding. I wrote that before I’d lived it. Now I have.

In the second post I argued the hard parts were never typing. In October 2025 I meant it theoretically. Today, I mean it literally. I don’t type production code. The hard parts, the constraint decisions, the system boundaries, the verification criteria, those are the only parts I do.

The six layers from the third post. Intent, agent generation, evaluation, promotion, runtime governance, knowledge compression. Those are not a framework I designed in the abstract. They are the operating system I iterate on because I had to. Because governing agent output is the only way 0x works.

The four rings from the fourth post. Constrain inputs, constrain environment, validate outputs, gate promotion. Those are not best practices on a slide. They are the walls of the house I am building to live in. Without them, 0x is reckless. With them, 0x is operational.

What a Day Looks Like for a 0x Coder

Here is the 0x workflow in practice.

  1. Define intent. What value does this slice deliver? What state transitions does it manage? What must never break?
  2. Define contracts. Input schemas, output schemas, interface definitions, invariant list.
  3. Define tests. Contract tests, integration tests, edge case scenarios. The tests exist before the implementation does.
  4. Scope the agent. Mount the contracts, tests, and bounded context into the agent’s workspace. Nothing else.
  5. Generate. The agent plans, scaffolds, implements, and refactors inside its scope.
  6. Evaluate. The evaluation pipeline runs automatically. Contract tests, static analysis, security scanning, schema validation.
  7. Review outcomes. I don’t read generated code line by line. I review whether behavior matches intent. Test results, API output diffs, invariant checks.
  8. Approve or reject. If the evidence says it works, promote. If not, refine the constraints and loop.
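
Steps 1 through 4 above produce concrete artifacts before any generation happens. A rough sketch of their shape, with invented names (`Contract`, `AgentScope`) and a made-up refunds example:

```python
from dataclasses import dataclass, field

# Hypothetical shape of the artifacts defined before generation starts.
# The agent only ever sees the contract, the tests, and the bounded context.

@dataclass
class Contract:
    slice_name: str
    input_schema: dict
    output_schema: dict
    invariants: list[str] = field(default_factory=list)  # "what must never break"

@dataclass
class AgentScope:
    contract: Contract
    test_paths: list[str]    # the tests exist before the implementation does
    mounted_dirs: list[str]  # bounded context: nothing else is visible

refund_contract = Contract(
    slice_name="refunds",
    input_schema={"refund_id": "str", "amount_cents": "int"},
    output_schema={"status": "str"},
    invariants=["refund total never exceeds captured amount"],
)
scope = AgentScope(refund_contract, ["tests/test_refunds.py"], ["services/refunds/"])
```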

That loop is my job. I don’t write code. I write constraints and review evidence. I don’t delegate my responsibility to deliver value.

There are skills besides coding that I spent twenty years building, and they still matter, but not the way I expected.

I understand systems well enough to define intent for them. I understand failure modes well enough to write meaningful constraints and acceptance criteria. I understand architecture well enough to scope agents tightly. I understand system risks well enough to judge when evidence is sufficient.

Understanding code well enough to evaluate it is different from writing it. Both are valid. Evaluation may matter more now.

The Skills That Actually Compound

I worried about the wrong things for twenty years.

I thought typing speed mattered. What compounds is system design.

I thought language mastery mattered. What compounds is constraint definition.

I thought memorizing APIs mattered. What compounds is evaluating outcomes.

I thought writing code from scratch mattered. What compounds is judging output.

I thought line-by-line review mattered. What compounds is evidence-based verification.

I thought understanding every line of code mattered. What compounds is understanding boundaries.

The first list is what I optimized for as a 1x coder. The second is what I actually use as a 0x one.

Every one of those skills in the second list was something I was already doing alongside the coding. Designing systems, defining boundaries, testing behavior, evaluating risk. I just didn’t recognize them as the primary skills because the coding felt like the real work.

It never was.

The Code Was Never the Value

This is the part I had to live to believe.

I spent twenty years measuring myself by my ability to produce code. When AI took that ability away, it felt like losing the foundation of my career. Will I miss the act of coding? Yes. But more than that, I worried that without it, I had no value to add.

But the foundation was never the code. The foundation was the ability to solve problems and deliver value. My value add was being able to understand systems, judge outcomes, define constraints, and make decisions under uncertainty. Writing a for loop was not where the value lived. The code was always an artifact of those decisions. Not the decisions themselves.

The payments service worked because I understood the state transitions, not because I typed the implementation. The multi-tenant platform was secure because I understood the isolation requirements, not because I wrote the permission layer by hand.

In the first post I said my decisions are what matter. I believe that more now than when I originally wrote it. I’ve spent time producing real systems without writing a single line of code, and the outcomes are the same or better than what I produced when I typed everything myself.

Not because I’m better or AI is better. Because the division of labor is optimized. Humans are good at intent, constraints, judgment, and risk assessment. Agents are good at implementation, coverage, consistency, and speed. Combining them beats either one alone.

Coding and Building Are the Same Thing Again

Grace Hopper spent her career trying to get away from code. Trying to move programming toward natural language. Uncle Bob Martin called our continued use of the word “code” a reflection of our failure to meet her goal.

I think we’re close to meeting it now. Not because prompts are natural language. Because the distinction between “writing code” and “building systems” is dissolving.

For decades, building software required coding. You couldn’t build without typing in some weird cryptic syntax. The skills overlapped so completely that we treated them as the same thing.

They aren’t. Building is intent, constraints, architecture, verification, judgment. Coding is translating those into syntax. When AI handles the translation, building remains.

The distinction was always there. We just couldn’t see it because the two were inseparable. Now they’re separated. And it turns out the building side is where the value was all along. I am a system builder and an AI agent operator.

Five Posts. One Thesis.

I’ve never fully understood the systems I work in. AI made that worse, but containment made it manageable.

Most software is CRUD molded into value. Cheap generation without governance creates invisible debt, but constraint discipline prevents it.

AgenticOps is the governance model. Six layers. Four rings of containment. A hard line between what agents generate and what systems execute.

The human’s role didn’t shrink. It moved. From implementation to intent. From typing to judgment. From code review to evidence review.

And this last part is the one I had to live to believe. The code was never the value. The decisions were. A 0x coder governing a 100x agent produces better outcomes than a 1x coder typing everything by hand.

I know because I’m the 0x coder. And I believe the systems I’m building now are as good or better than the systems I hand coded. What’s your experience?

Let’s talk about it.

Previous: [How Agents Stay in Bounds]

Next: [You Can Build This. Three Artifacts and a Sandbox.]

Most Software Is Just CRUD. That’s Not the Problem.

I spent my career in startups, enterprises, and small boutique consultancies. And if I’m being honest about most of the systems I’ve worked on, they were over-complicated CRUD machines.

Different domains. Different UIs. Different industries. But underneath? Create, read, update, delete. From the UI to the API to the database, we molded CRUD into something usable, something valuable.

We wrapped business rules around it. We added workflows, enforced permissions, tracked state transitions, sprinkled in some complex algorithms where needed. But the core of what most systems do? They move data around.

That doesn’t make these systems trivial. It makes them structured. State machines, permission layers, data mutation rules, integration plumbing. And structured domains are exactly the kind of thing that’s automation-friendly.

That’s why AI is both dangerous and powerful at the same time.

The Word “Code” Is 80 Years Old. So Is the Problem.

In my last post I mentioned Uncle Bob Martin’s observation about Grace Hopper and the origin of the word “code.” It’s worth sitting with for a minute.

When Hopper and her team programmed the Harvard Mark I, “code” meant the numbers they wrote on paper. Numbers representing hole positions on 24-channel paper tape. That was the program.

Hopper spent the years after that trying to get away from code entirely, trying to move programming toward natural language. She built some of the earliest compilers to do it.

Eighty years later, we still call our programs “code.” Every step up the abstraction ladder, from hole positions to assembler to Fortran to C to managed runtimes to cloud abstractions, we kept calling it code.

The people closer to the metal always complained that the higher level didn’t understand what was really happening. And they were right, at a certain level. But that was always the point.

We don’t punch cards anymore. We don’t read assembly to ship a CRUD app. We don’t manage memory for every request lifecycle.

Each time we moved up, we traded low-level visibility for leverage. The people who adapted operated at a different level entirely. The people who clung to the lower layer complained about the inadequacies of the higher one.

Now the abstraction layer is rising again. But this time, the nature of the shift is different.

This Abstraction Step Isn’t Like the Others

Every previous step up the abstraction ladder was deterministic. C compiled to assembly the same way every time. Managed runtimes handled memory according to defined algorithms.

Cloud abstractions mapped to infrastructure through predictable configurations. You could trace the path from the higher level to the lower level. The mapping was knowable.

AI-generated code doesn’t work like that. It’s stochastic. Ask it to scaffold a service and you’ll get something reasonable, something that works, but it’s sampled, not compiled. Run it again and you might get a different implementation. The output sits in a probability space, not a deterministic one.

For most CRUD scaffolding, this doesn’t matter much. The solution space is narrow enough that the probabilistic output is reliably close to what a deterministic process would produce.

Wiring up a DTO, implementing a repository pattern, generating a migration. These tasks are constrained enough that AI’s stochastic nature is practically invisible.

But when AI starts reasoning through edge cases, inferring business intent, or making architectural choices, the stochastic nature matters a lot. The danger is the mismatch: probabilistic reasoning producing artifacts that systems treat as deterministic truth.

A contract, a migration, a security boundary. Once it exists, the system executes it as fact. It doesn’t know or care that it was generated by a process that could have gone differently.

That’s the new risk that didn’t exist at any previous level of the abstraction ladder.

The Danger Isn’t That CRUD Is Simple. The Danger Is That CRUD Becomes Cheap.

This is the part I don’t see enough people talking about.

When CRUD becomes nearly free to produce, more systems get built. More features get added. More integrations get stitched together. More surface area exists than anyone can reason about.

The cost per unit of implementation drops toward zero, but the governance cost per unit doesn’t. Anything that becomes cheap gets overproduced. That’s not a software principle, it’s an economic one.

Without constraint discipline, we won’t get better systems. We’ll get more of them. Layered, duplicated, loosely governed, and fragile. The implementation volume explodes but the system intent stays murky. And now the implementation is stochastic on top of it.

That’s invisible complexity debt. And it compounds.

Humans Love Proving AI Is Wrong

I see it constantly. Humans reveling in AI getting things wrong.

“See? It misunderstood the intent.” “See? It missed an edge case.” “See? It hallucinated.”

There’s almost a celebration every time someone can prove that humans are still necessary in the SDLC. I get it. But it’s a weak position. It’s defensive. It’s arguing that our value is in catching mistakes in 300 lines of generated code.

The mistakes they’re catching are stochastic outputs that slipped through without verification. The solution isn’t to celebrate catching them. The solution is to build systems where they get caught before they matter.

Humans are becoming the bottleneck in raw code production. Not because we’re irrelevant, but because we’re slower.

An AI can produce hundreds of lines in seconds. It can scaffold services, wire up DTOs, implement repository patterns, generate migrations, create test suites. A human doing that line by line is objectively slower.

Just like punching cards was slower. Just like writing assembly was slower. Just like manually allocating memory everywhere was slower.

We abstracted those layers away. Now we’re abstracting away bulk implementation.

The Hard Parts Were Never Typing

This doesn’t make software development easier. If anything it gets harder. Because the hard parts were never typing.

Consider two examples.

A payments service needs to decide what happens when a refund is requested after a partial chargeback has already been applied. AI can generate the refund endpoint in seconds. It cannot decide whether the business eats the overlap, rejects the refund, or caps it at the remaining amount. That’s a constraint decision.
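
Once a human makes the call, the decision itself compresses into a few deterministic lines. This sketch assumes the "cap at the remaining amount" policy was chosen; the function name and signature are invented.

```python
# Hypothetical policy: a refund requested after a partial chargeback is
# capped at the remaining captured amount. The human decision is WHICH
# branch to encode; the encoding itself is trivial.

def allowed_refund(requested_cents: int, captured_cents: int,
                   chargeback_cents: int) -> int:
    remaining = captured_cents - chargeback_cents
    return max(0, min(requested_cents, remaining))
```

For example, `allowed_refund(1000, 1000, 400)` returns 600: the refund is capped at what remains after the chargeback.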

A multi-tenant system needs to determine its isolation boundary. AI can scaffold either a shared-database or database-per-tenant architecture in minutes. It cannot decide which one is right. That depends on compliance requirements, cost structure, and what the business can tolerate if a tenant’s data leaks into another tenant’s view.

AI can generate CRUD scaffolding all day long. It cannot make these kinds of decisions. And that responsibility doesn’t shrink as abstraction rises. It intensifies, especially when the abstraction layer below you is probabilistic instead of deterministic.

The Human Moves Up the Stack to the Verification Boundary

Every time we moved up the abstraction ladder, the human role shifted. We stopped writing the lower-level thing and started governing how it got produced. This time, the shift has a specific shape.

I don’t need to read every line of generated CRUD anymore. What I need to do is govern the boundary between stochastic generation and deterministic system surfaces. I need to make sure that nothing AI produces probabilistically hardens into load-bearing system behavior without verification.

That governance takes a specific form. The constraint-first loop:

  1. Define the contract. Specify inputs, outputs, invariants, and boundaries before any code is generated.
  2. Define the tests. Write verification criteria that encode what correct behavior looks like.
  3. Generate. Let AI implement against the contract and tests.
  4. Evaluate. Run the tests. Check the output against the contract.
  5. Reject or accept. If the output violates the contract, reject it. Do not patch stochastic output manually.
  6. Refine. Tighten the contract or the tests based on what failed.
  7. Loop. Repeat until the output passes verification.
[Flow diagram: Define Contract → Define Tests → Generate with AI → Evaluate Output → Pass: Accept; Fail: Refine Constraints and loop]

This loop isn’t just a workflow preference. It’s the verification layer that makes AI-assisted development safe. Without it, you’re letting dice rolls become the walls of your building.
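
The loop above can be sketched in a few lines. Everything here is hypothetical scaffolding; `generate`, `evaluate`, and `refine` stand in for whatever tooling implements each step.

```python
# Minimal sketch of the constraint-first loop: generate against the
# contract, gate deterministically, and on failure refine the constraints
# rather than patching the stochastic output by hand.

def constraint_first_loop(contract, tests, generate, evaluate, refine, max_rounds=5):
    for _ in range(max_rounds):
        artifact = generate(contract, tests)       # stochastic step
        if evaluate(artifact, contract, tests):    # deterministic gate
            return artifact                        # accept and promote
        contract, tests = refine(contract, tests)  # reject: tighten constraints
    raise RuntimeError("verification never converged; escalate to a human")
```

The design choice worth noticing is that failure never mutates the artifact, only the constraints, so every accepted artifact is one that passed verification as generated.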

The human moves from “writer” to “architect and governor.” And that’s uncomfortable for people who built their identity around keystrokes.

We Might Need More People, Not Fewer

Here’s the part people don’t expect: we may need more humans in this world, not fewer.

The reasoning is simple. If generation cost drops to near zero, the volume of systems being built explodes. Every new system still needs someone to define its constraints, verify its behavior, govern its boundaries, and decide what it should and shouldn’t do.

Those tasks don’t compress the way implementation does. A single architect can’t govern fifty AI-generated services any more than a single building inspector can sign off on fifty skyscrapers going up simultaneously.

So the roles shift. Fewer people writing boilerplate. More people designing systems, defining evaluation criteria, modeling business intent, and governing safety. The bottleneck won’t be “who can type the fastest.” It’ll be “who can think clearly about systems at the rate those systems are being produced.”

The Abstraction Layer Is Rising. Again.

Software was never about typing. It was about shaping constraints around state.

That truth has been there since Hopper’s team wrote hole positions on paper. It’s been there through every abstraction layer since. The implementation details changed. The nature of the work didn’t.

CRUD isn’t the problem. Cheap CRUD without containment is. We’re about to produce more software in five years than the previous fifty combined. The question isn’t whether we can generate it. The question is whether we can scale constraint discipline as fast as we’re scaling code production.

That’s where AgenticOps begins.

Let’s talk about it.

Previous: [I’ve Never Fully Understood the Systems I Work In. AI Is Making That Worse.]

Next: [What AgenticOps Actually Looks Like]

I’ve Never Fully Understood the Systems I Work In. AI Is Making That Worse.

I don’t know how many systems I’ve worked in without fully understanding how they work.

I’ve debugged production issues in codebases I’d never seen before. I’ve added features to systems that were built years before I showed up. I’ve built systems knowing how they were built but not why they were being built.

No documentation. No architecture diagrams. No one left on the team who could explain why that weird abstraction exists or what constraints shaped the original design. No context on the intent behind the technical debt I was inheriting. No explanation for why the system was overly complex.

I have built and maintained small systems, massive systems at scale, and in between. I rarely, if ever, had a complete understanding of any of them. I couldn’t hold them in my head. I couldn’t walk through them class by class, function by function, or explain them end to end with any real confidence.

Not because of some personal failing. Because complex systems are not memorizable. They never were.

AI Expands the Surface Area

AI can produce thousands of lines of code in a day. It can scaffold entire services, generate integrations, write tests, refactor modules. The surface area of what “exists” in a codebase is exploding. I’m going to know even less than I did before about what’s in these code repositories.

So the question I keep coming back to isn’t “how do I understand everything?” That was never realistic for me. The question is, how do I operate safely and effectively in systems I don’t fully understand, especially when AI is multiplying how much code exists?

Code is accumulating faster than any human can read it, and the abstraction layer I operate in is rising with it. The problem isn’t just bigger. It’s structurally different.

Total Understanding Was Always a Myth

I could never understand a complex system end to end. Not really. Especially once it crosses a certain threshold of complexity. What I understand are abstractions: the models, flows, boundaries, invariants.

Mechanical familiarity, reading and understanding every line, is not the same as structural comprehension. As a C# programmer you don’t read the IL the compiler emits. Not because the IL doesn’t matter, but because the compiler operates within constraints that make line-by-line review redundant. The language specification is the review.

Do we need to do a mechanical review of code generated by an agent?

A fair objection: a compiler is deterministic. Same input, same output, every time. An agent is stochastic. The same constraints can produce structurally different code on each run. But that variance isn’t new. Put three developers in separate rooms with the same requirements and you’ll get three different implementations. Different variable names, different control flow, different abstractions. The output was never deterministic. We dealt with that variance long before AI through code review, architecture, contracts, and automated checks and tests.

The best reviewers never reviewed code by expecting a specific implementation. They reviewed structurally: does it satisfy the contract, pass the tests, respect the boundaries? But plenty of code review was mechanical. Line by line, checking syntax, naming, style, catching things a linter should catch. That worked when the volume of code roughly matched the capacity to read it.

Agents break that balance. They produce more code than any human can efficiently read line by line. Mechanical code review doesn’t scale to agent-speed output. What replaces it isn’t less review. It’s a different kind of review. Instead of code review, maybe we call it peer review or agent-assisted review, with a focus on constraints, invariants, contracts, and structural correctness. The discipline that the best reviewers always practiced becomes the only viable approach.
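To make structural review concrete, here is a small sketch of contract tests in Python. The two sort implementations stand in for two agent runs that produced different code from the same constraints; the checks verify the contract, not a specific implementation.

```python
# Two structurally different implementations of the same contract.
def sort_v1(xs):             # what one agent run might produce
    return sorted(xs)

def sort_v2(xs):             # a different run: different code, same contract
    out = list(xs)
    for i in range(len(out)):
        for j in range(i + 1, len(out)):
            if out[j] < out[i]:
                out[i], out[j] = out[j], out[i]
    return out

def satisfies_contract(impl, cases):
    """Structural review: verify invariants, not a specific implementation."""
    for xs in cases:
        snapshot = list(xs)
        result = impl(xs)
        assert sorted(result) == sorted(snapshot)               # same elements
        assert all(a <= b for a, b in zip(result, result[1:]))  # ordered output
        assert xs == snapshot                                   # input not mutated
    return True

cases = [[3, 1, 2], [], [5, 5, 1]]
```

Both implementations pass the same review even though no line of code matches.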

What actually matters is: what value does a system produce? Who consumes it? What are the critical flows? Where are the boundaries? What must never break? How to increase maintainability, quality, security… value?

If I can explain how value moves through a system, I’m in control of how I move and operate in that system. If I can’t, I’m guessing. And I’ve done enough guessing with enough experience to make intuition look intentional. I hallucinated long before AI.

I Had to Stop Thinking Bottom-Up

For a long time my instinct when entering a new codebase was to start reading code. File by file, class by class. It felt productive. It wasn’t. I ended up with a pile of implementation details and no mental model to hang them on. I understood How without the Why.

The Why, the business purpose a system exists to serve, is the reason I was hired. The systems I worked in existed for a reason. Understanding the services that serve the users of the system maps to that purpose. Designing, building, and maintaining services is what I do, and the Why is the reason I do it.

Understanding moved top-down. Define the Northstar of the system and the purpose of its services. Map the user problem to the user experience through service interfaces and the contract for inputs and outputs.

Identify state transitions and data flows. Understand the dependencies. Clarify the invariants, the things that must always be true for the system and services to function.

Only then do I care about how specific classes or functions are implemented.

If I can’t sketch a system or service on a whiteboard in five minutes, I don’t understand it yet. Doesn’t matter how many files I’ve written or read. I am hired to support the Why above the code.

With AI Agents, My Role Changes

Today, I’m not the typist, the writer of code, who focuses on the How. I’m the operator of AI agents. I deliver the Why by designing and evaluating the How driven by agents.

Uncle Bob Martin made a sharp observation in his book We, Programmers. He traces how the word “code” comes from Grace Hopper’s team programming the Harvard Mark I. “Code” referred to the numbers they wrote on paper representing hole positions on 24-bit paper tape.

Hopper spent the rest of her career trying to get away from code, trying to move toward more natural languages. Eighty years later, we still call our programs “code.” Uncle Bob calls that a reflection of our failure to meet her goal.

He frames the AI question as a binary. Is AI just the next compiler, translating higher-level code to lower-level code? Or is it what Hopper envisioned, something where prompts aren’t code at all, but natural language negotiations and the realization of Hopper’s goal?

It’s a good question. But I wonder if compiler-vs-negotiation is the real axis.

I view it more as deterministic vs stochastic.

When AI scaffolds a CRUD service from a schema, the output is predictable and verifiable. The task has a narrow solution space with deterministic input, clear schema, clear constraints, predictable output. You can inspect it and trust it roughly the way you’d trust a compiler.

When AI reasons through edge cases, infers business intent, or makes architectural judgment calls, that’s probabilistic. Ambiguous intent, competing constraints, tradeoffs that require iterative refinement. The output isn’t necessarily wrong, but it’s sampled, a next-token guess. Run it again and you might get a different answer.

And the confidence surface is invisible. There’s no compiler warning when AI makes a plausible-but-wrong architectural choice. Hallucinations don’t have an error code.

The mistake is using stochastic reasoning to produce deterministic system surfaces without verification.

A contract. An interface. A migration. A security boundary. These become load-bearing the moment they exist. The system doesn’t know or care that the thing defining its behavior was probabilistically generated. It executes it as truth.

This is the gap. AI can generate implementation, the How. What it can’t generate are constraints, architectural boundaries, risk surfaces, and operational discipline. Yes it can write words that appear to be constraints, but I can’t delegate my responsibility for them, even if I let AI do most of the writing.

What I allow to go into production is on me, no one else. It’s on me to make sure that anything AI generates probabilistically gets verified before it hardens into something a production system treats as fact.

If AI writes 10,000 lines of code and I haven’t defined the contracts, the interfaces, the performance expectations, the security constraints, the observability requirements, and the test surfaces, then I’ve let dice rolls become load-bearing walls in the system.

AI doesn’t remove architectural responsibility. It amplifies it.

I Don’t Scale Understanding. I Scale Containment.

I’m not trying to know everything. I gave up on that a long time ago. What I’m trying to do is design systems where I don’t have to know everything.

That means clear interfaces. Explicit schemas. Strict typing. Unit tests, contract tests, integration tests, security and performance tests. Tracing and metrics. Logs that actually tell me something useful when things go sideways.

If something breaks, I don’t rely on memory. I rely on instrumentation. Understanding becomes observational, not memorized.

I can’t hold 200,000 lines of code in my head. But I’ll hold onto a one-page system summary, a lifecycle map, a state machine diagram, a list of invariants, a list of “what must never happen,” and a dependency diagram.

Those are the compression artifacts I’ll actually carry around. Not the implementation. The constraints that govern it.
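Those artifacts can also be executable. A sketch, using hypothetical invariants for an imagined order system, of turning a “what must never happen” list into checks that run against live state:

```python
# Hypothetical invariants for an imagined order/payment system.
# Each entry maps a human-readable rule to an executable check on state.
INVARIANTS = {
    "refund never exceeds the original charge":
        lambda s: s["refunded"] <= s["charged"],
    "a shipped order is always paid":
        lambda s: not (s["shipped"] and not s["paid"]),
}

def violated_invariants(state):
    """Return the names of every invariant the state breaks."""
    return [name for name, check in INVARIANTS.items() if not check(state)]

ok  = {"charged": 100, "refunded": 40,  "shipped": True, "paid": True}
bad = {"charged": 100, "refunded": 150, "shipped": True, "paid": False}

violated_invariants(ok)   # -> []
violated_invariants(bad)  # -> both invariant names
```

The list of rules is the compression artifact; the checks make it enforceable in monitoring or tests rather than memorized.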

And with AI generating code faster than I can read it, constraint-first is the only sane approach. Define the contract. Define the tests. Define the boundaries. Then let AI implement. Evaluate the result. Accept, reject, or refine. Loop until convergence.

That loop is the verification layer. It catches stochastic output before it becomes deterministic system behavior. Without it, AI-generated systems turn into unbounded complexity farms.

When I design and build, I optimize for maintainability and low mean time to value. Containment is how I get there.

The Real Shift

I believe the industry is moving from “I understand every line of code” to “I understand the boundaries, constraints, and risk surfaces.” Some might call that a loss of craftsmanship. I think it’s evolution.

The skill isn’t omniscience. It’s navigational confidence. Can I enter a foreign system, form a hypothesis, test it safely, reduce the blast radius, and improve it incrementally? If yes, I’m fine. AI doesn’t change that. It just increases the speed at which I can do it.

I don’t think software development has ever been about typing in a coding language. To me it’s about shaping constraints around state.

The job is moving up a layer of abstraction: operating teams of AI agents and governing the boundary between what AI generates probabilistically and what systems execute deterministically. The code, whether I wrote it or AI did, is an artifact of my decisions. And my decisions are what matter.

That’s the shift I’m building around. And I call it AgenticOps.

Let’s talk about it.

Next: [Most Software Is Just CRUD. That’s Not the Problem.]

Codify How You Work

You don’t build an agent by thinking about agents. You build an agent by thinking about how you do work.

Your ability to multiply your output begins with a simple discipline: take the skills locked in your head and turn them into structured, repeatable workflows. This is the starting point for all operational leverage. This is the kernel the entire system will be built and improved on.

The Way

When you codify how you work, you give yourself a system that can multiply your output and scale across projects, teams, and tools. You create clarity about how decisions get made, how work begins, how it moves, and how it completes. This is the foundation for any AgenticOps system you may build later.

But at this stage, the focus is only on you and the real place you get work done.

Establishing structure creates surface area for improvement. Improvement reduces waste. Waste reduction compounds over time.

This is the quiet logic behind AgenticOps: you externalize your way of working and let the system run your way as is. Then you observe the friction and reduce waste where it naturally accumulates. You are not inventing efficiency. You are uncovering it and optimizing it away.

The Problem

Most people never write down how they work. They assume it is too complex, too obvious, or too personal to articulate. Some worry that codification leads to replacement. Some worry that it is tedious or unnecessary.

These fears result in the same outcome. The process remains invisible, so it cannot be measured, analyzed, improved, shared, or extended. It’s hard to multiply if you can’t see what to multiply.

Then there are people who write down how they want to work instead of how they work today. Premature optimization is a trap. Clarity first. Compression later.

Solution Overview

The DecoupledLogic way is to treat your workflow and the workflow data as the most valuable operational asset you have. Before any optimization or automation is possible, we capture the real way you move through work. Not the theoretical model. Not the cleaned-up version. Not the one you wish you followed. The one you actually practice when no one is watching.

How you orient yourself. How you define what matters. How you locate the boundaries. How you identify the first irreversible decision. How you choose what not to do. How you set priority and direction before you set pace.

This is the material the system will learn from. This is the kernel it will grow from.

How It Works

All work follows a simple cycle: Start > Work > End. Input > Process > Output. This canonical sequence never changes. It is one of the few timeless rules in operational thinking.

Within the loop there are deeper patterns that matter.

Getting ready

This is how you select the next thing to work on. How you prioritize. How you set a target or goal. How you define the expected outcome and your stopping point. How you establish your north star. This is also where you gather context, align resources, and prepare your operational environment. Getting ready is not passive. It is an active decision about where your attention is going and why. This could be a simple ten-second thought; don’t make it overly deep.

Starting the work

This is how you signal the start of the task. How you initiate the first meaningful action. How you reduce uncertainty enough to move forward. How you commit to the direction you set in the previous step. Starting is not the same as preparing. It is the moment you choose momentum over deliberation and take the first step out of the starting blocks.

This is the first move. And that first move is the kernel the entire system will be built on.

Working in flow

This is how you break down the problem. How you evaluate options and make decisions. How you measure progress while you are inside the work. How you prevent stalls and maintain forward motion. This is where your thinking style creates the most value and where codification has the greatest impact.

Ending the work

This is how you decide something is complete. How you package, deliver, publish, or hand off. How you create closure and free cognitive space for the next cycle. Ending well is as important as starting well because it defines what counts as done.

Reviewing the work

This is how you assess quality. How you reflect on what happened. How you identify improvement targets. How you reset for the next iteration of a cycle.

Cross-cutting functions

These patterns show up at every stage. How you communicate the work. How you measure the work. How you improve the work. These are not separate steps. They shape the entire cycle from beginning to end.

You already do all of this consciously or subconsciously. Codification is simply making it visible.
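As one way to make the cycle concrete, here is a sketch of capturing a single work cycle as a structured record. The field names are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass, field

# Hypothetical schema for codifying one Start > Work > End cycle.
@dataclass
class WorkCycle:
    goal: str                   # getting ready: target and stopping point
    first_action: str           # starting: the first meaningful move
    steps: list = field(default_factory=list)         # working in flow
    done_when: str = ""         # ending: what counts as done
    review_notes: list = field(default_factory=list)  # reviewing the work

cycle = WorkCycle(
    goal="Draft the Q3 report outline",
    first_action="Open last quarter's outline as a template",
    done_when="Outline shared with the team for comment",
)
cycle.steps.append("List the three headline metrics")
cycle.review_notes.append("Spent too long picking a template; pre-pick next time")
```

Even a record this small makes the cycle visible, which is all codification requires at this stage.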

Impact

Once your workflow is explicit:

  • You gain clarity about your own method
  • You reduce waste because you can see where energy leaks
  • You create a pattern others can follow without confusion
  • You unlock automation and agents that actually reflect how you work
  • You build a system that can multiply your output and evolve with you, not around you

A system is only as strong as its kernel. An agent is only as good as the pattern it learns from. And a workflow can only be optimized once the work itself has been made visible.

Start

Do not start building agents by thinking about agents. Start by thinking about how the work is done.

Write down your beginning, your flow, your completion, and your review. Capture your real process in the real place that work gets done. Let the existing system reflect back how it works. Then reduce the waste you can now see in the reflection.

Once the process exists outside the head of the people doing the work, the path to optimization becomes straightforward. Start with how you work.

If you want, we can codify your workflow together and create your first operational blueprint to begin improving how you work in the agentic age.

Let’s talk about it.

Background Agents in Cursor: Cloud-Powered Coding at Scale

Build Faster with Cloud-First Automation

Imagine coding without ever leaving your IDE, delegating repetitive tasks to AI agents running silently in the background. That’s the vision behind Cursor’s new Background Agents, a feature that brings scalable, cloud-native AI automation directly into your development workflow.

From Local Prompts to Parallel Cloud Execution

In traditional AI pair-programming tools, you’re limited to one interaction at a time. Cursor’s Background Agents break this mold by enabling multiple agents to run concurrently in the cloud, each working on isolated tasks while you stay focused on your core logic.

Whether it’s UI bug fixes, content updates, or inserting reusable components, you can queue tasks, track status, and review results, all from inside Cursor.

Why This Matters

Problem: Manual Context Switching Slows Us Down

Every time we need to fix layout issues, update ads, or create pull requests, we context-switch between the browser, editor, GitHub, and back.

Solution: One-Click Cloud Agents

With Background Agents, we:

  • Offload UI tweaks or content changes in seconds
  • Automatically create and switch to feature branches
  • Review and merge pull requests without leaving the IDE

It’s GitHub Copilot meets DevOps, fully integrated.

How It Works

  1. Enable Background Agents under Settings → Beta in Cursor.
  2. Authenticate GitHub for seamless PR handling.
  3. Snapshot your environment, so the agent can mirror it in the cloud.
  4. Assign tasks visually using screenshots and plain language prompts.
  5. Review results in the control panel with direct PR links.

Each agent operates independently, meaning you can:

  • Fix mobile UI bugs in parallel with adding a new ad card.
  • Update dummy content while another agent links it to a live repo.

Keep tabs on multiple tasks without blocking your main flow.

Note: This is expensive at the moment because it uses Max Mode.

The Impact: Focus Where It Matters

  • 🚀 Speed: Complete multi-step changes in minutes.
  • 🧠 Context: Stay immersed in Cursor with no GitHub tab juggling.
  • 🤝 Collaboration: Review, update, and deploy changes faster as a team.

What’s Next?

The Cursor team is working on:

  • Auto-merging from Cursor (no GitHub hop)
  • Smarter task context awareness
  • Conflict resolution across overlapping branches

Is this the future of development workflows: agent-powered, cloud-native, and editor-first?

Try It Out

Enable Background Agents in Cursor and assign your first task. Start with a UI fix or content block update and see how you like it. Just remember that this service uses Max Mode and is expensive, so be careful.

If you are looking to improve your development workflow with AI, let’s talk about it.


Enterprise SaaS is Broken. AI Agents Can Fix It.

Let’s talk about enterprise software.

Everyone knows the dirty secret: it’s complex, bloated, slow to change, and ridiculously expensive to customize. It’s a million-dollar commitment with a five-year implementation plan that still leaves users with clunky UIs, missing features, and endless integration headaches.

And yet, companies line up for enterprise software as a service (SaaS) products. Why? Because the alternative, building custom systems from scratch, can be even worse.

But what if there was a third way?

I believe there is. And I believe AgenticOps and AI agents are the key to unlocking it.

The Current Limitation: AI Agents Can’t Build Enterprise Systems (Yet)

There’s a widely held belief that AI agents aren’t capable of building and maintaining enterprise software. And let’s be clear: today, that’s mostly true.

Enterprise software isn’t just code. It’s architecture, security, compliance, SLAs, user permissions, complex business rules, and messy integrations. It’s decades of decisions and interdependencies. It requires long-range memory, system-wide awareness, judgment, and stakeholder alignment.

AI agents today can generate CRUD services and unit tests. They can refactor a function or scaffold an API. But they can’t steward a system over time, not without help.

The Disruptive Model: Enterprise System with a Core + Customizable Modules

If I were to build a new enterprise system today, I wouldn’t build a monolith or sell one-off custom builds.

I’d build a base platform, a composable, API-driven foundation of core services like auth, eventing, rules, workflows, and domain modules (like claims, rating engines, billing, etc. for insurance).

Then, I’d enable intelligent customization through AI agents.

For example, a customer could start with a standard rating engine, then they could ask the system for customizations:

> “Can you add a modifier based on the customer’s loyalty history?”

An agent would take the customization request:

  • Fork the base module.
  • Inject the logic.
  • Update validation rules and documentation.
  • Write test coverage.
  • Submit a merge request into a sandbox or preview environment.

This isn’t theoretical. This is doable today with the right architecture, agent orchestration, and human-in-the-loop oversight.
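The steps above can be sketched as a plain pipeline. Every function name here is hypothetical, standing in for real agent and platform operations, so each stage stays inspectable and gateable by a human.

```python
# Hypothetical pipeline for an agent handling a customization request.
# Each stage is a plain function so a human can inspect and gate every step.

def customize(request, base_module):
    branch = fork_module(base_module)              # fork the base module
    branch = inject_logic(branch, request)         # inject the requested logic
    branch = update_docs_and_rules(branch, request)
    tests = write_tests(branch, request)           # generated test coverage
    return open_merge_request(branch, tests, target="sandbox")

# Stub implementations so the sketch runs end to end.
def fork_module(m):                return dict(m, forked=True)
def inject_logic(m, req):          return dict(m, logic=req)
def update_docs_and_rules(m, req): return dict(m, docs=f"Adds: {req}")
def write_tests(m, req):           return [f"test: {req} applied"]
def open_merge_request(m, tests, target):
    return {"module": m, "tests": tests, "target": target,
            "status": "awaiting review"}

mr = customize("loyalty-history modifier", {"name": "rating_engine"})
```

The merge request never lands without human review; the agent only prepares it.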

The Role of AI Agents in This Model

AI agents aren’t building without engineers. They’re replacing repetition. They’re doing the boilerplate, the templating, the tedious tasks that slow innovation to a crawl.

In this AgenticOps model, AI agents act as:

  • Spec interpreters (reading a change request and converting it into code)
  • Module customizers (modifying logic inside a safe boundary)
  • Test authors and validators
  • Deployment orchestrators

Meanwhile, human developers become:

  • Architects of the core platform
  • Stewards of system integrity
  • Reviewers and domain modelers
  • Trainers of the agent workforce

The AI agent doesn’t own the system. But it extends it rapidly, safely, and repeatedly.

This Isn’t Just Faster. It’s a Better Business Model.

What we’re describing is enterprise SaaS as a living organism, not a static product. It adapts, evolves, and molds to each client’s needs without breaking the core.

It means:

  • Shorter sales cycles (“Here’s the base. Let’s customize.”)
  • Lower delivery cost (AI handles the repetitive implementation work)
  • Faster time to value (custom features in days, not quarters)
  • Higher satisfaction (because the system actually does what clients need)
  • Recurring revenue from modules and updates

What It Takes to Pull This Off

To make this AgenticOps model work, we need:

  • A composable platform architecture with contracts at every boundary (OpenAPI, MCP, etc.)
  • Agents trained on domain-specific architecture patterns and rules
  • A human-in-the-loop review system with automated guardrails
  • A way to deploy, test, and validate changes per client
  • Observability, governance, and audit logs for every action an agent takes

A core build with self-serve client customizations.

AI Agents Won’t Build Enterprise Software Alone. But They’ll Change the Game for Those Who Do.

In this vision, AI Agents aren’t here to replace engineers. In reality, they may very well replace some engineers, but they could also increase the need for more engineers to manage this agent workforce. Today, AI Agents can equip engineers and make them faster, freer, and more focused on the work that actually moves the needle.

This is the future: enterprise SaaS that starts composable, stays governable, and evolves continuously to meet client needs with AI-augmented teams.

If you’re building this kind of Agentic system, or want to, let’s talk about it.

AI Agents Can Write Code, Here’s How We Win as Developers

This is a thought exercise and a biased prediction. I have no real facts except what I see happening in the news and observed through my experiences. I don’t have any proof to back up some of my predictions about the future. So, feel free to disagree. Challenge my position, especially when I try to blow up the rockets in the end.

The Game has Changed

We don’t need to write C#, Python, or Java to build software anymore. Just like we no longer need to code in assembly or binary, today’s high-level languages are now being pushed down a level. We can code by talking to an AI agent in plain English. This isn’t science fiction. AI agents are here, and they’re disrupting traditional software development. The value isn’t in writing code, it’s in delivering value and desired business outcomes.

Soon, every app can be basically copied by an agent. Features don’t matter; value does. This means the future isn’t about who can write the best code or build the best feature set. Any product developer with an agent worth its silicon will be able to write an app. For product developers, it will soon be about who can use AI agents in a way that actually delivers business value. Developers who have Agentic Ops, Design, Development, Infrastructure, and Marketing will beat those without. Those with agents and experienced agent operators who deliver value rapidly will beat the developers who still take three months to deliver an MVP.

AI is No Longer Just a Tool, It’s the New Coder

AI assistants won’t just be assisting developers, as I once thought, they will become the developers, the designers, the marketers, the project managers. The shift isn’t about writing code faster. It’s about not writing code at all and letting AI generate, deploy, and optimize entire systems. How do we manage AI agent employees? An AI HR agent? The implications are far wider than just the replacement of humans in developer roles. Markets are going to shift, industries will be disrupted regularly, the world is going to enter a new age faster than any other shift in civilization that we’ve had in the past. I may be wrong, but it looks clear to me.

What does that mean for us?

  • The focus moves from software development to AI agent development and integration.
  • Companies that figure out how to deliver value with agents effectively will dominate product development.
  • The winners will have an early advantage building a proven system, with tested agents, and experienced agent operators that customers will trust to continuously deliver desired value.

Product Features are Dead, Value Delivery is Everything

If an AI agent can copy any feature, what really matters in product development? Value delivery, that’s what. Value has always been king and queen in product development. I believe it’s more important now than ever. AI-native product developers will outperform traditional ones not only because they don’t waste time or money manually coding features, but because they focus on outcomes and on delivering the value that produces those outcomes.

Hell, I’m seeing people who can’t code build apps that used to take weeks to build. They can build an app in 30 minutes, and we are still on v1 baby agents. What happens when the agents grow up in a couple of years? In the future, time won’t matter because we can deliver apps and features in days. Costs become less of a concern because agents cost less than hiring new employees. Understanding and delivering value will be the great divider between product development teams. Those who can wield agents to understand and deliver value will do better in the market.

China and the team that built DeepSeek proved that they can beat the likes of multi-billion-dollar US-aligned companies with less than $10 million to train a frontier model. What will someone with a team of agents delivering value in days do against an old-school team of human developers delivering the same value in months?

Think about it 🤔

Businesses don’t care if the back end is in Python or Rust. They care if revenue goes up and costs go down.

Customers don’t care if their data is in PostgreSQL or SQL Server. They care if their system is performant and costs are feasible.

Users don’t care if the UI is React or Blazor. They care if the experience is seamless and solves their problems.

No one asks whether an AI agent or human wrote the code, they just want a solution that fills their needs.

A product development team’s value is not in their technology choices but in the value they can deliver and maintain.

The AI-Native Product Development Playbook

If AI replaces traditional software product development, how do we compete? We learn not to focus on coding features; we build AI-driven systems that can deliver value.

Here’s A Playbook 🚀

1. Find the pain points where AI delivers real value. Optimize workflows, automate decisions, eliminate inefficiencies, increase customer attraction, acquisition, engagement, retention, and satisfaction.
2. Use rapid prototyping to test and iterate at breakneck speed. Don’t waste weeks and months building, when we can ship, test, and refine in days.
3. Orchestrate AI agents. Until AI surpasses AGI (artificial general intelligence) and reaches superintelligence, initial success won’t come from using a single agent. It will come from coordinating multiple agents to work together efficiently.
4. Measure and optimize continuously. The job isn’t done when a system is deployed. AI needs constant tuning, monitoring, and retraining.

People Still Want Human Connection

There’s one thing AI agents can’t replace: human relationships. People will always crave trust, emotional intelligence, and real connection with other humans. Businesses that blend AI automation with authentic human experiences will win.

The Future of Software Product Development is AI-First, Human-Led

This isn’t about whether AI will replace traditional software product development or developers. That ship is sailing as we speak; it is underway. The real question is: who will successfully integrate and optimize AI in businesses? Who can help build AI-native businesses that outcompete their competitors? I hope the answer is you. The future is AI-first. Those who embrace it will lead. Those who resist will be left behind because we are the Borg: resistance is futile.

Now, my last question is, are you ready? Do you know how to transform now? Evolution is too slow. You must blow up some rockets to rapidly figure out what works and doesn’t work. But doing so is easier said than done when jobs and investments are on the line. For now, we may be OK staying stuck in our ways and relying on old thought processes. I’d say we have 5-10 years (into my retirement years) to enjoy the status quo. However, that time horizon seems to shrink every month and every day not focused on transformation is a day lost to the competition.

Need help in your transformation? Let’s talk about the rockets you want to blow up.

The Copilots Are Coming

This is an unpublished throwback from 2023. Obviously, the Copilots are here, and it’s much scarier than I thought.

In “The age of copilots” Satya Nadella, the CEO of Microsoft, outlines the company’s vision for Microsoft Copilot, positioning it as an integral tool across all user interfaces.

Microsoft Copilot
Meet your everyday AI companion for work and life.

https://www.microsoft.com/en-us/copilot

Copilot incorporates search functionality, harnessing the context of the web. This was a genius pivot of Bing Chat into a multi-platform service. There is even an enterprise version with added data protection (they are listening to the streets). And they are giving power to the people: Microsoft 365 now features Copilot, which operates across various applications. As a developer, I can easily integrate my Semantic Kernel plugins and my OpenAI GPTs and Assistants. I can build some things, my team can build more, and considering how many Copilot things the world currently needs, I’m excited. So many tasks to optimize, so many roles to make more efficient, so many jobs-to-be-done to support with automation and AI.

We believe in a future where there will be a copilot for everyone and everything you do. 

Satya Nadella, CEO of Microsoft

Nadella emphasizes the customizability of Copilot for individual business needs, highlighting its application in different roles. GitHub Copilot aids developers in coding more efficiently, while SecOps teams leverage it for rapid threat response. For sales and customer service, Copilot integrates with CRM systems and agent desktops to enhance performance.

Furthermore, Nadella speaks about the extension of Copilot through the creation of Copilot Studio, which allows for further role-specific adaptations. He notes the emerging ecosystem around Copilot, with various independent software vendors and customers developing plugins to foster productivity and insights. I hope this means a Copilot Store is coming, with some revenue share for independent software vendors like me and the company I work for.

You will, of course, need to tailor your Copilot for your very specific needs, your data, your workflows, as well as your security requirements. No two business processes, no two companies are going to be the same. 

Satya Nadella, CEO of Microsoft

Lastly, Nadella touches on future innovations in AI with mixed reality, where user interactions extend beyond language to gestures and gazes, and in AI with quantum computing, where simulations of natural phenomena can be emulated and quantum advancements can accelerate these processes. He envisions a future where such technology empowers every individual globally (actually Nadella expressed more on Microsoft’s vision of caring for the world and I appreciated it), offering personalized assistance in various aspects of life.

Nadella did a good job of expressing Microsoft’s vision on caring for our world. Microsoft will be “generating 100 percent of the energy they use in their datacenters, from zero-carbon sources by 2025.” He said that and next year is 2024. I hope they stay on track towards this goal.

Charles L. Bryant, Citizen of the World

The message concludes with a reference to a video featuring a Ukrainian developer’s experience with Copilot. This is also a lesson in the power of expressing the value of a product with story and emotion. Storyboard Copilot is coming too.

Why We Need to Bet on Agents Now

Let’s cut through the noise. Agents, these AI-driven digital workers, aren’t some sci-fi fantasy. They’re here, and they’re about to fundamentally change how you go about your day and how your business operates. Whether you’re building products, running marketing campaigns, or supporting operations or clients, understanding agents is no longer optional. It’s the key to getting and staying ahead.

Agents Are No Longer Theoretical

My prediction is that in the near future, agents will be indispensable. People won’t monitor their email. They won’t browse social media or use apps and websites as they do today. Their agents will do these tasks for them. These AI-driven workers will curate and deliver exactly what users need, without requiring them to use third-party user interfaces. We won’t have to log into Instagram or email. Our agent can stream email and content from other services through a single interface.

This will change marketing because marketers will have to learn how to attract agents to reach their human operators. Online stores will have to learn how to sell to agents. Agents make purchases on behalf of their human operators. Websites and apps won’t target humans but agents. If it can be done on a computer, agents will be able to do it. This includes phones. We need to rethink target users across our products. Our world will go through an epic paradigm shift.

Agents are still an emerging concept, and nothing is set in stone yet. However, early movers are already deploying agents. They use them to automate tasks, generate content, write code, and optimize decision-making. But here’s the kicker: most businesses don’t yet have agents tailored to their unique needs. This presents a massive opportunity. The potential applications are vast, and the market is wide open. If we get started today, we’re not just building agents; we’re writing the best practices for this transformation. By focusing on how to attract and build agents now, we’re positioning ourselves to thrive as the agent ecosystem grows.

This is our chance to step up as experts. Yes, we’re in uncharted territory, but that’s a good thing. I have made predictions here, but no one really knows what’s coming, and no one knows exactly how to apply agents across industries. By starting now, we get to shape the best practices that will define agents in our respective industries.

Why Early Adoption Matters

Being early comes with risks, but the opportunities and rewards far outweigh them. By diving in now, we can shape the future of how agents are built, delivered, and operated. Early adoption means gaining:

  • Experience: Each agent we develop is a chance to learn from both success and failure. What works, what doesn’t, and how to pivot.
  • Credibility: As agents become mainstream, businesses will seek pioneers, those who’ve already proven their expertise and early results.
  • Market Advantage: Agents are self-improving. If we start soon, we will develop smarter and more capable agents sooner, and our systems will outperform those of late adopters. Compounded learning will separate leaders from laggards. Starting now also gives us a head start in acquiring the precious data we need to feed our agents and improve their performance.

The Work Ahead

We must learn to build agents. We must also understand how to deliver and operate them as the best solution for specific use cases.

Delivering Agents

  • Planning: Understand the jobs to be done. Identify use cases, workflows, and challenges where agents can provide meaningful value.
  • Designing: Define clear objectives, user interactions, and system integration and interfaces for the agent.
  • Building: Train agents on the right data, using AI frameworks that allow flexibility and growth.
  • Testing and Iterating: Rigorously evaluate agent performance and refine based on real-world feedback.
  • Deploying: Introduce agents thoughtfully, ensuring seamless onboarding and integration with existing tools and workflows.
  • Releasing: Equip users with proper training and documentation to ensure successful adoption.

Operating Agents

  • Managing: Overseeing the agent’s functionality, ensuring it runs as expected, and addressing any operational issues.
  • Monitoring: Tracking real-time performance metrics, such as speed, accuracy, and user feedback, to ensure consistent quality.
  • Evaluating: Regularly reviewing the agent’s outcomes against its goals, identifying areas for improvement or additional training.
  • Improving: Iterating on the agent by refining its prompts, templates, tools, and algorithms; updating its RAG index with new data; fine-tuning or retraining it; and enhancing its features to adapt to evolving needs.

Roadmap

Our roadmap to be successful with agents as a product focuses on both strategic insights and actionable steps.

  1. Understand the Jobs to Be Done: Not every task needs an agent, and replacing traditional digital solutions (e.g., websites or apps) requires clear benefits.
  2. Iterate Relentlessly: The first version of any agent won’t be perfect. It may often hallucinate and get things wrong. That’s fine. What matters is how quickly we learn and adapt.
  3. Collaborate Across Teams: Product, marketing, and support teams must all contribute. Everyone’s input is critical. The more perspectives we have, the better equipped we are to design and refine agents that excel.
  4. Measure and Optimize: Agents need monitoring and fine-tuning. Metrics like accuracy, speed, and user satisfaction will guide us.

Agents Improve Over Time

Let’s tackle a key truth: the first iteration of any agent will rarely deliver perfect results. Early versions might be clunky, prone to hallucinations and errors, or lacking the nuanced judgment needed for complex tasks. But that’s not a failure. It marks the beginning of an iterative process that allows agents to learn, adapt, and improve through data and feedback.

Unlike traditional solutions, which typically rely on fixed algorithms and human-driven updates, agents can operate dynamically. They evolve in real time as they encounter new data and scenarios. This ability to self-optimize positions agents as uniquely suited for complex and evolving challenges where traditional solutions fall short.

  • Initial Challenges: In their infancy, agents might struggle with insufficient data, unclear objectives, or unexpected scenarios. These early hiccups can result in inconsistent performance or even outright errors.
  • Continuous Learning: With every iteration, agents refine their capabilities. New data helps them understand patterns better, adapt to edge cases, and make more accurate decisions. The more they’re used, the smarter they get.
  • Operator Involvement: Effective improvement requires skilled operators. We monitor agent performance. We analyze results and provide feedback and data. In doing so, we ensure agents evolve in ways that align with business goals.
  • Replacing Traditional Solutions: Over time, agents become faster. They become more accurate and better tuned to tasks. Eventually, they will outperform traditional solutions and humans. This transformation won’t happen overnight, but the incremental improvements lead to exponential results. Starting early helps us get through this journey faster than late adopters.

The goal isn’t perfection from day one. It’s about building a foundation that grows stronger and more capable with time.

A Vision for What’s Next

Agents will handle the tedious, time-consuming stuff, freeing us to focus on strategy, creativity, and big-picture thinking. Our clients see the results. Our stakeholders see the value. We get to lead the charge in one of the most exciting shifts in tech.

But this won’t happen by accident. It’s going to take the courage to move now with bold ideas and hard work. It’s going to take a willingness to fail fast and learn faster. Let’s embrace the challenge and make it happen.

Let’s get to work! If you want to talk about how to start or improve your agentic ops journey, I’m here.