
From North Star to Unit Economics

This is a new series that focuses on my thoughts and systems around value stream management. I call it ValueOps, a system I have been building since long before AgenticOps.

AI has given me a gift: it lets me go deeper into topics I have been exploring for years. Some may call this post AI slop because I allow AI to cook, but I am most certainly still the chef. The recipes are mine, the ingredients (my thoughts) are mine, and all words pass my taste and quality tests.

I hope you and your agent get something out of this and that it helps you continuously improve the systems you care about.

Bon appétit.


I have always liked North Star Metrics. They force a team to stop measuring everything and decide what one thing actually points toward value.

That is useful. It is also not enough.

I have seen teams rally around a metric that felt right, moved up, and still did not explain whether the business was getting healthier. The chart improved. The economics did not. Or worse, nobody could tell if the economics improved because the North Star lived in one system and the money lived somewhere else.

So the problem is not North Star thinking. The problem is where it usually stops.

A North Star gives direction. Unit economics gives economic truth. The gap between them is the operational event that can be measured, followed, converted, costed, retained, and tested.

That event is what I call a Value Fact.

What the North Star Got Right

Amplitude describes the North Star Metric as the metric that best captures the value customers get from a product. That is the right instinct. The metric should not be vanity activity. It should be close to the thing the customer values.

Reforge pushes the idea further by breaking North Star Metrics into unit of value, quality, and frequency. That matters because not every action counts. A signup is not the same as an activated user. A visit is not the same as a reviewer. A trial is not the same as a trial with real usage.

That is the part I want to keep.

A good North Star is not revenue. Revenue is late. It tells you what already happened. A good North Star is upstream from revenue, but not so far upstream that it becomes noise.

It is the best observable proxy for value creation before the financial result fully arrives.

Airbnb’s common example is nights booked. That works because a night booked is not just activity. It means the marketplace connected demand and supply. A guest found a place to stay. A host received a booking. The business created the conditions for revenue.

That is the shape.

Where the North Star Falls Short

Here is where I think the usual North Star conversation gets weak.

It helps the team align, but it does not always make the metric accountable.

You can pick a North Star, put it on a dashboard, and still not know what one more unit is worth. You may not know what it cost to create. You may not know whether it converts. You may not know whether the customers it produces stay long enough to justify the investment.

So the team gets direction, but the operator still lacks economics.

Now, that might sound unfair. A North Star is not supposed to be a full financial model. Fair. I agree.

But if the metric is going to guide product strategy, resource allocation, experiments, and operating decisions, then at some point it has to cross the bridge into economics.

Otherwise the organization is optimizing a belief.

Maybe it is a good belief. Maybe not.

The Bridge Is the Value Fact

ValueOps starts with a narrower claim.

Do not start with all the metrics. Do not start with revenue. Do not start with a dashboard.

Start with the countable operational fact that represents unrealized value.

For a SaaS company, that might be an activated trial workspace. For a marketplace, it might be a completed match. For a services firm, it might be a completed job that creates a follow-on opportunity. For a support product, it might be a real ticket resolved by the agent.

The name matters because it separates two ideas that usually blur together.

North Star is the strategic role.

Value Fact is the measurement object.

The North Star tells the team, this is the event we believe points toward value. The Value Fact model says, prove it. Count it. Cohort it. Attach dimensions. Track conversion. Measure the value. Measure the cost. Then see whether the belief survives contact with the business.

Here is the path: North Star, Value Fact, cohorts and dimensions, conversion, value and cost, unit economics.

That is the move from product strategy to unit economics.

Not because the North Star was wrong. Because it was unfinished.

What Has to Be True

For a North Star to become a Value Fact, it has to pass a few tests.

It has to be countable. The event either happened or it did not.

It has to be cohortable. You can follow the facts from a period forward and see what happened to them.

It has to be attributable. You know which channel, segment, workflow, team, region, or operating path produced it.

It has to be convertible. There is a later event where the fact becomes delivered value.

It has to be costable. You can measure what it took to produce and convert the fact.

It has to be economically meaningful. One more fact should imply some future value, even if that value has not arrived yet.

If the metric cannot pass those tests, it may still be useful. But I would be careful calling it the operating center of the business.
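The tests above can be sketched as a record plus one cohort query. This is a minimal illustration in Python, not the ValueOps implementation: the `ValueFact` fields, the `cohort_summary` function, and the sample numbers are all hypothetical names and values chosen to show one field per test.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ValueFact:
    """One countable operational event, e.g. an activated trial workspace."""
    fact_id: str
    occurred_on: date                    # countable: it happened, on this date
    channel: str                         # attributable: what produced it
    production_cost: float               # costable: spend to produce the fact
    converted_on: Optional[date] = None  # convertible: later delivery event
    conversion_cost: float = 0.0         # costable: spend to convert the fact
    realized_value: float = 0.0          # economically meaningful


def cohort_summary(facts: list, year: int, month: int) -> dict:
    """Follow the facts created in one month forward (cohortable)."""
    cohort = [f for f in facts
              if (f.occurred_on.year, f.occurred_on.month) == (year, month)]
    if not cohort:
        return {"facts": 0}
    converted = [f for f in cohort if f.converted_on is not None]
    total_cost = sum(f.production_cost + f.conversion_cost for f in cohort)
    total_value = sum(f.realized_value for f in converted)
    return {
        "facts": len(cohort),
        "conversion_rate": len(converted) / len(cohort),
        "cost_per_fact": total_cost / len(cohort),
        "value_per_fact": total_value / len(cohort),
    }


# Hypothetical sample data: two January facts, one February fact.
facts = [
    ValueFact("a", date(2025, 1, 5), "search", 40.0, date(2025, 2, 1), 10.0, 300.0),
    ValueFact("b", date(2025, 1, 9), "referral", 20.0),
    ValueFact("c", date(2025, 2, 2), "search", 40.0),
]
summary = cohort_summary(facts, 2025, 1)
```

If a candidate metric cannot fill in these fields from real systems, that is usually a sign it fails one of the tests.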

The Series

This series builds the measurement system from that point.

Value Impact asks whether the operation is getting better at turning facts into value.

Value Efficiency asks what it costs to produce and convert those facts.

Value Ratio puts value and cost in direct relationship.

Value Payback asks how long it takes to recover the investment.

Value Retention asks whether the value stays long enough to matter.

Value Margin asks how much of the revenue the business actually keeps.

Then the final post reduces the system to seven levers and an engine. The levers are Fact Volume, Cost Per Fact, Conversion Rate, Cost of Conversion, Revenue Rate, Retention Rate, and Cost to Serve.

That is the control surface.

The math is not the hard part. The hard part is picking the right fact and refusing to let it remain a slogan.
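To make the control surface concrete, here is one hypothetical way the seven levers could combine. The formulas are my assumptions for illustration only (a geometric lifetime from the retention rate, value measured as margin over that lifetime); the final post in the series may define the engine differently. `Levers` and `engine` are invented names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Levers:
    fact_volume: float         # Value Facts produced per period
    cost_per_fact: float       # spend to produce one fact
    conversion_rate: float     # share of facts that convert (0..1)
    cost_of_conversion: float  # spend to convert one fact
    revenue_rate: float        # revenue per converted fact per period
    retention_rate: float      # share retained period to period (must be < 1)
    cost_to_serve: float       # cost to serve one converted fact per period


def engine(lv: Levers) -> dict:
    """Assumed combination of the levers; not the series' official math."""
    converted = lv.fact_volume * lv.conversion_rate
    invested = (lv.fact_volume * lv.cost_per_fact
                + converted * lv.cost_of_conversion)
    margin_per_period = lv.revenue_rate - lv.cost_to_serve
    # Assumption: constant retention gives a geometric expected lifetime.
    expected_periods = 1.0 / (1.0 - lv.retention_rate)
    lifetime_value = converted * margin_per_period * expected_periods
    return {
        "converted": converted,
        "invested": invested,
        "value_ratio": lifetime_value / invested,
        "payback_periods": (invested / converted) / margin_per_period,
        "value_margin": margin_per_period / lv.revenue_rate,
    }


# Hypothetical inputs, chosen only to keep the arithmetic easy to follow.
result = engine(Levers(
    fact_volume=1000, cost_per_fact=10, conversion_rate=0.5,
    cost_of_conversion=40, revenue_rate=100, retention_rate=0.75,
    cost_to_serve=25,
))
```

Whatever the exact formulas turn out to be, every output traces back to a lever someone can actually move.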

The Claim

ValueOps does not replace North Star thinking.

It makes North Star thinking accountable.

The North Star gives the product direction. The Value Fact gives the measurement system something to operate on. The models turn that fact into economics.

That is the bridge I have been looking for.

From North Star to Unit Economics.

Let’s talk about it.

Every Product Needs a North Star Metric

How to Choose and Measure North Star Metrics

Next: Value Impact

40% Will Be Canceled. Not Because the Models Failed.

Post 3 of the AgenticOps series defined the six layers and four containment rings. This post maps Gartner’s projected cancellation drivers to specific gaps in that model.


The Comfortable Take Is Wrong

The take you keep seeing is that AI projects fail because the models are not ready. They hallucinate. They are unreliable. Wait for better models. Feel me? That is the played take. And it is distracting.

Gartner predicts more than 40% of agentic AI projects will be canceled or scaled back by 2027. The cited reasons are escalating costs, unclear business value, and inadequate risk controls.

None of those are model failures. GPT-5, Claude, Gemini will all be more capable in 2027 than they are today.

Real talk: the bottleneck is governance. Or more precisely, the absence of it.

73% of organizations are deploying AI tools right now. Only 7% govern them in real time.

That is a 66-point gap between deployment velocity and governance maturity. And that gap is exactly where the 40% lives.

80% of organizations report risky agent behaviors in production. Gartner projects that 15% of daily work decisions will be made by agentic AI by 2028, up from essentially zero in 2024.

The industry is scaling deployment without scaling containment. Gartner’s 40% cancellation rate is not a prediction about models. It is a prediction about what happens when you run stochastic systems without structural boundaries.

Now let me give you a specific example of what is making this worse.

Gartner estimates that of the thousands of vendors now marketing “agentic AI” capabilities, only about 130 are real. The rest are agent washing.

They are rebranding chatbots and workflow automations as agentic systems. Organizations buy those products, deploy something that does not need governance, and fail to build governance infrastructure. Then they deploy something that does need it. And discover they have nothing.


Six Failures That Compound

Accelirate analyzed agentic AI governance failures across enterprise deployments. They identified six structural problems. Every one is specific. Every one maps to a gap in the AgenticOps model.

The first failure is no centralized control plane. Teams deploy agents independently. No single system tracks which agents are running, what tools they can reach, or what decisions they make.
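A control plane does not have to start big. As a minimal sketch, assuming an in-memory store and hypothetical names (`AgentRecord`, `AgentRegistry`, the example agents), a registry that answers "what is running, what can it reach, what did it decide" could look like this:

```python
from dataclasses import dataclass, field


@dataclass
class AgentRecord:
    agent_id: str
    owner_team: str
    tools: frozenset          # tools this agent can reach
    status: str = "running"   # "running" or "stopped"
    decisions: list = field(default_factory=list)  # oldest first


class AgentRegistry:
    """Single place that knows every agent, its tools, and its decisions."""

    def __init__(self) -> None:
        self._agents = {}

    def register(self, record: AgentRecord) -> None:
        if record.agent_id in self._agents:
            raise ValueError(f"agent already registered: {record.agent_id}")
        self._agents[record.agent_id] = record

    def record_decision(self, agent_id: str, summary: str) -> None:
        # An unregistered agent raises KeyError, which surfaces shadow agents.
        self._agents[agent_id].decisions.append(summary)

    def stop(self, agent_id: str) -> None:
        self._agents[agent_id].status = "stopped"

    def running(self) -> list:
        return sorted(a.agent_id for a in self._agents.values()
                      if a.status == "running")


registry = AgentRegistry()
registry.register(AgentRecord("triage-bot", "support", frozenset({"tickets.read"})))
registry.register(AgentRecord("refund-bot", "billing", frozenset({"payments.write"})))
registry.record_decision("refund-bot", "approved refund for order 1001")
registry.stop("triage-bot")
```

A real control plane would persist this and sit in the deployment path, so that an agent that is not registered cannot run at all.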

The second failure is late governance introduction. Teams build the agent, prove the demo, get funding, start scaling, then discover they need governance. By that point, retrofitting containment into a running system is harder than canceling the project.

The third failure is missing decision traceability. When something goes wrong, no one can reconstruct why the agent chose what it chose. The decision chain is invisible. Debugging becomes archaeology.
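Traceability is mostly a logging discipline. The sketch below assumes one JSON line per decision step, with hypothetical field names (`decision_id`, `reasoning`, `state_before`, and so on) and made-up sample events; it shows how a decision chain becomes something you can query instead of excavate.

```python
import json


def trace_line(decision_id, step, reasoning, tool, tool_args,
               state_before, state_after):
    """Serialize one step of an agent decision as a JSON log line."""
    return json.dumps({
        "decision_id": decision_id,
        "step": step,
        "reasoning": reasoning,
        "tool": tool,
        "tool_args": tool_args,
        "state_before": state_before,
        "state_after": state_after,
    }, sort_keys=True)


def reconstruct(log_lines, decision_id):
    """Rebuild the ordered chain of steps behind one decision."""
    events = [json.loads(line) for line in log_lines]
    chain = [e for e in events if e["decision_id"] == decision_id]
    return sorted(chain, key=lambda e: e["step"])


# Hypothetical log, deliberately out of order and mixed with another decision.
log = [
    trace_line("d-42", 2, "invoice matched, issue refund", "payments.refund",
               {"order": 1001}, {"refunded": False}, {"refunded": True}),
    trace_line("d-7", 1, "unrelated decision", "tickets.read", {}, {}, {}),
    trace_line("d-42", 1, "customer asked for refund, look up invoice",
               "invoices.get", {"order": 1001}, {"refunded": False},
               {"refunded": False}),
]
chain = reconstruct(log, "d-42")
```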

The fourth failure is no policy-as-code enforcement. Governance lives in documents. “Agents should not access production data.” But those policies are not enforced by the runtime. They are suggestions. And suggestions do not constrain systems that scale without warning.
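Here is a minimal sketch of the difference between a policy document and a policy the runtime enforces. The policy format, the resource paths, and the names `check_access` and `guarded_read` are assumptions made for illustration; real deployments often use a dedicated policy engine instead.

```python
import json
from fnmatch import fnmatchcase

# Declarative policy, kept outside the agent's own code (hypothetical format).
POLICY = json.loads("""
{
  "agent": "report-writer",
  "deny_resources": ["prod/*"],
  "allow_resources": ["staging/*", "docs/*"]
}
""")


class PolicyViolation(Exception):
    """Raised by the runtime when an action falls outside the policy."""


def check_access(policy: dict, resource: str) -> None:
    if any(fnmatchcase(resource, p) for p in policy["deny_resources"]):
        raise PolicyViolation(f"denied by policy: {resource}")
    if not any(fnmatchcase(resource, p) for p in policy["allow_resources"]):
        # Default deny: anything not explicitly allowed is refused.
        raise PolicyViolation(f"not on the allowlist: {resource}")


def guarded_read(policy: dict, resource: str, reader):
    """The runtime wraps every read; the agent never calls reader directly."""
    check_access(policy, resource)
    return reader(resource)


content = guarded_read(POLICY, "staging/orders", lambda r: f"contents of {r}")
```

The sentence "agents should not access production data" is now a `deny_resources` entry that raises an exception, not a paragraph in a wiki.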

The fifth failure is undefined human-in-the-loop thresholds. Everyone agrees humans should stay in the loop. No one defines when. What confidence score triggers escalation? What cost threshold pauses execution? Without thresholds, “human in the loop” is a policy statement with no implementation.
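Defining "when" can be as small as the sketch below. The threshold values (0.80 confidence, 5.00 USD, 20% error rate), the order of the checks, and the names `Thresholds` and `next_action` are placeholders I chose for illustration; the point is only that the numbers exist and the runtime applies them.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    ESCALATE = "escalate"    # hand the decision to a human
    PAUSE = "pause"          # stop spending until someone approves
    TERMINATE = "terminate"  # shut the run down


@dataclass(frozen=True)
class Thresholds:
    min_confidence: float = 0.80   # hypothetical value
    max_cost_usd: float = 5.00     # hypothetical value
    max_error_rate: float = 0.20   # hypothetical value


def next_action(confidence: float, cost_usd: float, error_rate: float,
                t: Thresholds = Thresholds()) -> Action:
    """Check the most severe condition first."""
    if error_rate > t.max_error_rate:
        return Action.TERMINATE
    if cost_usd > t.max_cost_usd:
        return Action.PAUSE
    if confidence < t.min_confidence:
        return Action.ESCALATE
    return Action.CONTINUE
```

For example, `next_action(0.6, 1.0, 0.0)` returns `Action.ESCALATE` under these placeholder thresholds, because the confidence is below 0.80 while cost and error rate are within bounds.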

The sixth failure is poor tool differentiation. Agents get broad access because restricting tools is harder than granting them. The result is write access where there should be read access, credentials that should not be held, network reach that is not needed.
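Scoping tools per agent can be sketched as a lookup that defaults to "no". The agent names, tool names, and the rule that write access implies read access are assumptions for this example.

```python
# Per-agent allowlist: tool name -> highest mode granted (hypothetical data).
TOOL_ALLOWLIST = {
    "invoice-reader": {"ledger": "read"},
    "invoice-poster": {"ledger": "write", "email": "write"},
}


def authorize(agent_id: str, tool: str, mode: str) -> bool:
    """Return True only if this agent was granted this tool in this mode."""
    if mode not in ("read", "write"):
        raise ValueError(f"unknown mode: {mode}")
    granted = TOOL_ALLOWLIST.get(agent_id, {}).get(tool)
    if granted is None:
        return False          # unknown agent or tool: default deny
    # Assumption: a write grant also covers reads; a read grant does not
    # cover writes.
    return mode == "read" or granted == "write"
```

With this table, `authorize("invoice-reader", "ledger", "write")` is `False`: the reader gets read access where read access is all it needs.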

These do not happen independently. They cascade.

[Diagram: the six governance failures cascading into one another]

Each gap makes the next one harder to close. By the time an organization reaches the sixth failure, the cost of fixing the architecture exceeds the cost of canceling the project. That is Gartner’s 40%.


The Fix Is a Mapping Problem

I want to keep it real with you. The fix is not “add governance.” That sentence is vague enough to produce nothing.

The fix is mapping each failure to the specific layer or ring that prevents it, then building that layer before you need it.

| Governance Failure | AgenticOps Layer | Containment Ring | What Is Missing |
| --- | --- | --- | --- |
| No centralized control plane | Runtime Governance (L5) | Ring 2: Constrain Environment | A single registry for all running agents |
| Late governance introduction | Intent (L1) | Ring 1: Constrain Inputs | Governance requirements in the design, not the incident retro |
| Missing decision traceability | Evaluation (L3) | Ring 3: Validate Outputs | Structured logs with reasoning traces and state changes |
| No policy-as-code enforcement | Agent Generation (L2) | Ring 1: Constrain Inputs | Declarative policy files the runtime enforces |
| Undefined HITL thresholds | Promotion (L4) | Ring 4: Gate Promotion | Numeric thresholds for confidence, cost, and error rate |
| Poor tool differentiation | Agent Generation (L2) | Ring 1: Constrain Inputs | Per-agent tool allowlists, not shared credentials |

No driver is exotic. No driver requires a novel solution.

The structural components already exist in every governed agentic system that has reached production.

Stripe’s Minions architecture has all six solved. Devboxes are the control plane and environment constraint. Blueprints define governance at the intent layer. Every tool invocation is logged. Policy enforcement is structural, not advisory. Retry caps define explicit HITL thresholds. Toolshed provides curated, scoped tool access.

Stripe is not in the 40%. The structural reason is visible in the architecture.

Now look at the gap as a shape.

[Diagram: the gap between deployment and governance]

Every project in that gap is running agents without the infrastructure to govern them. Some will build the infrastructure before it matters. Most will not.


The Diagnostic

Map your project against these six questions. Where you have gaps, you have cancellation risk.

| Requirement | Question | Pass Criteria |
| --- | --- | --- |
| Centralized control plane | Can you list every agent running in your organization right now? | Single registry with agent identity, status, tool access, and session history |
| Early governance | Were governance requirements defined before the first agent was deployed? | Containment boundaries in the design document, not the incident retrospective |
| Decision traceability | Can you reconstruct why an agent made a specific decision last Tuesday? | Structured logs with reasoning traces, tool call sequences, and state transitions |
| Policy-as-code | Are your agent policies enforced by the runtime or written in a wiki? | Declarative policy files that the agent cannot override or modify |
| HITL thresholds | At what confidence score does your agent escalate to a human? | Numeric thresholds for escalation, pause, and termination, enforced automatically |
| Tool scoping | Does each agent have access only to the tools required for its task? | Per-agent tool allowlists, not shared credentials with broad access |

Three or more gaps is a project at structural risk.

Five or more gaps matches the profile of the 40% that Gartner predicts will be canceled.

Six gaps is a demo, not a deployment. And that’s the way it is.
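The scoring rule above is simple enough to run as a script. This sketch uses hypothetical requirement keys and an invented `assess` function; the verdict for fewer than three gaps is my own placeholder label, since the rule only names three, five, and six.

```python
REQUIREMENTS = (
    "centralized_control_plane",
    "early_governance",
    "decision_traceability",
    "policy_as_code",
    "hitl_thresholds",
    "tool_scoping",
)


def assess(passes: dict) -> tuple:
    """Count gaps across the six requirements and map them to a verdict."""
    # A requirement that is not mentioned counts as a gap.
    missing = [r for r in REQUIREMENTS if not passes.get(r, False)]
    gaps = len(missing)
    if gaps == 6:
        verdict = "demo, not a deployment"
    elif gaps >= 5:
        verdict = "matches the profile of the projected 40%"
    elif gaps >= 3:
        verdict = "structural risk"
    else:
        verdict = "below the structural-risk line"  # placeholder label
    return gaps, verdict, missing


# Hypothetical project: only the registry, traceability, and tool scoping pass.
gaps, verdict, missing = assess({
    "centralized_control_plane": True,
    "decision_traceability": True,
    "tool_scoping": True,
})
```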


Let’s talk about it.

What AgenticOps Actually Looks Like

Autonomy Without Infrastructure Is Just a Demo

Gartner: More Than 40% of Agentic AI Projects to Be Canceled by 2027 (Gartner Symposium/ITxpo 2025)

Accelirate: Agentic AI Governance Challenges and Solutions (accelirate.com)