I wrote this a few years ago, but I’m going through a similar agile transformation right now. Every agile transformation is different, but this still makes sense to me, even though it is just a draft post. I figured I’d just post it because I never search my drafts for nuggets of knowledge :).
If we are going to do Kanban, we shouldn’t waste time formally planning sprints. Just as we don’t want to write huge up-front specifications because of the waste caused by unknowns, we don’t want to spend time planning a sprint, because the work in the sprint can change any time the customer wants to reprioritize. We should have a backlog of prioritized features. The backlog is reprioritized regularly (daily, weekly…) to keep features available to work on. If we want to deliver a specific feature or set of features in two weeks, prioritize them and the team will do those features next.
There is a limit on the number of features the team can have in progress (work in progress, or WIP). Features are considered WIP until they pass UAT. Production would be a better target, but saying a feature is WIP until production is a little far-fetched if you aren’t practicing continuous delivery. So, for our system, passing UAT is treated as reaching production. When the team is under its WIP limit, it is free to pull the next feature from the highest-priority features in the backlog.
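To make the pull rule concrete, here is a minimal sketch of a board with a WIP limit. The `Board` class, its method names, and the feature names are all made up for illustration; this is not a real tool, just one way to express "pull the next feature only when under the limit."

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Board:
    """A hypothetical pull board: a prioritized backlog plus a WIP limit."""
    wip_limit: int
    backlog: list[str] = field(default_factory=list)      # highest priority first
    in_progress: list[str] = field(default_factory=list)  # counts against the WIP limit
    done: list[str] = field(default_factory=list)         # passed UAT

    def pull_next(self) -> Optional[str]:
        """Pull the top backlog feature, but only if we are under the WIP limit."""
        if len(self.in_progress) >= self.wip_limit or not self.backlog:
            return None
        feature = self.backlog.pop(0)
        self.in_progress.append(feature)
        return feature

    def pass_uat(self, feature: str) -> None:
        """A feature stops counting as WIP once it passes UAT."""
        self.in_progress.remove(feature)
        self.done.append(feature)


board = Board(wip_limit=2, backlog=["login", "search", "export"])
board.pull_next()               # pulls "login"
board.pull_next()               # pulls "search"
blocked = board.pull_next()     # None: the team is at its WIP limit
board.pass_uat("login")         # frees one WIP slot
unblocked = board.pull_next()   # pulls "export"
```

Reprioritizing is just reordering `backlog`; nothing already in progress is disturbed, which is the point of deferring the decision until a slot opens up.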
This will most likely reduce resource utilization, but it will increase throughput and improve quality. Managers may take issue with developers not being used at full capacity, but there is a reason for this madness, and hopefully I can explain it.
Pulling features into the pipeline from a prioritized backlog instead of a planned sprint allows the decision about which features to work on to be deferred until the last possible moment. This provides more agility in the flow of work in the pipeline, and the product owner is able to respond quickly to optimize the product in production. Isn’t agile what we’re going for? Pulling work with WIP limits also gives better risk management. Since batch sizes are smaller, problems will only affect a limited amount of work in progress, and risk can be mitigated as new work is introduced into the pipeline. Focusing on a limited amount of work improves the speed at which work is done. There is no context switching, and there is a single focus on moving a limited amount of work through the system at one time. This increases the flow of work even though there may be times when a developer is idle. The truth is the system can only flow as fast as its slowest link, the constraint. Having one part of the system run at full capacity and overload the constraint introduces a lot of potential waste into the system.
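A toy simulation can show why running one stage flat out does not help. This is a rough sketch under simplifying assumptions: a two-stage dev-to-QA flow, made-up daily capacities, and no modeling of context switching or rework. The function name and numbers are hypothetical.

```python
def run_pipeline(dev_per_day, qa_per_day, days, wip_limit=None):
    """Return (delivered, waiting_for_qa) after `days` of a two-stage dev -> QA flow."""
    waiting = 0     # features finished by dev and queued in front of QA
    delivered = 0
    for _ in range(days):
        # Dev either runs at full capacity (push) or only fills the
        # room left under the WIP limit (pull).
        started = dev_per_day
        if wip_limit is not None:
            started = min(dev_per_day, max(wip_limit - waiting, 0))
        waiting += started
        # QA is the constraint: it can only finish so much per day.
        finished = min(qa_per_day, waiting)
        waiting -= finished
        delivered += finished
    return delivered, waiting


push = run_pipeline(dev_per_day=5, qa_per_day=2, days=10)
pull = run_pipeline(dev_per_day=5, qa_per_day=2, days=10, wip_limit=4)
print(push)  # (20, 30)
print(pull)  # (20, 2)
```

In this simplified model both runs deliver the same 20 features, because QA sets the pace either way. The difference is that the push run leaves 30 features sitting in front of QA while the pull run leaves 2. The model does not capture the extra gains from less context switching, so it understates the case for pulling.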
On my current team, we have constraints that determine how quickly we can turn around a feature. Currently, code review and QA are constraints. QA is the largest constraint that limits faster deployment cycles, but more on that later. If we follow the five basic steps outlined in the Theory of Constraints (TOC) from the book The Goal, we would:
- Identify the constraint(s) – in this instance it’s code review and manual testing
- Exploit the constraint to maximize productivity –
- Subordinate all other steps or processes to the speed or capacity of the constraint – no new work may enter as WIP until the constraint has WIP capacity available.
- Elevate the constraint – for us, we will prioritize work that helps remove these work centers as constraints.
- Repeat – once a constraint is broken, go back to the first step and identify the next constraint.
The plan is to have developers do code reviews any time the WIP limit stops the movement of work. Developers should also create automated tests to lessen the work that QA has to do. We don’t focus first on optimizing QA processes because that would increase QA’s capacity without increasing the speed at which we can flow work to production. We don’t want to increase the number of features that QA can handle. We also don’t want to speed up manual testing, because it is important to take the proper time in testing. What we want to do is remove manual regression from QA’s workload so that we can deliver new features to QA faster and QA can deliver them to production faster. QA can then focus on what they do best, testing, instead of running mundane scripted checks.
Normally, we would have to wait for a manual regression test cycle to finish and couldn’t introduce new work because it would invalidate the regression test. With automation handling more than 80% of regression, QA can move faster and actually test more, and we can increase not only throughput through the entire system but also the overall quality of the product.
Monitoring Delivery Pipeline
We track work through the delivery pipeline as features. A feature in this sense is any change: a new function, a change to an existing function, or a defect fix. Feature requests are kept in a central database. We monitor the delivery pipeline by measuring:
- Lead Time
- Quantity: Unit of Production
- Production Rate
Inventory
Inventory (V) is any work that has not been delivered to the customer. This counts all work, from the backlog through to a release awaiting production deployment. Whenever undelivered work is considered invalid, it becomes an Operating Expense. Invalid means it won’t be delivered at all, or it has issues such as a defect or a mismatch with the spec. Invalid work is wasted effort and, in the case of a defect, causes expensive unbudgeted rework. In traditional cost accounting inventory is seen as an asset, but in TOC it is a “potential” Operating Expense if it is not eventually delivered to the customer, so turning inventory as fast as possible without injecting defects is a goal.
Quantity: Unit of Production
Quantity: Unit of Production (Q) is the total number of units of work (features) that have moved through our delivery pipeline to date. Our unit of production is a feature. When a feature is ready to be deployed to production, we increase Q by one unit, but the feature is still considered inventory until it has been delivered to the customer. If a customer decides they don’t want the feature, or its deployment is stopped for some other reason, it is counted as an Operating Expense and Q is reduced by one unit.
Lead Time
Lead time (LT) is the time it takes to move a feature, one Q, from submission to the backlog to deployment to a customer in production.
Production Rate
Production rate (PR) is the number of Q delivered during a time period, for example 3 features per month or 2 features per week.
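As a rough illustration of how these measurements could be pulled from tracked work, here is a small sketch. The record layout, feature names, and dates are invented for the example; a real tracking database will look different.

```python
from datetime import date

# Hypothetical feature records:
# (name, date submitted to backlog, date delivered to production or None)
features = [
    ("login", date(2024, 1, 2), date(2024, 1, 12)),
    ("search", date(2024, 1, 5), date(2024, 1, 19)),
    ("export", date(2024, 1, 10), None),  # not delivered yet, so still inventory
]


def lead_time_days(submitted, delivered):
    """Lead time (LT): calendar days from backlog submission to production."""
    return (delivered - submitted).days


def production_rate(records, start, end):
    """Production rate (PR): features delivered within [start, end]."""
    return sum(1 for _, _, d in records if d is not None and start <= d <= end)


inventory = [name for name, _, d in features if d is None]
lead_times = {name: lead_time_days(s, d) for name, s, d in features if d is not None}
january_rate = production_rate(features, date(2024, 1, 1), date(2024, 1, 31))

print(inventory)     # ['export']
print(lead_times)    # {'login': 10, 'search': 14}
print(january_rate)  # 2
```

The sketch measures lead time in calendar days; working days would be an equally reasonable choice as long as it is applied consistently.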
Optimize Delivery Pipeline for Lead Time
We should strive to optimize the delivery pipeline for lead time instead of production rate. The Theory Of Constraints – Productivity Metrics in Software Development posted on lostechies.com explains this well.
Let’s say our current lead time (LT) is 1 unit (Q) per week, which is a production rate (PR) of 4 Q per month, assuming a month of 20 working days. If we optimize LT to 1 Q every 3 days, PR jumps to 6.67 Q per month, roughly a 67% increase.
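The arithmetic is easy to check. This small sketch assumes a month of 20 working days (4 weeks of 5 days) and that features flow one at a time, so PR is simply working days divided by lead time. The constant and function name are mine, for illustration only.

```python
WORKING_DAYS_PER_MONTH = 20  # assumption: 4 weeks of 5 working days


def production_rate_per_month(lead_time_days):
    """Features per month when one feature is delivered every `lead_time_days`."""
    return WORKING_DAYS_PER_MONTH / lead_time_days


before = production_rate_per_month(5)  # 1 Q per 5-day week -> 4.0 Q per month
after = production_rate_per_month(3)   # 1 Q per 3 days -> about 6.67 Q per month
increase_pct = (after - before) / before * 100  # relative increase, about 66.7

print(before, round(after, 2), round(increase_pct, 1))  # 4.0 6.67 66.7
```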
If we focus on optimizing PR, we may still see improvement in LT, but it can also lead to nothing more than an increase in inventory. PR optimization may increase the amount of Q that is undeliverable because of some bottleneck in our system, so the Q sits as inventory. The longer a feature sits in inventory, the more it costs to move it through the pipeline and address any issues found in later stages of the pipeline.
So, to make sure we are optimizing for LT, we focus on reducing waste, or inventory, in the pipeline. The delivery team keeps a single-purpose focus on a limited amount of work in progress to deliver what the customer needs right now, based on priority in the backlog. Reducing inventory reduces Operating Expense. (Excuse me if I am allowing some lean thinking into this TOC explanation.)
Investment
Investment (I) is the total cost invested in the pipeline. In our case we will count this as hours invested.
Operating Expense
Operating expense (OE) is the cost of taking an idea and developing it into a deliverable. Any fixed overhead is considered OE. We will just use the salaries of developers, BA, QA, and IT as our OE. I’m not sure yet how we will divide up our fixed salaries (still learning).
Throughput
Throughput (T) is the amount earned per Q. It is calculated by taking the amount earned from features delivered to production minus the cost of delivering them.
To maximize ROI and net profit (NP) we need to increase T while decreasing I and OE.
NP = T – OE
ROI = NP / I
Average Cost Per Feature
Average cost per feature (ACPF) is the average amount spent in the pipeline to create a feature.
ACPF = OE/Q
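Here is a quick sketch of the NP, ROI, and ACPF formulas above in code. All the numbers are invented, and for simplicity the example expresses T, OE, and I in the same currency unit so the ratios make sense; counting I in hours, as mentioned earlier, would need a conversion first.

```python
def net_profit(throughput, operating_expense):
    """NP = T - OE"""
    return throughput - operating_expense


def return_on_investment(throughput, operating_expense, investment):
    """ROI = NP / I"""
    return net_profit(throughput, operating_expense) / investment


def average_cost_per_feature(operating_expense, quantity):
    """ACPF = OE / Q"""
    return operating_expense / quantity


# Made-up figures for one period (assumption: everything in the same currency).
throughput = 120_000
operating_expense = 90_000
investment = 60_000
quantity = 12  # features delivered

np_value = net_profit(throughput, operating_expense)
roi_value = return_on_investment(throughput, operating_expense, investment)
acpf_value = average_cost_per_feature(operating_expense, quantity)

print(np_value, roi_value, acpf_value)  # 30000 0.5 7500.0
```

Raising T or lowering OE and I moves NP and ROI in the right direction, and delivering more Q for the same OE lowers ACPF.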
There are more metrics that we can gather, monitor, and analyze; but we will keep it simple for now and learn to crawl first.
Average Lead Time Per Feature
Average lead time per feature is the average time it takes to move a feature from the backlog to production. We also calculate the standard deviation to get a sense of how varying work sizes in the pipeline affect lead time.
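Both numbers are one-liners with Python’s standard `statistics` module. The lead times below are made up, and the sketch uses the sample standard deviation (`stdev`), which is an assumption on my part; `pstdev` would be the choice if the list covered every feature ever delivered.

```python
from statistics import mean, stdev

# Hypothetical lead times, in days, for five delivered features.
lead_times_days = [3, 5, 4, 12, 6]

average_lead_time = mean(lead_times_days)   # 6
lead_time_spread = stdev(lead_times_days)   # sample standard deviation, about 3.54

print(average_lead_time, round(lead_time_spread, 2))  # 6 3.54
```

A spread this large relative to the average (driven here by the single 12-day feature) is a hint that work sizes vary a lot.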
Bonus: Estimating Becomes Easier
When we begin to monitor our pipeline with these metrics, estimating becomes simpler. Instead of estimating based on time, we switch to estimating based on the size of the feature. Since we are tracking work, we have a history to base our future size estimates on.
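One way this could work, sketched loosely: tag each delivered feature with a size and use the historical average lead time for that size as the estimate. The S/M/L buckets, the history data, and the function name are all assumptions for the example, not something our team has settled on.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical history: (size bucket, observed lead time in days).
history = [("S", 2), ("S", 3), ("M", 5), ("M", 7), ("L", 12)]


def estimate_by_size(records):
    """Average historical lead time per size bucket."""
    by_size = defaultdict(list)
    for size, days in records:
        by_size[size].append(days)
    return {size: mean(days) for size, days in by_size.items()}


estimates = estimate_by_size(history)
print(estimates)  # {'S': 2.5, 'M': 6, 'L': 12}
```

A new feature sized "M" would then be estimated at about 6 days, with the estimate improving as more history accumulates.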
Issues in Transformation
Our current Q is a release, a set of features that have been grouped together for a deployment. At times we build up features for a month before they are delivered to production, which increases inventory. It would be better to use a feature instead of a release as our Q. When a feature is ready, deliver it. This reduces inventory and increases the speed at which we get feedback.
To change our unit, Q, to a feature, we have to attack our largest constraint, QA. Currently, we have to sit on features, building up inventory, until there are enough to justify a QA test cycle. We don’t want to force a two-week regression on one feature that took a couple of days to complete. So, reducing the test cycle is paramount with this approach.
The Goal: A Process of Ongoing Improvement, by Eliyahu M. Goldratt
The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win, by Gene Kim, Kevin Behr, and George Spafford
The Metrics in TOC: Productivity Metrics In Software Development, by Derick Bailey, https://lostechies.com/wp-content/uploads/2011/04/TheoryOfConstraints-ProductivityMetricsInSoftwareDevelopment.pdf
Agile Management for Software Engineering, by David J. Anderson
Reaching The Goal, by John Arthur Ricketts
Applying Theory of Constraints to Manage Bottlenecks, by Kamran Khan, http://www.isixsigma.com/methodology/theory-of-constraints/applying-theory-constraints-manage-bottlenecks/