Category: Quality

An Agile Transformation

I wrote this a few years ago, but I’m going through a similar agile transformation right now. Although, every agile transformation is different, this still makes sense to me although it is just a draft post. I figured I’d just post it because I never search my drafts for nuggets of knowledge :).

If we are going to do Kanban we shouldn’t waste time formally planning sprints. Just like we don’t want to do huge upfront specifications because of waste caused by unknowns that invalidate specs, we don’t want to spend time planning a sprint because the work being done in the sprint can change anytime the customer wants to reprioritize.

We should have a backlog of prioritized features. The backlog is regularly prioritized (daily, weekly…) to keep features available to work. If we want to deliver a specific set of features or features in two weeks, prioritize them and the team will do those features next.

There is a limit on the number of features the team can have in progress (work in progress or WIP). Features are considered WIP until they pass UAT. Production would be a better target, but saying a feature is WIP until production is a little far fetched if you aren’t practicing “real” continuous delivery. So, for our system, production is considered passing UAT. When the team is under their WIP limit they are free to pull the next feature from highest priority features in the backlog.

This is going to most likely reduce resource utilization, but will increase throughput and improve quality. Managers may take issue at developers not being used at full capacity, but there is a reason for this madness and hopefully I can explain it.

Having features pulled into the pipeline from a prioritized backlog instead planning a sprint allows decisions on what features to be worked to be deferred until the last possible moment. This provides more agility in the flow of work in the pipeline and the product owner is able to respond quickly to optimize the product in production. Isn’t agile what we’re going for?

Pulling work with WIP limits also gives greater risk management. Since batch sizes are smaller, problems will only affect a limited amount of work in progress and risk can be mitigated as new work is introduced in the pipeline. This is especially true if we increase the number of production releases. If every change results in a production release we don’t have to worry about the branch and hotfix dance.

Focusing on a limited amount of work improves the speed at which work is done. There is no context switching and there is a single focus on moving a one or limited amount work items through the system at one time. This increases the flow of work even though there may be times when a developer is idle.

The truth is the system can only flow as fast as its slowest link, the constraint. Having one part of the system run at full capacity and overload the constraint introduces a lot of potential waste in the system. If the idle parts of the system worked to help the bottlenecked part of the system, the entire system improves. So having a full system focus is important.

On my current team, we have constraints that determine how quickly we can turn around a feature. Currently, code review and QA are constraints. QA is the largest constraint that limits faster deployment cycles, but more on that later. To optimize our constraints we could follow the five basic steps outlined in the Theory of Constraints (TOC) from the book The Goal:

  1. Identify the constraint(s) – in this instance it’s code review and manual testing
  2. Exploit the constraint to maximize productivity – focus on improvements on the constraint
  3. Subordinate all other steps or processes to speed up or reduce capacity of the constraint – no new work may enter as WIP until the constraint has WIP available
  4. Elevate the constraint – prioritize work that helps remove the constraint.
  5. Repeat

To help with the code review constraint the plan is to have developers do code reviews any time WIP stops the movement of work. With this time developers can dig in and do more thoughtful code reviews and look for ways to refactor and improve the code base. Since we are touching code, why not make recommendations to make the code better. So, we can improve what an acceptable pull request is: good syntax, style, logic, tests… everything we can think of to make the codebase more maintainable and easy to validate.

To remove the QA constraint, the plan focuses on developers creating automated tests to help lessen the work that QA has to do. The reason we don’t first focus on optimizing QA processes directly is because focusing on simply optimizing QA processes would actually increases the capacity for QA without increasing the speed at which we can flow work to production. We don’t want to increase the number of features that QA can handle because it is important to take the proper time in testing. What we want to do is remove manual regression checks for QA. Exploiting QA for us means increasing QAs effectiveness freeing up time to do actual testing instead of just following a regression script. Having developers automate regression opens us up to deliver new features to production faster because automation runs these test much faster than QA. QA can focus on what they do best, testing and not running mundane scripted checks. Trick here is how do we convince developers to write automated tests without causing a revolt.

In summary, we would have to wait for a manual regression test cycle to occur and couldn’t introduce new work because it would invalidate the regression test. With automation handling +80% of regression QA can move faster, actually test more, and we can not only increase throughput through the entire system, but the overall quality of the product is also increased.

Monitoring Delivery Pipeline

We track work through the delivery pipeline as features. A feature in this sense is any change, new function, change existing function, or to fix a defect. Features are requested on features kept in a central database. We monitor the delivery pipeline by measuring:

  • Inventory
  • Lead Time
  • Quantity: Unit of Production
  • Production Rate


Inventory (V) is any work that has not been delivered to the customer. This is the same as work in progress (WIP). This counts all work from the backlog to a release awaiting production deployment. Whenever there is undelivered work and we have to cancel the work for some reason, we considered it an Operational Expense. Canceled work won’t be delivered to production because of defect, incorrect specs, the customer pivoted or otherwise doesn’t want it. Cancelled work is wasted effort and in some cases can also cause expensive un-budgeted rework. In traditional cost accounting inventory is seen as an asset, but in TOC it is a potential Operational Expense if it is not eventually delivered to customer so turning inventory as fast as possible without injecting defects is a goal.


Quantity (Q) is the total number of units that have moved through our delivery pipeline. Our unit of production is a feature. When a feature is deployed to production we can increase quantity by one unit. A feature is still considered inventory until it has been delivered to the customer in production. If a customer decides they don’t want the feature or some other reason to stop the deployment of the feature, it is counted as an Operational Expense and not quantity.

Flow Time

Flow time (FT) is the time it takes to move a feature, one unit, from submission to the backlog to deployed to a customer in production.

Production Rate

Production rate (PR) is the number of units delivered during a time period. This is the same as throughput. If we we deliver 3 features to production in a month our production rate is 3 features per month.

Optimize Delivery Pipeline for Flow Time

We should strive to optimize the delivery pipeline for flow time instead of production rate or throughput. The Theory Of Constraints – Productivity Metrics in Software Development posted on explains this well.

Let’s say our current flow time (FT) is 1 unit (Q) in a week or a production rate (PR) of 4 Q per month. If we optimize FT to 1 Q in 3 days, we will see a jump in PR to 6.67 Q per month or a 59% increase.

If we focus on optimizing PR, we may still see improvement in FT, but it can also lead to only an increase in inventory as WIP increases. The PR optimization may increase Q that is undeliverable because of some bottleneck in our system so the Q sits as inventory, ironically in a queue. The longer a feature sits in inventory the more it costs to move it through the pipeline and address any issues found in later stages of the pipeline. So, old inventory can also cause delay down stream as the team must take time to ramp up to address issues after they have moved on to another task.

So, to make sure we are optimizing for FT we focus on reducing waste or inventory in the pipeline by reducing WIP. The delivery team keeps a single purposed focused on one unit or a limited amount of work in progress to deliver what the customer needs right now, based on priority in the backlog. Reducing inventory reduces Operation Expense. (Excuse me if I am allowing some lean thinking into this TOC explanation)



Investment (I) is the total cost invested in the pipeline. In our case we will count this as time invested. We can sum the time invested on each unit in inventory in the pipeline to see how much is invested in WIP. We could count hours in timecards to determine this, but time cards are an evil construct. If we are good about moving cards, or even automated movement of cards based on some event (branch created, PR submitted, PR approved…), we could assign the time a card sits in some state to a standard investment amount in the time it sat. I’m still pondering this, but I feel like time investment based on card movement is way better than logging time.

Operating Expense

Operating expense (OE) is the cost of taking an idea and developing it to a deliverable. This is not to be confused with operational expense which is a loss in inventory or loss in investment. Any expense, variable or fixed, that is a cost to deliver a unit is considered OE. We will just use salaries of not only developers, but BA, QA, IT as our OE. Not sure how we will divide up our fixed salaries, maybe a function that includes time and investment. Investment would be a fraction of OE because all of a developers time is not invested in delivering features (still learning).


Throughput (T) in this sense is the amount earned per unit. Traditionally, this is that same as production rate as explained earlier, but in terms of cost, we calculate throughput by taking the amount earned on production rate, features delivered to production, minus the cost of delivering the features or the investment.

Throughput Accounting

To maximize ROI and net profit (NP) we need to increase T while decreasing I and OE.

NP = (T – OE)


Average Cost Per Feature

Average cost per feature (ACPF) is the average amount spent in the pipeline to create a feature.


There are more metrics that we can gather, monitor, and analyze; but we will keep it simple for now and learn to crawl first.

Average Lead Time Per Feature

The average time it takes to move a feature from the backlog to production. We also calculate the standard deviation to get a sense on how varying work sizes in the pipeline affects lead time.

Bonus: Estimating Becomes Easier

When we begin to monitor our pipeline with these metrics estimating becomes simpler. Instead of estimating based on time we switch to estimating based on size of feature. Since we are tracking work, we have a history to base our future size estimates on.

Issues in Transformation

Our current Q is a release, a group of features that have been grouped together for a deployment. We will build up an inventory of features over a month at times before they are delivered to production. This causes an increase in inventory. It would be better to use a feature instead of a release as our Q. When a feature is ready, deliver it. This reduces inventory and increase the speed at which we get feedback.

To change our unit, Q, to feature we have to attack our largest constraint, QA. Currently, we have to sit on features or build up inventory to get enough to justify a QA test cycle. We don’t want to force a two week regression on one feature that took a couple days to complete. So, reducing the test cycle is paramount with this approach.


  • The Goal: A Process of Ongoing Improvement, by Eliyahu M. Goldratt

Adding Report to Existing TFS 2017 Project

I had an issue where I couldn’t see reports for my TFS projects because they weren’t installed. I knew this because I opened SQL Reporting Services and I didn’t see a folder for my project under the TFS collection’s folder. I did a little digging and found a command that I could run to install the reports:

  1. Open administrator command prompt on server hosting TFS.
  2. Change directory to C:\Program Files\Microsoft Team Foundation Server 15.0\Tools
    Note: 64bit would be Program Files (x86)
  3. Run TFSConfig command to add project reports

TFSConfig addprojectreports /collection:”https://{TFSServerName}/{TFSCollectionName}” /teamproject:{TFSProjectName} /template:”Scrum”

You should replace the tokens with names that fit your context (remove the brackets). The template will be the template for your project:

  • Scrum – you will have backlog items under features
  • Agile – you will have stories under features

There’s another one, CMMI, but I’ve never used it. You should see a requirements work item, but I’m not sure if this template has a feature item.

Once you run the command, the reports will be added and you will be able to see how your team is doing by viewing the reports in SQL Reporting Services.

Testing Liskov Substitution Principle

In my previous post I talked about Liskov Substitution Principle in relation to TypeScript. I thought I would continue on my thoughts on LSP by defining it in terms of testing since testing has been a large part of my world for the past two years.

Here is another definition of LSP

Let q(x) be a property provable about objects x of type T. Then q(y) should be provable for objects y of type S where S is a subtype of T.

That’s how two titans of computer science Barbara Liskov and Jannette Wing defined it in 1993. Unfortunately, when I first learned of LSP through this definition the meaning of LSP alluded me because I was light on this strange genius speak. It wasn’t until I eventually stumbled on to seeing “provable” as some observable behavior or outcome that the definition of LSP clicked for me.

If I have an interface and I create an implementation of the interface, I can prove that the implementation is correct if it does what I expect it to. That expectation is represented by q in the equation above. Then if I create another implementation of the same interface, the same expectation or q should hold true with this new type. Hey, q is a test… duh.

Testing LSP

I have an interface that can be implemented in a type that can be used to accesses source code repositories. One property of this interface that I expect is that I can get a list of all of the tags in a repository returned in a string array.

IRepository {
string[] GetBranches();
string[] GetTags();

So, I create an implementation that can connect to a Git repository and it returns a string array of tags in the repository. I hook up the implementation to a UI and my client is happy because they can see all of their tags in my awesome UI on their mobile phone.

Now, they want an implementation for their SVN repository. No problem. I do another implementation and I return a string array of tags from their SVN repository. All good, expectation met, no LSP violations. I know this not because I am a genius and can do a mathematical proof, but because I wrote a functional test to prove the behavior by asserting what I expected to see in the string array (a mathematical proof at a higher abstraction for non-geniuses). When I run the test the tags returned match my expectation. With my test passing, I follow another SOLID principle, Dependency Inversion Principle (DIP), and I easily hook this up to the UI with a loose coupling. Anyway, now my client can open the UI and see a list of tags for their Git and SVN repositories. As far as they are concerned the expectation (q) is correct. My implementations satisfy the proof and my client doesn’t call an LSP violation on me.

My client says they now want to see a list of tags in their Perforce repository and I assign this to another dev team because this is boring to me now :). The team misunderstood the spec because I didn’t adequately define what a tag is for q. So, instead of returning tags in an array of strings they return a list of labels. While it is true that every tag in Perforce is a label, every label isn’t a tag. What’s even worse is the team has passing functional test that says they satisfied q. On top of this we didn’t properly QA the implementation to determine if their tests or definition of q is correct and we delivered the change to production. The client opens the UI and expects to see a list of tags from their Perforce repository and they see all the labels instead. They immediately call the LSP cops on us. This new type implementation of the interface does not meet expectation and is a violation of LSP.

Context is Key

Yes, this is a naive example of LSP, but it is how I understand it and how I apply it. If I have expectations when using an interface, abstract type, or implementation of some supertype, then every implementation or subtype should meet the expectation and be provable by the same expectation. The proof can be expressed as a mathematical equation, unit test, UI test, or visual observation as long as the expectation is properly expressed.


The point is, in order to not violate LSP we have to first have a shared understanding of the expectations expressed in our test (q). In our example, the development team had one expectation that wasn’t shared by the client and LSP was violated. To not violate LSP we have to understand how objects are expected to work, then we can define checks and tests to validate that LSP wasn’t violated.

This goes beyond just checking sub types in traditional object oriented inheritance. Every object that we create is an abstraction of something. If we create a People object, we expect it to have certain properties and behaviors. Even if we have a People type that won’t be sub typed, it can be said that it is a sub type of an actual person. The expectations that we define for People object should hold true. We expect a real person to have a name, address, and age. We could have a paper form (a People form) that we use to capture this information and the expectations are valid for the form. One day we decide to automate the form and we create an abstract People type and when we create an object of this type we expect it to have a name address, and age. We can test this object and determine if our People object violates LSP because it is a sub type of the manual form and we use the same expectations for the form and for our new object.

Now this is a little abstract mumbo jumbo, but it is a tenant that I believe is very important in software development. Don’t violate LSP!


Graphical Test Plan

I read a little about graphical test planning created by Hardeep Sharma and championed by David Bradley, both from Citrix. It’s a novel idea and sort of similar to the mind map test planning I have played around with. The difference is your not capturing features or various heuristics and test strategies in a mind map, you are mapping expected behavior only. Then you derive a test plan from the graphical understanding of the expected behavior of the system. I don’t know a lot about GTP, so this is a very watered down explanation. I won’t attempt to explain it, but you can read all about it:

Plan Business Driven Development with GTP

What interested me was the fact that I could abstract how we currently spec features into a GTP type model. I know the point of GTP is not to model features, but our specs model behavior and they happen to be captured in feature files. Its classic Behavior Driven Development (BDD) with Gherkin. We have a feature that defines some aspect of value that the system is expected to provide to users. In the feature we have various scenarios that describe the expected behaviors of the feature. Scenarios have steps that define pre-condition, action, and expectations (PAE) or in Gherkin, Given-When-Then (GWT) that define how a user would execute the scenario. We also have feature backgrounds which is a feature wide pre-condition that is shared by all scenarios in the feature.
I said we use Gherkin, but our new test runner transcends just GWT. We can define PAE in plain English without the GWT constraints, we can select the terms to describe PAE instead of being forced to use GWT which sometimes causes us to jump through hoops to force the GWT wording to sound correct. 

GTP Diagram

If we applied something like GTP we would model the scenarios, but there would be more hierarchy before we define the executable scenarios. We currently use tagging to group similar scenarios that exercise a specific subset of a feature’s scenarios. This allows us to provide faster feedback by running checks for just a subset instead of the entire feature when we are only concerned with changes to the subset. In a GTP’ish model the left most portion of the diagram would hold generalized behavior specs, similar to how we use tagging, and as we go to the right the behavior becomes more granular until we hit a demarcation point for executable scenarios that can then be expressed in a linked test case diagram (TCD). In the GTP there are ways to capture meta data like related requirement/ticket ID for traceability back to requirements. Also, meta for demarcation point (can’t think of better name) to link to the TCD or feature file that further defines it.

Test Case Diagram

The test case diagram would define various scenarios that define the behavior of the demarcation points in the GTP. The TCD diagram would also include background preconditions and the steps to execute the scenario. At this point it feels like this is an extra step. We have to write the TCD in a feature file so diagramming it is creating a redundant document that has to be maintained.
In the TCD there are shapes for behavior, preconditions, steps, and expectations. I think there should be additional shapes or meta to express tags because this is important in how we categorize and control running of scenarios. It may help if there is also meta to link back to the GTP that the TCD is derived from so we can flow back and forth between the diagrams. Meta in the TCD is important because it gives us the ability extract understanding outside of just the test plan and design. We could have shapes, meta descriptions and links to
  • execute automated checks
  • open a manual exploratory test tool
  • view current test state (pass/fail)
  • view historical data (how many times has this step failed, when was the last failure of this scenario…)
  • view flake analysis or score
  • view delivery pipeline related to an execution
  • view team members responsible for plan, develop, test and release
  • view related requirement or ticket
  • much more…

Since we also define manual tests by just tagging features or scenarios with a manual tag or creating exploratory test based feature files, we could do this for both automated checks and manual tests.

GTP-BDD Binding

To get rid of the TCD redundancy we could generate the feature file from the diagram or vice-versa. Being able to bind GTP to BDD would make GTP more valuable to me.
We would need an abstract object graph that could be used to generate both the diagram and the feature file (Excel spread sheet, HTML page or whatever else). We are almost here, we have a tool that can generate feature files from persisted objects and vice versa. We would just have to figure out how to generate the diagram and express it as an interactive UI and not just a static picture.
What we have been struggling with is the ability to manually edit feature files and keep that in sync with the persisted objects. With a centralized UI this is easy because everyone uses the UI to update the objects. When people are updating features files from a source code repository we have to worry about merge conflicts (yuck) and if we consider the feature file or the persisted object as the source of truth. So, we may have to reduce flexibility and force everyone to use the UI only. Everyone would have to have discipline and not touch the feature files even though we have nice tools built into our IDE to help write and manage them. The tool would have to detect when someone has violated the policy and so on…I digress.


With a graphical UI modeled on GTP/TCD to manage BDD we can provide an arguably simpler way to visualize tests and provide the ability to drill down to see different aspects of test plans and designs and their related current and historical execution. With 2-way binding from diagram to feature file we have a new way to manage our executable specifications. This model could provide a powerful tool to not only aide test planning, but test management as a whole. The end result would hopefully be a better understanding for the team, increased flow in delivery pipeline, enhanced feedback, and more value to the customer and the business.
Now lets ask Google if something like this already exists so I don’t have to add it to my ever increasing backlog of things I want to build. Thanks to Hardeep Sharma, David Bradley, and Citrix for sharing GTP.

Extending the Reach of QA to Production

I have multiple lingering tasks for improving monitoring for for our applications. I believe this is a very important step we need to take to assess the quality of our applications and measure the value that we are delivering to customers. If I had my way, I would hire another me just so I can concentrate on this.


We need to monitor usage to better understand how our customer actually use the application in production. This will allow us to make better product design decisions and optimizations, prioritize testing effort in terms of regression coverage, and provide a signal for potential issues when trends are off.


We need a better way to monitor and analyze errors. We currently get an email when certain exceptions occur. We also log exceptions to a database. What we don’t have is a way to analyze exceptions. How often do they occur, what is the most thrown type of exception, what was system health when the exception was thrown.


We need a way to monitor and be alerted of health issues (e.g. current utilization of memory, cpu, diskspace; open sessions; processing throughput…). Ops has a good handle on monitoring, but we need to be able to surface more health data and make it available outside of the private Ops monitoring systems. It’s the old “it takes a village to raise an app” thing being touted by the DevOps movement.


Everyone on the delivery team needs some access to a dashboard where they can see the usability, exceptions, health of the app and create and subscribe to alerts for various condition thresholds that interest them. This should be even shared with certain people outside of delivery just to keep things transparent.


This can all be started in preproduction and once we are comfortable with it pushed to production. The point of having it is that QA is a responsibility of the entire team. Having these types of insight into production is necessary to insure that our customers are getting the quality they signed up for. When the entire team can monitoring production it allows us to extend QA because we can be proactive and not just reactive to issues in production. Monitoring production gives us the ammo we need to take preemptive action to avert issues in production while giving us the data we need to improve the application.

Monitoring Change Tickets in Delivery Pipelines

DevOps sounds cool like some covert special operations IT combat team, but it is missing the boat in many implementations because it only focuses on the relationship between Dev and Ops and is usually only championed by Ops. The name alienates important contributors on the software delivery team. The team is responsible for software delivery including analysis, design, development, build, test, deploy, monitoring, and support. The entire team needs to be included in DevOps and needs visibility in to delivery pipelines from end-to-end. This is an unrelated rant, but this lead me to thinking about how a delivery team can monitor changes in delivery pipelines.

Monitor Change

I believe it is important that the entire team be able to monitor changes as they flow through delivery pipelines.. There are ticket management systems that help capture some of the various stages that a change goes through, but its mostly various project management related workflow stages and they have to be changed manually. I’d like a way to automatically monitor a change as if flows from change request all the way to production and monitor actions that take place outside of the ticket or project management system.

Normally, change is captured in some type of ticket maybe in a project management system or bug database (e.g. Jira, Bugzilla). We should be able to track various activities that take place as tickets make their way to production. We need a way trace various actions on a change request back to the change request ticket. I’d like a system where activities involved in getting a ticket to production automatically generate events that are related to ticket numbers and stored in a central repository.

If a ticket is created in Jira, a ticket created event is created. A developer logs time on a ticket, a time logged activity event is created that links back to the time log or maybe holds data from the time log for the ticket number.

When an automated build that includes the ticket happens, then a build stated activity event is created with the build data is triggered. As various jobs and tasks happen in the automated build a build changed activity event is triggered with log data for the activity. When the build completes a build finished activity event is triggered. There may be more than one ticket involved in a build so there would be multiple events with similar data captured, but hopefully changes are small and constrained to one or a few tickets… that’s the goal right, small batches failing fast and early.

We may want to capture the build events and include every ticket involved instead of relating the event directly to the ticket, not sure; I am brainstorming here. The point is I want full traceability across my software delivery pipelines from change request to production and I’d like these events stored in a distributed event store that I can project reports from. Does this already exists? Who knows, but I felt like thinking about it a little before I search for it.

Ticket Events

  1. Ticket Created Event
  2. Ticket Activity Event
  3. Ticket Completed Event

A ticket event will always include the ticket number and a date time stamp for the event, think Event Sourcing. Ticket created occurs after the ticket is created in the ticket system. Ticket completed occurs once the ticket is closed in the ticket system. The ticket activities are captured based on the activities that are configured in the event system.

Ticket Activity Events

A ticket activity is an action that occurs on a change request ticket as it makes its way to production. Ticket activities will have an event for started, changed, and finished. Ticket activity events can include relevant data associated with the event for the particular type of activity. There may be other statuses included in each of these ticket activity events. For example a finish event could include a status of error or failed to indicate that the activity finished but it had an error or failed.

  • {Ticket Activity} Started
  • {Ticket Activity} Changed
  • {Ticket Activity} Finished

Deploy Started that has deploy log, Build Finished that has the build log, Test Changed that has new test results from an ongoing test run.

Maybe this is overkill? Maybe this should be simplified where we only need one activity event per activity and it includes data for started, changed, finished, and other statuses like error and fail. I guess it depends on if we want to stream activity event statuses or ship them in bulk when an activity completes; again I’m brainstorming.


Every ticket won’t have ticket activity events triggered for every activity that the system can capture. Tickets may not include every event that can occur on a ticket. Activity events are triggered on a ticket when the ticket matches the scope of the activity. Scope is determined by the delivery team.

Below are some of the types of activity events that I could see modeling for events on my project, but there can be different types depending on the team. So, ticket activity events have to be configurable. Every team has to be able to add and remove the types of ticket activity events they want to capture.

  1. Analysis
    1. Business Analysis
    2. Design Analysis
      1. User Experience
      2. Architecture
    3. Technical Analysis
      1. Development
      2. DBA
      3. Build
      4. Infrastructure
    4. Risk Analysis
      1. Quality
      2. Security
      3. Legal
  2. Design
  3. Development
  4. Build
  5. Test
    1. Unit
    2. Integration
    3. End-to-end
    4. Performance
    5. Scalability
    6. Load
    7. Stress
  6. Deploy
  7. Monitor
  8. Maintain

Reporting and Dashboards

Once we have the events captured we can make various projections to create reports and dashboards to monitor and analyze our delivery pipelines. With the ticket event data we can also create reports at other scopes. Say we want to report on a particular sprint or project. With the ticket Id we should be able to gather this and relate other tickets in the same project or sprint. It would take some though as to whether we would want to capture project and sprint in the event data or leave this until the time when we make the actual projection, but with ticket Id we can expand our scope of understanding and traceability.


The main goal with this exploration into my thoughts on a possible application is to explore a way to monitor change as it flows through our delivery pipelines. We need a system that can capture the raw data for ticket create and completed events and all of the configured ticket activity events that occur in between. As I look for this app, I can refer to this to see if it meets what I envisioned or if there may be a need for this.

Video Recording C# WebDriver Tests in TestPipe

The title is a little misleading because you can use the technique below to do a screen capture of anything happening on the screen and not just WebDriver tests. Yet, TestPipe does use C# WebDriver out the box so we will be recording these types of tests.

So, we want to add video recording tests to TestPipe. At first I thought this would be very difficult, but after finding Microsoft Expression Encoder SDK it became a lot easier. I was even able to find other people that have used this SDK which made a decision to move forward with this a little easier to take on.

First, I read the Working with Screen Capture section of the Overview of the Expression Encoder SDK. From this I learned that I needed to create an instance of ScreenCaptureJob. The question is, where do I create it?

In TestPipe we have a ScenarioSession class that holds state of various things while a test scenario runs and it makes sense to expose this new functionality there because we want to be able to control video recording within the context of individual test scenarios. Do we add a new property on the session or should it be a new property on the IBrowser interface. We already have a TakeScreenshot method on IBrowser. Yet, I don’t think it is a good fit on the browser interface because there is a bit of setup that needs to take place for ScreenCaptureJob that is out of scope for a browser and I don’t want to muddy up the API more than it already is.

When we setup a scenario we want to allow setup of the ScreenCaptureJob based on configuration for a feature and/or a scenario. We define features and scenarios in a text file, currently using Gherkin, and we store data used in feature and scenario tests in a JSON file. So, we have to configure video recording in the Gherkin, JSON or both.

Do we keep all recordings or only failing recordings? What if we want to keep only failing, but from time to time we need non-failing recordings for some reason? Do we overwrite old recordings or store in unique folders or filenames?

To trigger the recording we could use tags. If an @Video tag is present on the scenario or feature, record the scenario(s) and only keep the recording if the scenario fails. If the @Debug tag is present on the Feature or the Scenario, keep the recordings even if they don’t fail.

We can create a unique folder for the recordings so that we can store videos of multiple runs of the same scenario. We may want to think about how we clean these up, but we may have enough file clean up in other processes. We will just have to watch hard drive space in production use.

So, we have a strategy to automatically configure recording. Now, we have to implement it in a way that also allows manual configuration just in case we want to hard wire video recording in a test.

So, I found our seam to make the changes for video recording. In our RunnerBase class we have methods to setup and teardown a scenario. It is there that we will make the change to configure, start, stop, and delete video recordings.

Now to implement. First I download the encoder from This will have to be installed on every server that will run tests so I create a Powershell script to install it. It would be nice to also do a Chocolatey package, but that is overkill for me because I am not using Chocolatey on my servers right not. You can create your own automated installer by extracting the setup file from the download then creating a Powershell script to run

setup.exe -q

to quietly install. I believe you can use the -x parameter to uninstall, but I didn’t test this yet. (Assuming msiexe command line options are used

With the encoder installed we have access to the DLLs that we need to work with. In VisualStudio I add a reference to the extensions for Microsoft.Expression.Encoder, Microsoft.Expression.Encoder.Api2, Microsoft.Expression.Encoder.Types, and Microsoft.Expression.Encoder.Utilities. Not sure if I need all these, but they were added by the installer so I will keep them for now.

From here I can add a using

using Microsoft.Expression.Encoder.ScreenCapture;

and implement recording based on the sample code, updating to fit TestPipe standards.

One caveat is the encoder outputs some kind of Microsoft proprietary video format xesc. I thought about collecting all the videos that are kept at the end of a test run and run some kind of parallel task to convert them to a more portable format. In the end, I just left it alone. This is a new feature and only my team will be looking at the videos and everyone has Windows Media Player that can play the format.

I won’t write more on implementation details because I am boring myself, but if you want to check it out you can view it on GitHub (RunnerBase is where we use the recorder and you should be able to figure out the rest). One interesting twist is we implemented Expression Encoder behind an interface so that it isn’t requirement to use TestPipe. If we didn’t do this, you wouldn’t be able to build or use TestPipe without first installing the dependent encoder.

So, TestPipe comes out the box with a dummy implementation of the interface that won’t actually do the recordings. If you want to capture actual recording you can use the TestPipe.MSVideoRecorder plug-in or implement the IVideoRecorder interface on another screen capture program to enable video recording of tests. Right now TestPipe.MSVideoRecorder, is included in the TestPipe solution, but it is not set to build automatically. When we make changes we set it to build and manually move the binary to the folder we have configured to hold the video recorder plug-ins. Eventually, we will move it to a separate repository and create a NuGet package, but I’m tired.


Overview of the Expression Encoder SDK –

Road to screen recording in webdriver with C# –

Record video of your Selenium Tests –

Quick Testing Legacy Web Services

If you still have legacy webservices, the old asmx file variety, and you need to do a quick test from a server that doesn’t have fancy testing tools. This article provided an easy way to test the service with just a browser and an HTML file.

Test Service GET Method

To test the service’s GET methods you can use a browser and a specially formatted URL.


For example, I have

  • a domain,
  • it hosts a service, oldservice.asmx
  • that has a GET method, GetOldData
  • that accepts parameters, ID and Name

The URL to test this web service method would be Old Data

This would return an XML file containing the response from the service or an error to troubleshoot.

Test Service POST Method

To test the service’s POST methods you can use a simple HTML file containing a form. Just open the form in your browser, enter the values, and submit.

<form method="POST" action="http://domain/service.asmx/method"><div><input type="text" name="parameter" /></div><div><input type="submit" value="method" /></div></form>

For example, I have

  • a domain,
  • it hosts a service, oldservice.asmx
  • that has a Post method, SaveOldData
  • that accepts parameters, ID and Name

The HTML form to test this web service method would be

<form method=”POST” action=””><div>ID: <input type=”text” name=”ID” /></div><div>Name: <input type=”text” name=”Name” /></div><div><input type=”submit” value=”SaveOldData”></div></form>

This would return an XML file containing the response from the service or an error to troubleshoot.


If you get a System.Net.WebException error message that indicates the request format is unrecognized, you need to do some configuration to get it to work as explained in this KB. Just add this to the system.web node in the web.config of the web service and you should be good to go.

<webServices> <protocols> <add name="HttpGet"/> <add name="HttpPost"/> </protocols> </webServices>


If you are sentenced to maintaining and testing legacy web services, these simple tests can help uncover pesky connectivity, data and other issues that don’t return proper exceptions or errors because your app is old and dumb (even if you wrote it).

Everyone’s a Risk Analyst

I watched a video about software security and it had me thinking about risk. So, I thought I would write a quick blog post about some of my thoughts. This is a personal opinion post and rant about team responsibility in revealing risk.

Revealing Risks

As a member of a software delivery team one of my many responsibilities is to reveal risk in the application before its released. I wasn’t asked to specifically reveal risk. Actually, as part of my current position I was asked to write automated tests that prove the application works as specified by the business. Well, I do that, but the business really wants know the risk in shipping a release. If we ship, will it work, will there be profit sucking bugs, reputation destroying issues…? Can we trust that we can push the big red deploy button without the release blowing up and hurting instead of helping the business and our customers?

I understand that I can not reveal all risk, but I try to reveal the most damaging risks. There is no way we can uncover all risks, we can never be 100% certain that we are risk free, but there is value in every team member searching for risk, revealing risks, verifying the most damaging risks are mitigated, and providing information to the team to evaluate the risk of a release.

Even though I write automated tests, I cannot in good faith rely solely upon automated tests to reveal risks. In order to write automated test I have to know how to run scenarios manually. If I am going to run scenarios manually, I should also explore the application for risks outside of the specification checks I am automating. My true value to the business is realized when I am able to explore the application and observe its behavior with my own eyes in order to identify risks not covered by requirements and specifications. Automation is good, but it can only catch known risks. So, I manually test the application and manual testing will never go away no matter how much automated coverage I achieve.


If I am working as a feature developer, I have the most responsibility for catching risks. If my code doesn’t work, I have to fix it. So, why not invest time to make sure the code works before I send it down the pipeline. If I have to wait for feedback from someone else down the delivery pipeline, it becomes harder to switch context and remember the details of the change. Testing as I code gives me the fastest feedback and the least amount of context switching.

Also, its cheaper to fix an issue as I am developing it than later after someone else has spent time testing it. So, having testing as part of my development workflow reduces cost. I further reduce the cost of software delivery by automating the tests that I use to check my work. Then it becomes faster to rerun the tests and the tests can be leveraged in an automated build to check for regressions.

When I believe I am ready to commit, after I have manually explored the effect of my change and have identified, triaged, and mitigated high priority risks, after I have automated my specification checks, then I can commit my work. When I commit I am saying that I have exercised due diligence in revealing risks in the code I deliver.

Business Analyst

As a Business Analyst, I can prevent risks by not introducing them in specifications. I can reduce risks by involving users, developers, QA, and business stakeholders early in my analysis of changes to the application. Even though I provide awesome specifications, I am engaged during development and available for testing at all points in the SDLC. My specifications are only as good as I understand the application and understanding comes from usage and interrogation of others that understand the application and how it will be used. If I am going to explore the application to help write specifications, I am going to explore the application outside of the specifications to help my team uncover hidden risks.

Quality Analyst

As a Quality Analyst, I am the last line of defense before a release is given to our users. Even though quality is in my title, I am not solely responsible for quality since quality is a large component of the risk analysis my team does. I reduce risks with my talent for exposing quality related risks. If I didn’t have to deal with shortcomings in specifications and the application’s development, I could take more time to freely explore the system and uncover risks. Many times my testing amounts to activities that could have been automated like specification checks and regression tests. I am the professional risk analyst on the team, but because quality is in my title, on many teams I have been reduced to a human checker instead of a professional risk analyst.

Automation Engineer

I am a developer who recently switched my title to automation engineer. I am really a QA that writes a lot of code. I have always had an extremely high regard for QA. As an independent contractor, on many of my contracts I didn’t have the luxury of a QA and it hurt. When I first worked with a good QA, my development world changed. I learned how to test my work just by watching them and reading their bug reports. I’d say that my time spent with a couple of these world class QAs was worth more than anything I have learned from other developers. Now that I see first hand a little of what they do, I have even more respect.

When I identify new risks, it is up to the product team to categorize it as a true risk, bug, defect…or as something we can ignore. My value to the team is not identifying bugs or rejecting tickets, but providing information on the risks I have identified in the application. If I have done my work, I would also provide supporting evidence that helps others to observe the risks I identified.

If you hold QA accountable for bugs they did not add to the system, you don’t understand the role of QA. If bugs escape to production, it is not QAs fault, it is the teams fault. You can’t place blame on one person or role. Everyone, Dev, BA, QA, Product Manager…etc should be included in the pre-release hunt for bugs. If a bug gets by the team it is the team’s fault.


I have come to the conclusion that I need to reveal risk from my experience as an entrepreneur and developer. I strengthened my belief in this idea from studying “Titans of Test” like James Whittaker, Michael Bolton, Cem Kaner, and James Bach. If you are on a software delivery team, you are a risk analyst. In agile teams everyone is considered a developer. The titles are gone and the team shares in all responsibilities. There may be people that specialize in certain activities like writing code, specs or tests, but everyone should be involved in all aspects of delivering the product. There may be people on your team with test or quality in their title or job description, but everyone on the team is responsible for the risk and therefore quality of the applications. So, if you are involved in software delivery, get in touch with your inner tester and explore your application because quality is a team sport and you are a risk analyst.

GoCD: Agent Running Web Driver Test Hangs After Test Failure [SOLVED]


I had a nagging issue were some tests were failing the build on our GoCD server, but the agent was not reporting the failure to the server. We are using NAnt to run NUnit tests that in turn call Web Driver to exercise web pages. There were some test failures that correctly returned a non-zero value that failed the build in NAnt. Also, the failure is captured in the log and saved in a text file. Yet, the agent didn’t report the build failure or send the artifacts to the server.


After a 2 day search for answers and a deep dive into the bowels of GoCD I discovered that a Web Driver process was kept open after the test fails the build. Specifically, the process is IEDriverServer.exe. This process was being orphaned by improper cleanup in the tests that resulted in the Web Driver and browsers staying open after the test failure.

When I ran the tests again, I watched for the failure then manually killed Web Driver and the agent magically reported to the server. I am still unsure why Web Driver would prevent the GoCD agent from reporting to the server. They are both Java processes, maybe there is something going on in the JVM or something… not sure.


My work around at the moment is to run a task killer on failure in the test script. Here is the relevant portion of the nant script that drives the tests:

<property name="nant.onfailure" value="test.taskkiller" />
<target name="test.taskkiller">
 <exec program="taskkiller.bat" failonerror="false">

The taskkiller.bat is just a simple bat file that will kill Web Driver and open browsers.

taskkill /IM IEDriverServer.exe /F
taskkill /IM iexplore.exe /F

Now this is just a band-aid. We will be updating our test framework to handle this. Additionally, killing all the processes like this isn’t good if we happen to be running tests in parallel on the agent, which may be a possibility in the future.