How Much Does Automated Test Maintenance Cost?

I saw this question on a forum and it made me pause for a second to think about it. The quick answer is that it varies. The sarcastic answer is that it costs as much as you spend on it, or, how about, it costs as much as you didn’t spend on creating a maintainable automation project.

I have only been involved in 2 other test automation projects prior to my current position. In both I also had feature development responsibility. On one of the projects, measured against the time I spent developing features, I spent about 10-15% of my time maintaining tests and about 25% writing them. So, that is about 30-40% of my total test time on maintenance. Based on what I know today, some of my past tests weren’t that good, so maybe the numbers should have been higher or lower. On the other project, test maintenance was closer to 50%, and that was because of poor tool choice. I can state the numbers because I tracked my time spent. I could not use these as benchmarks to estimate maintenance cost on my current project or any other unless the context was very similar and I could easily draw the comparison.
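For anyone who wants to check the arithmetic behind that 30-40% figure, here is a tiny sketch. The MaintenanceShare class and its Of method are names made up for this illustration; the only inputs are the two percentages quoted above, both measured against feature development time.

```csharp
public static class MaintenanceShare
{
    // Share of total test time (maintaining + writing) that went to maintenance.
    // Both arguments are percentages of feature development time.
    public static double Of(double maintainPercent, double writePercent)
    {
        return maintainPercent / (maintainPercent + writePercent);
    }
}
```

With 10% maintaining and 25% writing the share is 10 / 35, a little under 29%. With 15% maintaining it is 15 / 40, which is 37.5%. Rounded, that is the "about 30-40%" range.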

I have seen where someone might say “it’s typically between this and that percentage of development cost,” or something similar. Trying to quantify maintenance costs is hard, very hard, and it depends on the context. You can try to estimate based on someone else’s guess of a rough percentage and hope it pans out, but in the end it depends on execution and environment. An application that changes often vs. one that rarely changes, poorly written automated tests, a bad choice of automation framework, the skill of the automated tester…there is a lot that can change cost from project to project. I am curious whether someone has a formula to calculate an estimate across all projects, but having an insane focus on the maintainability of your automated test suites can significantly reduce costs in the long run. So a better focus, IMHO, is on getting the best test architecture, tools, framework, and people, and on making maintainability a high-priority goal. Also, properly tracking maintenance in the project management or bug tracking system can provide a more valuable measure of cost across the life of a project. If you properly track maintenance cost (time), you get a benchmark that is customized for your context. Trying to calculate cost up front with nothing to base the calculations on but a wild uneducated guess can lead to a false sense of security.

So, if you are trying to plan a new automation project and you ask me about cost the answer is, “The cost of having automated tests…priceless. The cost of maintaining automated tests…I have no idea.”

Bisecting Our Code Quality Pipeline

I want to implement gated check-ins, but it will be some time before I can restructure our process and tooling to accomplish it. What I really want is to be able to keep the source tree green and, when it is red, provide feedback to quickly get it green again. I want to run tests on every commit and give developers feedback on their failing commits before they pollute the source tree. Unfortunately, running the tests as we have them today would take too long to do on every commit. I came across a quick blog post by Ayende Rahien on Bisecting RavenDB, and they had a solution where they used git bisect to find the culprit that failed a test. They gave no information on how it actually worked, just a tease that they are doing it. I left a comment to see if they would share some of the secret sauce behind their solution, but until I get that response I wanted to ponder it for a moment.

Git Bisect

To speed up testing and also allow test failure culprit identification with git bisect, we would need a custom test runner that can identify which tests to run and run them. We don’t run tests on every commit; we run tests nightly against all the commits that occurred for the day. When a test fails it can be difficult to identify the culprit(s) that failed it. This is where Ayende steps in with his team’s idea to use bisect to help identify the culprit. Bisect works by binary search over commits. It takes the range from the commit we mark as the last known good commit to the last commit that was included in the failing nightly test. At each step it checks out a commit near the middle of the remaining range, then pauses and allows you to test it and mark whether it is good or bad. In our case we could run the tests against that single commit. If they pass, tell bisect it’s good so it can move on to the next candidate. If they fail, save the commit and failing test(s), tell bisect it’s bad, and let it move on. Each marking halves the range, and bisect stops at the first bad commit. Run per failing test, this will result in a list of culprit commits and their failing tests that we can use for reporting and bashing over the head of the culprit owners (just kidding…not).
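To make the search concrete, here is a minimal sketch of the kind of binary search git bisect performs, written as plain C# over an in-memory list of commit ids rather than against git itself. The CommitBisect class, the FirstBad method, and the isBad callback are names made up for illustration; in a real pipeline isBad would check out the commit and run the selected tests. The sketch assumes the list starts right after the last known good commit, that the last commit in the list is bad, and that once a commit is bad every later one is bad too.

```csharp
using System;
using System.Collections.Generic;

public static class CommitBisect
{
    // Returns the index of the first bad commit in an ordered list (oldest first).
    public static int FirstBad(IReadOnlyList<string> commits, Func<string, bool> isBad)
    {
        if (commits == null || commits.Count == 0)
        {
            throw new ArgumentException("At least one commit is required.", "commits");
        }

        int low = 0;                   // earliest commit that could still be the culprit
        int high = commits.Count - 1;  // assumed bad: it was in the failing nightly run

        while (low < high)
        {
            int mid = low + ((high - low) / 2);

            if (isBad(commits[mid]))
            {
                high = mid;            // the culprit is mid or something before it
            }
            else
            {
                low = mid + 1;         // mid is good, so the culprit comes after it
            }
        }

        return low;
    }
}
```

For a day with eight commits this needs three test runs instead of eight, which is the whole appeal. The catch is the last assumption: if a later commit fixes the failure and another one breaks it again, a search like this can land on the wrong commit.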

Custom Test Runner

The test runner has to be intelligent enough to run all of the tests that exercise the code included in a commit. The custom test runner has to look for testable code files in the commit change log, in our case .cs files. When it finds a code file it will identify the class in the code file and find the test that targets the class. We are assuming one class per code file and one unit test class per code file class. If this convention isn’t enforced, then some tests may be missed or we have to do a more complex search. Once all of the test classes are found for the commit’s code files, we run the tests. If a test fails, we save the test name and maybe the failure results, exception, stack trace… so it can be associated with the culprit commit. Once all of the tests have run, if any of them failed, we mark the commit as a culprit. After the test and culprit identification is complete, we tell bisect the result so it can move to the next commit. As I said before, this will result in a list of culprits and failing test info that we can use in our feedback to the developers.
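Here is a rough sketch of the file-to-test lookup under that convention. Everything in it is an assumption made for illustration: the TestSelector class, the idea that the test class for Foo.cs is named FooTests, and the set of known test class names (which a real runner would probably build by reflecting over the test assemblies).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class TestSelector
{
    // Maps the files changed in a commit to the test classes that target them,
    // assuming one class per .cs file and a test class named <ClassName>Tests.
    public static IReadOnlyList<string> TestClassesFor(
        IEnumerable<string> changedFiles,
        ISet<string> knownTestClasses)
    {
        return changedFiles
            .Where(file => file.EndsWith(".cs", StringComparison.OrdinalIgnoreCase))
            .Select(file => Path.GetFileNameWithoutExtension(file) + "Tests")
            .Where(name => knownTestClasses.Contains(name))
            .Distinct()
            .ToList();
    }
}
```

Files that are not code, and code files with no matching test class, simply drop out of the result. A changed test file is not picked up by this sketch either, so a real runner would want an extra rule for that case.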

Make It Faster

We could make this fancy and look for the specific methods that were changed in the commit’s code file classes. We would then only find tests that test the methods that were changed. This would make testing focused like a laser and even faster, and we could probably employ Roslyn to handle the code analysis to make finding tests easier. I suspect tools like ContinuousTests – MightyMoose do something like this, so it’s not that far-fetched an idea, but definitely a mountain of things to think about.


Well, this is just a thought, a thesis if you will, and if it works, it will open up all kinds of possibilities to improve our Code Quality Pipeline. Thanks Ayende, and please think about open sourcing that bisect.ps1 PowerShell script 🙂

TestPipe Test Automation Framework Release Party

Actually, you missed the party I had with myself when I unchecked private, clicked save on GitHub, and officially released TestPipe. You didn’t miss your chance to check out TestPipe, a little Open Source project that has the goal of making automated browser based testing more maintainable for .NET’ters. The project source code is hosted on GitHub and the binaries are hosted on NuGet.



If you would like to become a TestPipe Plumber and contribute, I’ll invite you to the next party :).


Results of my personal make-a-logo-in-10-minutes challenge.

Trust No One or a Strange Automated Test

Nullius in verba (Latin for “on the word of no one” or “Take nobody’s word for it”)

This is the motto of the Royal Society, the UK’s academy of sciences. I bring this up because I inherited an automated test suite and I am in the process of clearing current errors and developing a maintenance plan for it. As I went through the tests I questioned whether I could trust them. In general it’s difficult to trust automated tests, and it’s worse when I didn’t write them. Then I remembered “nullius in verba” and decided that although I will run these tests, fix them and maintain them, I cannot trust them. In fact, since I am now responsible for all automated tests, I can’t put any value in any test unless I watch it run, understand its purpose, and ascertain the validity of its assumptions. This is not to say that the people that write tests that I maintain cannot be trusted because of incompetence. In fact, many of the tests that I maintain have been crafted by highly skilled professionals. I just trust no one and want to see for myself.

Even after evaluating automated tests, I can’t really trust them because I don’t watch every automated test run. I can’t say for certain that they passed or failed or whether a result is a false positive. Since I don’t watch every test run I can only hope they are OK. Given the fallibility of man, I can’t even trust someone else’s manual testing, so I can’t trust an automated check written by an imperfect human. So, I view automated tests like manual tests: they are tools in the evaluation of the software under test.

It would be impractical to manually run every test covered by the automated suite, so a good set of tests provides more coverage than manual execution alone. One way automated tests provide value is when they uncover issues that point to interesting aspects of the system that warrant further investigation. Failing tests or unusually slow tests can give a marker to focus on in manual exploration of the software. This is only true if the tests are good: focused on one concept, not flaky (sometimes passing and sometimes failing), and having the other attributes of a good automated test. If the tests are bad, their failures may not be real failures, and that takes away all value from the automated test because I have to waste time investigating them. In fact, having an automated test suite plagued with bad tests can increase the effort required to maintain the tests so much that it negates any value they provide. The maintainability of a test is a primary criterion that I evaluate when I inherit tests from someone else, and I have to see for myself if each test is good and maintainable before I can place any value in it.

So, my current stance is to not trust anyone else’s tests. Also, I do not elevate automated tests to being the de facto proof that the software works. Yet, I find value in the automated tests as another tool in my investigation of the quality of the software. If they don’t cost much in terms of maintenance or running them, they provide value in my evaluation of software quality.

Nullius in verba

Scientific Exploration and Software Testing

Test ideas by experiment and observation,

build on those ideas that pass the test,

reject the ones that fail.

Follow the evidence wherever it leads

and question everything.

Astronomer Neil deGrasse Tyson, Cosmos, 2014

This was part of the opening monologue to the relaunch of the Cosmos television series. It provides a nice interpretation of the scientific method, but also fits perfectly with one of my new roles as software tester. Neil finishes this statement with

Accept these terms and the cosmos is yours. Now come with me.

It could be said, “Accept these terms and success in software testing is yours.” What I have learned so far about software testing falls firmly in line with the scientific method. I know software testing isn’t as vast as exploring billions of galaxies, but with millions of different pathways through a computer program, software testing still requires rigor similar to that of any scientific exploration.

IE WebDriver Proxy Settings

I recently upgraded to the NuGet version of IE WebDriver (IEDriverServer.exe). I started noticing that when I ran my tests locally I could no longer browse the internet. I found myself having to go into internet settings to reset my proxy. My first thought was that the new patch I just received from corporate IT may have botched a rule for setting the browser proxy. After going through the dance of running tests, resetting proxy, I got pretty tired and finally came to the realization that it must be the driver and not IT.

First stop was to check Bing for tips on setting the proxy for WebDriver. Found lots of great stuff for Java, but no help on .NET. Next, I stumbled upon a log message in the Selenium source change log that said, “Adding type-safe Proxy property to .NET InternetExplorerOptions class.” A quick browse of the source code and I had my solution.

In the code that creates the web driver I added a proxy class set to auto detect.

Proxy proxy = new Proxy();
proxy.IsAutoDetect = true;
proxy.Kind = ProxyKind.AutoDetect;

This sets up a new Proxy that is configured for auto detect. Next, I added 2 properties, Proxy and UsePerProcessProxy, to the InternetExplorerOptions.

var options = new OpenQA.Selenium.IE.InternetExplorerOptions
{
     EnsureCleanSession = true,
     Proxy = proxy,
     UsePerProcessProxy = true
};

Proxy is set to the proxy we previously set up. UsePerProcessProxy tells the driver that we want this configuration to be set per process, NOT GLOBALLY, thank you. Shouldn’t this be the default? I’m just saying. EnsureCleanSession clears the cache when the driver starts; it is not necessary for the Proxy config and is something I already had set.

Anyway, with this set up all we have to do is feed it to the driver.

var webDriver = new OpenQA.Selenium.IE.InternetExplorerDriver(options);

My test coding life is back to normal, for now.

Agile Browser Based Testing with SpecFlow

This may be a little misleading as you may think that I am going to give some sort of Scrumish methodology for browser based testing with SpecFlow. Actually, this is more about how I implemented a feature to make browser based testing more realistic in a CI build.

Browser based testing is slow, real slow. So, if you want to integrate this type of testing into your CI build process you need a way to make the tests run faster or it may add considerable time to the feedback loop given to developers. My current solution is to only run tests for the current sprint. To do this I use a mixture of SpecFlow and my own home grown test framework, TestPipe, to identify the tests I should run and ignore.

The solution I’m using at the moment centers on SpecFlow Tags. Actually, I have blogged about this before in my SpecFlow Tagging post. In this post I want to show a little more code that demonstrates how I accomplish it.

Common Scenario Setup

The first step is to use a common Scenario setup method. I add the setup as a static method to a class accessible by all Step classes.

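public class CommonStep
{
    public static void SetupScenario()
    {
        try
        {
            // TestPipe Runner handles the tag processing (exact call shape assumed here)
            Runner.SetupScenario();
        }
        catch (CharlesBryant.TestPipe.Exceptions.IgnoreException ex)
        {
            // NUnit's ignore method, passed the reason the scenario was ignored
            Assert.Ignore(ex.Message);
        }
    }
}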

TestPipe Runner Scenario Setup

The method calls the TestPipe Runner method SetupScenario, which handles the Tag processing. If SetupScenario determines that the scenario should be ignored, it will throw the exception that is caught. We handle the exception by asserting that the test is ignored with the test framework’s ignore method (in this case NUnit). We also pass the ignore method the exception message, as there are a few reasons why a test may be ignored and we want the reason included in our reporting.

SetupScenario includes this bit of code

if (IgnoreScenario(tags))
 throw new IgnoreException();

Configuring Tests to Run

This is similar to what I blogged about in the SpecFlow Tagging post, but I added a custom exception. Below are the interesting methods for the call stack walked by the SetupScenario method.

public static bool IgnoreScenario(string[] tags)
{
    if (tags == null)
        return false;

    // A missing config value is treated as empty so Trim() cannot throw
    string runTags = GetAppConfigValue("test.scenarios") ?? string.Empty;
    runTags = runTags.Trim().ToLower();
    return Ignore(tags, runTags);
}

This method gets the tags that we want to run from configuration. For each sprint there is a code branch, and the app.config for tests in the branch will contain the tags for the tests we want to run for the sprint on the CI build. There is also a regression branch that runs all the tests weekly. All of the feature files are kept together, so being able to tag specific scenarios in a feature file gives us the ability to run specific tests for a sprint while keeping all of the features together.

Test Selection

Here is the selection logic.

public static bool Ignore(string[] tags, string runTags)
{
    if (string.IsNullOrWhiteSpace(runTags))
        return false;

    //If runTags has a value the tag must match or is ignored
    if (tags == null)
        throw new IgnoreException("Ignored tags is null.");

    if (tags.Contains("ignore", StringComparer.InvariantCultureIgnoreCase))
        throw new IgnoreException("Ignored");

    if (tags.Contains("manual", StringComparer.InvariantCultureIgnoreCase))
        throw new IgnoreException("Manual");

    if (runTags == "all" || runTags == "all,all")
        return false;

    if (tags.Contains(runTags, StringComparer.InvariantCultureIgnoreCase))
        return false;

    return true;
}

This is the meat of the solution, where most of the logic lives. As you can see, the exceptions contain messages for when a test is explicitly ignored with the Ignore or Manual tag. The manual tag identifies features that are defined, but can’t be automated. This way we still have a formal definition that can guide our manual testing.

The variable runTags holds the value retrieved from configuration. If the config defines “all” or “all,all”, we run all the tests that aren’t explicitly ignored. The “all,all” is a special case when ignoring tests at the Feature level, but this post is about Scenario level ignoring.

The final test is to compare the tags to the runTags config. If the tags include the runTags value, we run the test. Any tests that don’t match are ignored. For scenarios this only works for one runTag. Maybe we name the tag after the sprint, a sprint ticket, or something else; whatever it is, it has to be unique for the sprint. I like the idea of tagging with a ticket number as it gives traceability to tickets in the project management system.

Improvements and Changes

I have contemplated using a feature file organization similar to SpecLog. They advocate a separate folder to hold feature files for the current sprint. I believe, but am not sure, that they then tag the current sprint feature files so they can be identified and run in isolation. The problem with this is that the current sprint features have to be merged with the existing features after the sprint is complete.

Another question I have asked myself is do I want to allow some kind of test selection through command line parameters. I am not really sure yet. I will put that thought on hold for now or until a need for command line configuration makes itself evident.

Lastly, another improvement would be to allow specifying multiple runTags. We would then have to iterate the run tags and compare, or come up with a performant way of doing it. Performance would be an issue as this would have to run on every test, and for a large project there could be thousands of tests, with each test already having an inherent performance issue in having to run in a browser.
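One way the multiple runTags idea could go, as a sketch only and not something TestPipe does today: parse the config value once into a case-insensitive set, then each scenario check is a handful of set lookups instead of a loop over the run tags. The RunTagFilter class and the comma-separated config format are assumptions made for this illustration, and the explicit "ignore" and "manual" handling from the method above is left out to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RunTagFilter
{
    // Parses a comma-separated config value such as "sprint42, TP-101" into a set.
    // Intended to be called once per test run, not once per scenario.
    public static HashSet<string> Parse(string runTags)
    {
        IEnumerable<string> parts = (runTags ?? string.Empty)
            .Split(',')
            .Select(tag => tag.Trim())
            .Where(tag => tag.Length > 0);

        return new HashSet<string>(parts, StringComparer.OrdinalIgnoreCase);
    }

    // Returns true when the scenario should be skipped:
    // run tags are configured, "all" is not among them, and no scenario tag matches.
    public static bool Ignore(string[] tags, HashSet<string> runTags)
    {
        if (runTags.Count == 0 || runTags.Contains("all"))
        {
            return false;
        }

        if (tags == null)
        {
            return true;
        }

        return !tags.Any(tag => runTags.Contains(tag));
    }
}
```

Since the set is built once and each lookup is constant time on average, the per-scenario cost should be tiny next to the time a browser test takes to run.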


Well that’s it. Sprint testing can include browser based testing and still run significantly faster than running every test in a test suite.

C# MEF BrowserFactory for Browser Based Testing

In my browser based test framework I use MEF to help abstract the concept of a browser. The reason I do this is so I am not tied to a specific browser driver framework (e.g. WebDriver, WatiN, System.Net.WebClient). This allows me to change drivers without having to touch my test code. Here’s how I do it.

Browser Interface

First I created an interface that represents a browser. I used a mixture of interfaces from WebDriver and WatiN.

namespace CharlesBryant.TestPipe.Interfaces
{
    using System;
    using System.Collections.Generic;
    using System.Collections.ObjectModel;
    using CharlesBryant.TestPipe.Browser;
    using CharlesBryant.TestPipe.Enums;

    public interface IBrowser
    {
        IBrowserSearchContext BrowserSearchContext { get; }
        BrowserTypeEnum BrowserType { get; }
        string CurrentWindowHandle { get; }
        string PageSource { get; }
        string Title { get; }
        string Url { get; }
        ReadOnlyCollection<string> WindowHandles { get; }
        IElement ActiveElement();
        void Close();
        void DeleteAllCookies();
        void DeleteCookieNamed(string name);
        Dictionary<string, string> GetAllCookies();
        bool HasUrl(string pageUrl);
        void LoadBrowser(BrowserTypeEnum browser, BrowserConfiguration configuration = null);
        void Open(string url, uint timeoutInSeconds = 0);
        void Quit();
        void Refresh();
        void SendBrowserKeys(string keys);
        void TakeScreenshot(string screenshotPath);
        void AddCookie(string key, string value, string path = "/", string domain = null, DateTime? expiry = null);
    }
}

Pretty basic stuff, although BrowserSearchContext took some thought to get working. Basically, this abstraction provides the facility to search for elements. A lot of the concepts here are borrowed from WebDriver and WatiN and are just a way to wrap their functionality and use it without being directly dependent on them. To use this you have to change your tests from directly using a browser driver to using this abstraction. At the start of your tests you use the BrowserFactory to get the specific implementation of this interface that you want to test with.

Browser Factory

Then I created a BrowserFactory that uses MEF to load browsers that implement the browser interface. When I need to use a browser I call Create in the BrowserFactory to get the browser driver I want to test with. To make this happen I have to actually create wrappers around the browser drivers I want available. One caveat about MEF is that it needs to be able to find your extensions so you have to tell it where to find them. To make the browsers available to the factory I added an MEF class attribute, [Export(typeof(IBrowser))] to my browser implementations. Then I add a post build event to the browser implementation projects to copy their DLL to a central folder:

copy "$(TargetPath)" "$(SolutionDir)Plugins\Browsers\$(TargetFileName)"

Then I added an appSettings key with a value that points to this directory to the config of my clients that use the BrowserFactory. Now I can reference this config value to tell MEF where to load browsers from. Below is sort of how I use the factory with MEF.

namespace CharlesBryant.TestPipe.Browser
{
    using System;
    using System.ComponentModel.Composition;
    using System.ComponentModel.Composition.Hosting;
    using System.Configuration;
    using System.IO;
    using System.Reflection;
    using CharlesBryant.TestPipe.Enums;
    using CharlesBryant.TestPipe.Interfaces;

    public class BrowserFactory
    {
        private IBrowser browser;

        public static IBrowser Create(BrowserTypeEnum browserType)
        {
            BrowserFactory factory = new BrowserFactory();
            return factory.Compose(browserType);
        }

        private IBrowser Compose(BrowserTypeEnum browserType)
        {
            this.browser = null;

            try
            {
                // Tell MEF to look for IBrowser exports in the configured plugin folder
                AggregateCatalog aggregateCatalogue = new AggregateCatalog();
                aggregateCatalogue.Catalogs.Add(new DirectoryCatalog(ConfigurationManager.AppSettings["browser.plugins"]));
                CompositionContainer container = new CompositionContainer(aggregateCatalogue);
                // ... (the lines that pick the IBrowser export for browserType are missing from this listing)
            }
            catch (FileNotFoundException)
            {
                // ...
            }
            catch (CompositionException)
            {
                // ...
            }

            return this.browser;
        }
    }
}

namespace CharlesBryant.TestPipe.Enums
{
    public enum BrowserTypeEnum
    {
        // ... (members are missing from this listing)
    }
}


Well, that’s the gist of it. I have untethered my tests from browser driver frameworks. This is not fully tested across a broad range of scenarios so there may be issues, but so far it’s doing OK for me.
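For anyone who wants to see the MEF selection step end to end, here is a small self-contained sketch. It is not the TestPipe code. The SketchBrowserType enum, the ISketchBrowser interface, FakeIeBrowser, and SketchBrowserFactory are stand-ins made up for illustration, and the catalog is passed in so the sketch can run against an in-memory TypeCatalog. The MEF pieces it leans on (Export, ImportMany, CompositionContainer, ComposeParts) come from System.ComponentModel.Composition, which the project has to reference.

```csharp
using System.Collections.Generic;
using System.ComponentModel.Composition;
using System.ComponentModel.Composition.Hosting;
using System.ComponentModel.Composition.Primitives;
using System.Linq;

// Hypothetical stand-ins for the real TestPipe enum and interface.
public enum SketchBrowserType { IE, Firefox }

public interface ISketchBrowser
{
    SketchBrowserType BrowserType { get; }
}

// A browser wrapper advertises itself to MEF with the Export attribute.
[Export(typeof(ISketchBrowser))]
public class FakeIeBrowser : ISketchBrowser
{
    public SketchBrowserType BrowserType
    {
        get { return SketchBrowserType.IE; }
    }
}

public class SketchBrowserFactory
{
    // MEF fills this with every exported ISketchBrowser found in the catalog.
    [ImportMany]
    public IEnumerable<ISketchBrowser> Browsers { get; set; }

    public static ISketchBrowser Create(SketchBrowserType browserType, ComposablePartCatalog catalog)
    {
        SketchBrowserFactory factory = new SketchBrowserFactory();

        // The container is left undisposed on purpose: disposing it would also
        // dispose any IDisposable parts it created, such as a real browser wrapper.
        CompositionContainer container = new CompositionContainer(catalog);
        container.ComposeParts(factory);

        // Returns null when no plugin matches the requested browser type.
        return factory.Browsers.FirstOrDefault(b => b.BrowserType == browserType);
    }
}
```

In a test or demo the call would look like SketchBrowserFactory.Create(SketchBrowserType.IE, new TypeCatalog(typeof(FakeIeBrowser))). With the plugin folder approach described above, the catalog would instead be a DirectoryCatalog pointed at the folder named in config.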

The examples above are not production code, use at your own risk.

My Best Practices for Functional Testing

I am not a big fan of best practices because they have proliferated to the point that it’s hard to trust that some arbitrary blog espousing best practices has really put in the time and has the experience behind the practices to qualify them as best. So, I qualify this post with “MY”. These are practices that I am using right now that have proven to work for me across multiple projects. I am by no means a Functional Testing guru. I have been a developer for many years, but just started functional testing full time last month. Much of this has roots in other “best practices” so there is really nothing new, just developer common sense. This is just a way for me to start to catalog my practices for future reference and to share with the community.

  1. Coherent – scenarios should assert a single concept. Unlike unit testing, I believe it is OK to make multiple assertions in a functional test because rerunning the same 10 second process to make discrete assertions across the state of a page is a waste of time. The multiple asserts should each include some type of message so you know which one failed. Even though I advocate making multiple assertions, the assertions should be related. You should not assert that your button click worked because you landed on the correct page, then assert that the landing page has the correct content and that clicking a link on the landing page sent you to the home page. This is an example of asserting multiple concepts, and this type of multiple assertion is a no-no. In the example, asserting the button click worked, asserting the page has the correct content, and asserting that the link worked are all different concepts that express distinct concerns that should be asserted in isolation. This test should have only been an assertion that the button click worked and sent you to the correct page. The other asserts should have been in other tests.
  2. Light – keep your scenario definitions light on details and heavy on business value. Don’t try to define a script that a QA tester can follow in their testing; express only the details necessary to convey the concerns that address the business value of the feature. If you are defining a business process for making a payment on a website, you don’t have to state every step taken to get to the payment page or every mouse click and keystroke taken to enter, submit and verify the payment. Pull out the steps that can be implied. Have your scenarios read more like a story for business people and not a script for QA and developers. Even if you don’t have business people reading the features and scenarios, you will find that they become a lot easier to maintain because they aren’t tied to details that can change wildly in new feature development.
  3. Independent – scenarios should not rely on the results of any other scenario. Likewise, you should ensure your scenarios are not influenced by the results of other scenarios. I learned the term “Flaky Test” by watching a couple of videos by the Google test team. Flaky tests are tests that sometimes pass and sometimes fail even though the test input and steps don’t change. Many times this is because of side effects produced by previously run scenarios. A developer way of expressing this would be that a test that is run with the same input should produce the same result on each test run. The test should be idempotent.
  4. Focused – in your test action or “When” step, in Gherkin, you should only trigger one event in your domain. If you are triggering multiple actions from across multiple contexts, it becomes difficult to know what is being tested. Many times when I see tests with multiple “when” steps they are actually mixing additional “given” or setup steps with the actual action of the test. You could say that this is just semantics and I’m being a Gherkin snob protecting the sanctity of the Gherkin step types, but to me it just keeps scenarios simple when you know exactly what is being tested. If you feel like the multiple “when” steps are valid, combine them into one step and express the multiple actions in the code behind the scenario. Keep your scenario definition focused on testing one thing.
  5. Fast – be mindful of the performance of your scenarios. Even though you may be writing slow functional tests you should not add to the slowness by writing slow code to implement your scenarios. I would take this further and say that you should write test code with the same care and engineering discipline that is used to write production code.
  6. Simple – try not to expose complexities in your test steps. Wrap your complexities. This is similar to keeping the test focused, but extends to the entire scenario and test code not just the action step in your scenario definition. This is both a feature analysis and test development principle. Think about the Page Object Model. It hides complexities of page interactions and improves maintainability of tests while making your test code and scenario definitions simple. Don’t include a lot of complex details in your scenarios or they will be bound to the details and when the details change you will have to change the scenario, the step code and probably more to make the change.

Yes, that spells CLIFFS. I wanted to join the acronym bandwagon. This would be a better post if it had examples or explanations, but this is a lazy post to keep my blogging going. If you disagree or want clarification, I would be glad to do a follow up on my thoughts on these practices. I am anxious to see how these stand up to a review at the end of this year.

A Twist on Test Structure

As you may not know, I love testing. Unit tests, integration tests, performance tests, and acceptance tests all have a prominent place in my development methodology. So, I love when I learn new tips and tricks that help simplify testing. Well Phil Haack did a post on “Structuring Unit Tests” that was quite ingenious even though he got it from a guy (Drew Miller), who got it from two other guys (Brad Wilson and James Newkirk).

The gist is to write a test class to contain tests for a specific class under test. Then have sub classes within the test class for each method of the class under test. Then Brian Rigsby took it further and showed how to reuse initialization code written in the parent class for all of the sub classes. Below is the result of structuring from Brian’s blog.

 public class TitleizerTests
 {
  protected Titleizer target;

  // Shared setup, reused by every nested class (test-framework attributes are not shown in this listing)
  public void Init()
  {
   target = new Titleizer();
  }

  public class TheTitleizerMethod : TitleizerTests
  {
   public void ReturnsDefaultTitleForNullName()
   {
       string result = target.Titleize(null);

       // Expected value goes first, actual second
       Assert.AreEqual("Your name is now Phil the Foolish", result);
   }

   public void AppendsTitleToName()
   {
       string result = target.Titleize("Brian");

       Assert.AreEqual("Brian the awesome hearted", result);
   }
  }
 }


I like how the results are better structured as a result of this test code structure without having to repeat initialization code. The problem with this approach is that it violates Code Analysis rule CA1034: Nested types should not be visible. I know this is a test class and not production code, so maybe I am pointing out something that is not worth pointing out. The thing is, I have been bitten a few times by thinking it is OK to ignore Code Analysis rules, so I have to do my due diligence to ensure this won’t cause issues down the road. Also, IMHO, test code should be as good as production code.

So far it seems as if the main reason for the rule is to protect external callers of the publicly exposed nested types. Maintainability is the common theme I can find in explanations. If you move the nested type outside of the containing type, it will be a breaking change for external callers. For now, I will ignore the rule as I try this test structure out, but I am afraid…very afraid.
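If ignoring the rule turns out to be the right call, one option is to make the decision explicit in the code rather than just living with the warning. The sketch below uses the standard SuppressMessage attribute on the nested class. The class names mirror the example above, the bodies are left empty, and the "Microsoft.Design" category string assumes the classic Code Analysis (FxCop) rule set.

```csharp
using System.Diagnostics.CodeAnalysis;

public class TitleizerTests
{
    // Records why CA1034 is being ignored for this nested test class.
    [SuppressMessage(
        "Microsoft.Design",
        "CA1034:NestedTypesShouldNotBeVisible",
        Justification = "Nested classes group tests by the method under test.")]
    public class TheTitleizerMethod : TitleizerTests
    {
    }
}
```

The attribute has to be repeated on each nested class, which is a bit noisy, but it leaves a justification behind for whoever inherits the tests.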