An Easy Win for Testable Methods

One of our developers was having an issue testing a service. He was having a hard time hitting the service because it is controlled by an external company and we don’t have firewall rules that let us reach it easily from our local environment. I suggested mocking the service, since we really weren’t testing getting a response from the service, but what we do with the response. I was told that the code was not conducive to mocking. So, I took a look and he was right, but the fix to make it testable was a very simple refactor. Here is the gist of the code:

public SomeResponseObject GetResponse(SomeRequestObject request)
{
    // Set some additional properties on the request
    request.Id = "12345";

    // Get the response from the service
    SomeResponseObject response = Client.SendRequest(request);

    // Do something with the response
    if (response != null)
    {
        // Do some awesome stuff to the response
        response.LogId = "98765";
        if (string.CompareOrdinal(response.Id, "999") > 0)
        {
            response.Special = true;
        }
        LogResponse(response);
    }

    return response;
}

What we want to test is the “Do something with the response” section of the code, but this method is doing so many things that we can’t isolate that section and test it…or can we? To make this testable, we simply move everything that concerns “Do something with the response” to a separate method.

public SomeResponseObject GetResponse(SomeRequestObject request)
{
    // Set some additional properties on the request
    request.Id = "12345";

    // Get the response from the service
    SomeResponseObject response = Client.SendRequest(request);

    return ProcessResponse(response);
}

public SomeResponseObject ProcessResponse(SomeResponseObject response)
{
    // Do something with the response
    if (response != null)
    {
        // Do some awesome stuff to the response
        response.LogId = "98765";
        if (string.CompareOrdinal(response.Id, "999") > 0)
        {
            response.Special = true;
        }
        LogResponse(response);
    }

    return response;
}

Now we can test changes to the ProcessResponse method in isolation, away from the service calls. Since there were no changes to the service or the service client, we didn’t have to worry about testing them for this specific change. We don’t care what gets returned; we just want to know that the response was properly processed and logged. We still have a hard dependency on LogResponse’s connection to the database, but I will live with that as an integration test and fight for unit tests another day. This is a quick win for testability and a step closer to making this class SOLID.
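To make that concrete, here is a minimal sketch of what a test against ProcessResponse might look like, using NUnit. SomeService is my stand-in name for whatever class actually hosts these methods:

using NUnit.Framework;

[TestFixture]
public class ProcessResponseTests
{
    [Test]
    public void ProcessResponse_IdGreaterThan999_MarksResponseSpecial()
    {
        // Arrange: no service or client involved; we build the response ourselves.
        var service = new SomeService(); // hypothetical class hosting GetResponse/ProcessResponse
        var response = new SomeResponseObject { Id = "9999" };

        // Act
        var result = service.ProcessResponse(response);

        // Assert: the "do something with the response" logic ran.
        Assert.That(result.Special, Is.True);
        Assert.That(result.LogId, Is.EqualTo("98765"));
    }

    [Test]
    public void ProcessResponse_NullResponse_ReturnsNull()
    {
        var service = new SomeService();
        Assert.That(service.ProcessResponse(null), Is.Null);
    }
}

Note that LogResponse still runs inside ProcessResponse, so until that dependency is extracted this is really the integration test mentioned above.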

Optimizing the Software Delivery Pipeline: Deployment Metrics

Currently, I have no easy way of determining which build version is deployed to an environment. This made me take more interest in metrics about deployments, of which we basically have none. I can look at the CD (continuous deployment) server and see what time a deployment was done, and I can look at the builds on the server and sort of deduce which build was deployed, but I have to manually check the server to verify my assumptions. It made me wonder what else I am missing. Am I flying blind? Should I know more?

Metrics in the Software Delivery Pipeline

I am part of a work group that is exploring software quality metrics, so my first instinct was to think about deployment quality metrics. After some soul searching, I decided that what would be most helpful to me is knowing where our bottlenecks are. We have an assembly line, or pipeline, of stages our software goes through as it makes its way to public consumption. Develop, build, deploy, test, and release are the major phases of our software delivery pipeline (I am not including planning or analysis right now, as that is another animal).

I believe that metrics focused on reducing time in our software delivery pipeline will be more effective than metrics focused only on reducing defects or increasing quality. If we can reduce defects and increase quality across faster delivery iterations, defects and poor quality will have less impact. That is the point of quality metrics in the first place: reducing the effects of poor quality on our customers and the business. Focusing on reducing time in the pipeline also supports our quality initiatives, because the tools that reduce time, like automated CI and testing, not only shorten iterations but also improve quality. Faster release iterations will let us address quality issues more quickly. This is not to say that other metrics should be ignored; I just think that, since we have no real metrics at the moment, starting with metrics that support speeding up the pipe is a worthy first step.

Deployment Metrics

Back to the point. What metrics should I capture for deployments? If my goal is to increase throughput in the pipeline, I need to identify bottlenecks. So, I need some timing data.

  • How long does deployment take?
  • How long do the individual deployment steps take?
  • How do we report this over time so we can identify issues?

This is pretty simple and I can extract it from the deployment log on the server. Reporting would be just a matter of querying this data and displaying deployment time totals over time.

Additional Deployment Metrics

In addition to the timing data, it may be worthwhile to capture other metrics, like the size of the deployment. Deploying involves pushing packages across wires, and the size of the packages can affect deployment time. Issues with individual servers can also affect deployment time, so knowing which servers were deployed to can help identify server issues. Along with the timing data, we can also capture:

  • The version of the build being deployed
  • The environment being deployed to
  • The individual servers being deployed to
  • The size and version of the packages being deployed to a server

Deployment Data

So, my first iteration of metrics centers around timing, but would also include other data to give a more robust picture of deployments. This is a naive first draft of what the data schema could look like. I suspect it can all be captured on most CI/CD servers and augmented with data generated by the reporting tool:

  • Deployment Id – a unique identifier for the deployment, generated by the reporting tool
  • Environment Id – a unique identifier for the environment deployed to, generated by the reporting tool
  • Build Version – the build version captured on the server
  • Timestamp – the date/time the deployment record was created
  • Start – the date/time the deployment started
  • End – the date/time the deployment completed
  • Tasks – the individual steps taken by the deployment script; there may be only one, depending on how deployment is scripted
    • Deployment Task Id – a unique identifier for the task, generated by the reporting tool
    • Server Id – a unique identifier for the physical server deployed to, generated by the reporting tool
    • Packages – the group of files pushed to the server; normally a zip or NuGet package in my scenarios
      • Package Version – the version of the package being pushed; this may differ from the software version and is generated outside of the reporting tool
      • Package Size – the physical size of the package in KB or MB (not sure which is better)
    • Start – the date/time the deployment to the server started
    • End – the date/time the deployment to the server ended

Imagine the above as some beautiful XML, JSON, or ProtoBuf, because I am too lazy to write it.
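If it helps, here is roughly that hierarchy sketched as C# classes instead; every name is a first-draft invention, not a finished design:

using System;
using System.Collections.Generic;

public class Deployment
{
    public Guid DeploymentId { get; set; }    // generated by the reporting tool
    public Guid EnvironmentId { get; set; }   // generated by the reporting tool
    public string BuildVersion { get; set; }  // captured from the CI/CD server
    public DateTime Timestamp { get; set; }   // when the deployment record was created
    public DateTime Start { get; set; }
    public DateTime End { get; set; }
    public List<DeploymentTask> Tasks { get; set; }
}

public class DeploymentTask
{
    public Guid DeploymentTaskId { get; set; } // generated by the reporting tool
    public Guid ServerId { get; set; }         // generated by the reporting tool
    public List<Package> Packages { get; set; }
    public DateTime Start { get; set; }
    public DateTime End { get; set; }
}

public class Package
{
    public string PackageVersion { get; set; } // versioned outside the reporting tool
    public long PackageSizeKb { get; set; }    // going with KB until proven otherwise
}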

If my goal is to increase throughput in the pipe, I should probably think about a higher level of abstraction in the hierarchy so that I can relate metrics from other parts of the pipeline. For now, I will focus on this as a first step, to prove that the idea is doable and provides some value.

All I need to do is create a data parsing tool that can be called by the deployment server once a deployment is done. The tool will receive the server log and store it, parse the log to generate a data structure similar to the above, then store the data in a database. Then I have to create a reporting tool that can present graphs and charts of the data for easy analysis. Lastly, create an API that will allow other tools to consume the data. This may be a job for CQRS and event sourcing. Easy, right :). I know there is a tool for that, but I am a sucker for punishment.
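As a very rough sketch of that first piece, this is the shape the parsing tool could take; the repository interface and all names here are hypothetical, and Deployment is the class sketched earlier:

using System;

public interface IDeploymentRepository
{
    void SaveRawLog(string rawLog);             // keep the original log for auditing
    void SaveDeployment(Deployment deployment); // store the parsed structure for reporting
}

public class DeploymentLogProcessor
{
    private readonly IDeploymentRepository _repository;

    public DeploymentLogProcessor(IDeploymentRepository repository)
    {
        _repository = repository;
    }

    // Called by the deployment server once a deployment is done.
    public void Process(string rawDeploymentLog)
    {
        _repository.SaveRawLog(rawDeploymentLog);
        Deployment deployment = Parse(rawDeploymentLog);
        _repository.SaveDeployment(deployment);
    }

    private Deployment Parse(string rawLog)
    {
        // Parsing is entirely specific to the CI/CD server's log format.
        throw new NotImplementedException();
    }
}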

Conclusion

This post will take more time than I thought, so I will make it a series. I will cover my thoughts on metrics for development, build, test, and release in upcoming posts (if I can remember). Then possibly some posts on how the metrics and tools can be used to optimize the pipeline. Pretty ambitious, but it sounds like fun to me.

Test Automation Tips: 1

#1 Don’t test links unless you are testing links.

Unless you are specifically testing menus or navigation links, don’t automate clicking links to navigate to the page under test. Instead, take the shortest route to set the test up.

Let’s say I have a test that starts by clicking the product catalog link, then clicks on a product to get to product details, just so I can test that a coupon appears in product details. This test is exercising concepts outside the intent of the test. I am trying to test coupon visibility, not links. Instead, I should navigate directly to the product details page and make my assertion.

If you use the link method and fool yourself into believing that your test is acting like a user, then when a link is broken your test will fail for a reason other than the one you are testing. If you use the same broken link in multiple tests, you will have multiple failures and will have to fix and rerun the tests to get feedback on the features you are really trying to test. If the broken tests are in a nightly run, you just lost a lot of test coverage, and you don’t know if the features are passing or failing because they never got tested. The link method can cause a dramatic loss of test coverage. So, test links only in navigation tests, when the links are front and center in the purpose of the test. For all other tests, take the shortest route to the page under test.
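For example, with Selenium WebDriver in C#, the coupon test above could skip the links entirely; the URL and locator here are made up for illustration:

using NUnit.Framework;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

[TestFixture]
public class ProductDetailsCouponTests
{
    [Test]
    public void ProductDetails_DisplaysCoupon()
    {
        using (IWebDriver driver = new ChromeDriver())
        {
            // Shortest route: go straight to the page under test.
            driver.Navigate().GoToUrl("http://localhost/products/12345"); // hypothetical URL

            // Assert only the concept under test: coupon visibility.
            IWebElement coupon = driver.FindElement(By.Id("coupon")); // hypothetical locator
            Assert.That(coupon.Displayed, Is.True);
        }
    }
}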

View more tips.

When I am writing and maintaining large functional UI tests, I often notice things that would make my life easier. I decided to write this series of posts to describe some of the tips I have for myself, in hopes that they prove helpful to someone else. What are some of your tips?

Test Automation Tips

What Makes a Good Candidate for Test Automation?

Writing large UI-based functional tests can be expensive in terms of money and time, and it is sometimes hard to know where to focus your test budget. New features are good candidates, especially the most common successful and exceptional paths through the feature. But when you have a monster legacy application with little to no coverage, where to get the biggest bang for the buck can be hard to ascertain.

Bugs, Defects, Issues…It Doesn’t Work

I believe bugs make good candidates for automation, especially if regression is a problem for you. Even if regression is not an issue, it’s always good to protect against regressions, so automating bugs is kind of a win-win in terms of risk assessment. Hopefully, whoever finds a bug, or whoever adds it to the bug database, provides reproduction steps. If the steps are a good candidate for automation, automate it.

Analyzing Bugs

What makes a bug a good candidate for test automation? When analyzing bugs for automated testing, I like to evaluate them against 4 basic criteria. In ascending order of precedence:

  • The steps are easy to model in the test framework.
  • The steps are maintainable as an automated test.
  • The bug was found before.
  • The bug caused a lot of pain to users or the company.

It is just common sense that “bug caused a lot of pain” is the top criterion. If a bug caused a lot of pain, you don’t want to repeat it, unless you like pain. Yet, if the painful bug is a maintenance nightmare as an automated test, the steps are hard to model, and the bug wasn’t found before, you may want to just mark it for manual regression. If a bug matches 2 or more of the criteria, I’d say it is a high-priority candidate for test automation.

Conclusion

These are just my opinions, and there is no study to prove any of it. I know this has been thought about and pondered, maybe even researched, by someone. If you know where I can find some good discussions on this topic, or if you want to start one, please let me know.


Finding Bad Build Culprits…Who Broke the Build!

I found an interesting Google Talk on automatically finding culprits in failing builds – https://www.youtube.com/watch?v=SZLuBYlq3OM. It is actually a lightning talk from GTAC 2013 given by grad students Celal Ziftci and Vivek Ramavajjala. First they gave an overview of how culprit analysis is done on build failures triggered by small and medium-sized tests.

CL, or change list, is a term I first heard in “How Google Tests Software”; it refers to a logical grouping of changes committed to the source tree. It is something like a git feature branch.

Build and Small Test Failures

When the build fails because of a build issue, we build the CLs separately until a CL fails the build. When the failure is in a small test (unit test), we do the same thing: build the CLs separately and run the tests against them to find the culprit. In both cases, we can do the analysis in parallel to speed it up. This is what I covered in my post on Bisecting Our Code Quality Pipeline, where git bisect is used to recurse the CLs.

Medium Tests

Ziftci and Ramavajjala define these tests as taking less than 8 minutes to run and suggest using a binary search to find the culprit. Target the middle CL and build it; if it fails, the culprit is at that CL or to its left, so we recurse to the left until we find the culprit. If it passes, we recurse to the right.

CL 1 – CL 2 – CL 3 – CL 4 – CL 5 – CL 6

CL 1 is the last known passing CL. CL 6 was the last CL in the failing build. We start by analyzing CL 4, and if it fails, we move left and check CL 3. If CL 3 passes, we mark CL 4 as the culprit. If CL 3 fails, we check CL 2: if CL 2 passes, CL 3 is the culprit; if CL 2 fails, CL 2 is the culprit, because we know CL 1 was good and don’t need to continue analyzing.

If CL 4 passed, we would move right and test CL 5; if it fails, we mark CL 5 as the culprit. If it passes, we mark CL 6 as the culprit, because it is the last suspect and we don’t have to waste resources analyzing it.
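Here is a sketch of that search in C#. It assumes exactly one culprit, that the first CL is known good, and that the last CL is known bad; buildAndTest is a stand-in for building a CL and running the tests against it:

using System;
using System.Collections.Generic;

public static class CulpritFinder
{
    // cls[0] is the last known passing CL; cls[cls.Count - 1] was in the failing build.
    public static string FindCulprit(IList<string> cls, Func<string, bool> buildAndTest)
    {
        int good = 0;            // highest index known to pass
        int bad = cls.Count - 1; // lowest index known to fail

        while (bad - good > 1)
        {
            // Round up so the first probe in the six-CL example above is CL 4.
            int middle = (good + bad + 1) / 2;
            if (buildAndTest(cls[middle]))
                good = middle; // culprit is to the right
            else
                bad = middle;  // culprit is this CL or to the left
        }

        // The first failing CL right after the last passing one is the culprit.
        return cls[bad];
    }
}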

Large Tests

They define these tests as taking longer than 8 minutes to run. This was the primary focus of Ramavajjala and Ziftci’s research. They are developing heuristics that let a tool identify likely culprits by pattern matching. For example, one heuristic analyzes a CL for the number of files changed and gives a higher (more suspicious) ranking to CLs with more files changed.

They also have a heuristic that calculates the distance of the code in a CL from base libraries, like the distance from the core Python library, for example. The closer code is to the core, the more likely it is a well-vetted piece of code that has had rigorous evaluation, because many projects may depend on it.
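Purely as illustration, a toy version of that kind of ranking might look like the following; the scoring is my invention, not theirs:

using System.Collections.Generic;
using System.Linq;

public class ChangeList
{
    public string Id { get; set; }
    public List<string> FilesChanged { get; set; }
    public int DistanceFromCoreLibraries { get; set; } // smaller = closer to core code
}

public static class SuspectRanker
{
    // Order CLs so the most suspicious are analyzed first.
    public static IEnumerable<ChangeList> RankSuspects(IEnumerable<ChangeList> cls)
    {
        return cls.OrderByDescending(cl =>
            cl.FilesChanged.Count            // more files changed => more suspicious
            + cl.DistanceFromCoreLibraries); // farther from well-vetted core => more suspicious
    }
}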

They seemed to be investing a lot of time in ensuring they can do this fast, stressing caching and optimization. It sounds interesting, and once they have had a chance to run their tool and heuristics against the massive number of tests at Google (they both became Google employees), hopefully they can share the heuristics that prove most adept at finding culprits at Google, and maybe anywhere.

Thoughts

They did mention possibly using a heuristic that looks at the logs generated by build failures to identify keywords that may provide more detail on who the culprit may be. I had a similar thought after I wrote the git bisect post.

Many times when a larger test fails, there are clues left behind that we would normally inspect manually to find the culprit. If the test has good messaging on its assertions, that is the first place to look. In a large end-to-end test there may be many places for the test to fail, but if the failure message gives a clue about what failed, it helps to find the culprit. Then again, they spoke of 2-hour tests, and I have never seen one test that takes 2 hours, so what I was thinking about and what they are dealing with may be another animal.

There is also the test itself. If the test covers a feature and I know that only one file in one CL is included in the dependencies involved in the feature test, then I have a candidate. There are also application logs and system logs. The goal, as I saw it, was to find a trail that leads back to a class, method, or file that matches a CL.

The problem with me seriously trying to solve this is that I don’t have a PhD in Computer Science; actually, I don’t have a degree at all, except from the school of hard knocks. When they talked about the binary search for medium-sized tests, it sounded great. I kind of know what a binary search is. I have read about it and remember writing a simple one years ago, but if you ask me to articulate the benefits of using a quad tree instead of a binary search, or to write a particular one on the spot, I will fumble. So, trying to find an automated way to analyze logs in a thorough, fast, and resource-friendly manner is a lot for my brain to handle. Yet, I haven’t let my shortcomings stop me yet, so I will continue to ponder the question.

We are talking about parsing and matching strings, not rocket science. This may be a chance for me to use or learn a new language more adept at working with strings than C#.

Conclusion

At any rate I find this stuff fascinating and useful in my new role. Hopefully, I can find more on this subject.


Configure Remote Windows 2012 Server

Have you ever needed to inspect or configure a server and didn’t want to go through the hassle of remoting into it? Me too. Well, as I take a deeper dive into the bowels of PowerShell 4, I found a cmdlet that allows me to issue PowerShell commands on my local machine and have them run on a remote server. I know you’re excited; I couldn’t contain myself either. You will need PowerShell 4 and a Windows 2012 server that you have login rights to control. I am going to give you the commands to get you started, and then you can Bing the rest, but it’s pretty simple. Once you’ve established the connection, you issue PowerShell commands just as if you were running them locally. Basically, you can configure your remote server from your local machine. You don’t even need to activate the GUI on the server; you can drive it all from PowerShell and save the resources the GUI needs.

Security

Is it secure? About as secure as remoting into the server through a GUI, though there are differences in the vulnerabilities you have to deal with. Security will always be an issue. This is something I will have to research more, but I do know that you can encrypt the traffic and keep the messages deep inside your DMZ.

Code

Note: Anything before the > is part of the command prompt.

PS C:\> Enter-PSSession -ComputerName server01
[server01]: PS C:\Users\CharlesBryant\Documents>

This starts the session. Notice that the command prompt now has the server name in brackets and that I am in my Documents folder on the server.

[server01]: PS C:\Users\CharlesBryant\Documents> hostname
server01

Here I issue the hostname command to make sure I’m not dreaming and I am actually on the server. Yup, this is really happening.

[server01]: PS C:\Users\CharlesBryant\Documents> Get-EventLog -List | Where-Object {$_.LogDisplayName -eq "Application"}
Max(K) Retain OverflowAction Entries Log
------ ------ -------------- ------- ---
4,096 0 OverwriteAsNeeded 3,092 Application

Yes…I just queried the event log on a remote server without having to go through the remote desktop dance. BooYah! Ending your session is even easier.

[server01]: PS C:\Users\CharlesBryant\Documents> Exit

Enjoy.

How Much Does Automated Test Maintenance Cost?

I saw this question on a forum, and it made me pause for a second to think about it. The quick answer is that it varies. The sarcastic answer is that it costs as much as you spend on it, or how about this: it costs as much as you didn’t spend on creating a maintainable automation project.

I have only been involved in 2 other test automation projects prior to my current position, and in both I also had feature development responsibility. On one of the projects, compared against time developing features, I spent about 10-15% of my time maintaining tests and about 25% writing them, so maintenance was about 30-40% of my total test time. Based on my knowledge today, some of my past tests weren’t that good, so maybe the numbers should have been higher or lower. On the other project, test maintenance was closer to 50%, and that was because of a poor tool choice. I can state these numbers because I tracked my time. I could not use them as benchmarks to estimate maintenance cost on my current project, or any other, unless the context was very similar and I could easily draw the comparison.

I have seen people say “it’s typically between this and that percentage of development cost,” or something similar. Trying to quantify maintenance cost is hard, very hard, and it depends on the context. You can estimate based on someone else’s guess at a rough percentage and hope it pans out, but in the end it depends on execution and environment. An application that changes often vs. one that rarely changes, poorly written automated tests, a bad choice of automation framework, the skill of the automated tester…there is a lot that can change cost from project to project.

I am curious whether someone has a formula to estimate cost across all projects, but an insane focus on the maintainability of your automated test suites can significantly reduce costs in the long run. So a better focus, IMHO, is on getting the best test architecture, tools, framework, and people, and making maintainability a high-priority goal. Also, properly tracking maintenance in the project management or bug tracking system can provide a more valuable measure of cost across the life of a project. If you properly track maintenance cost (time), you get a benchmark that is customized for your context. Trying to calculate cost up front, with nothing to base the calculation on but a wild uneducated guess, can lead to a false sense of security.

So, if you are trying to plan a new automation project and you ask me about cost the answer is, “The cost of having automated tests…priceless. The cost of maintaining automated tests…I have no idea.”

Configure MSDTC with PowerShell 4.0

Continuing on the PowerShell theme from my last post, I wanted to save some knowledge on working with DTC in PowerShell. I am not going to list every command, just what I’ve used recently to configure DTC. You can find more information on MSDN, http://msdn.microsoft.com/en-us/library/windows/desktop/hh829474%28v=vs.85%29.aspx, or TechNet, http://technet.microsoft.com/en-us/library/dn464259.aspx.

View DTC Instances

Get-Dtc will print a list of DTC instances on the machine.

PS> Get-Dtc

Stop and Start DTC

Stop

PS> Stop-Dtc -DtcName Local

Stopping DTC will abort all active transactions. So, you will get asked to confirm this action unless you turn off confirmation.

PS> Stop-Dtc -DtcName Local -Confirm:$False

Start

PS> Start-Dtc -DtcName Local

Status

You could use a script to confirm that DTC is started or stopped. When you call Get-Dtc and pass it an instance name, the returned object has a property named “Status” that tells you whether the DTC instance is Started or Stopped.

PS> Get-Dtc -DtcName Local
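For example, a quick check built on that Status property (assuming the Started/Stopped values above):

PS> if ((Get-Dtc -DtcName Local).Status -ne "Started") { Write-Error "DTC is not running" }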

Network Settings

You can view and adjust DTC Network Settings.

View

To view the network settings:

PS> Get-DtcNetworkSetting -DtcName Local

-DtcName is the name of the DTC instance.

Set

To set the network settings:

PS> Set-DtcNetworkSetting -DtcName Local -AuthenticationLevel Mutual -InboundTransactionsEnabled $True -LUTransactionsEnabled $True -OutboundTransactionsEnabled $True -RemoteAdministrationAccessEnabled $False -RemoteClientAccessEnabled $True -XATransactionsEnabled $False

Here we name the instance to configure, then list the property/value pairs we want to set. $True and $False are built-in PowerShell variables for the Boolean values true and false. If you run this set command, you will get a message asking if you want to stop DTC. I tried stopping DTC first and then running the command, and it still presented the confirmation message. You can add -Confirm:$False to turn off the confirmation message.

Conclusion

There is a lot more you can do, but this fits my automation needs. The only thing I couldn’t figure out is how to set the DTC Logon Account. There may be a magical way of finding the registry keys and setting them, or something, but I couldn’t find anything on it. If you know, please share…I’ll give you a cookie.

http://www.sqlha.com/2013/03/12/how-to-properly-configure-dtc-for-clustered-instances-of-sql-server-with-windows-server-2008-r2/ – has some nice info on DTC, and on DTC in a clustered SQL Server environment. He even has a PowerShell script to automate configuration…kudos. Sadly, his script doesn’t set the Logon Account.


Bisecting Our Code Quality Pipeline

I want to implement gated check-ins, but it will be some time before I can restructure our process and tooling to accomplish it. What I really want is to keep the source tree green and, when it is red, provide feedback to quickly get it green again. I want to run tests on every commit and give developers feedback on their failing commits before they pollute the source tree. Unfortunately, running the tests as we have them today would take too long on every commit. I came across a quick blog post by Ayende Rahien on Bisecting RavenDB, where his team used git bisect to find the culprit that failed a test. The post gave no information on how it actually works, just a tease that they are doing it. I left a comment to see if they would share some of the secret sauce behind their solution, but until I get a response I wanted to ponder it for a moment.

Git Bisect

To speed up testing and also allow test-failure culprit identification with git bisect, we would need a custom test runner that can identify which tests to run and run them. We don’t run tests on every commit; we run tests nightly against all the commits that occurred during the day. When a test fails, it can be difficult to identify the culprit(s) that failed it. This is where Ayende steps in, with his team’s idea to use bisect to help identify the culprit. Bisect works by traversing commits, starting at the commit we mark as the last known good commit and ending at the last commit included in the failing nightly test. As bisect iterates over the commits, it pauses at each one and lets you test it and mark it good or bad. In our case, we could run a test against a single commit. If it passes, tell bisect it’s good and to move to the next. If it fails, save the commit and its failing test(s) as a culprit, tell bisect it’s bad, and move to the next. This results in a list of culprit commits and their failing tests that we can use for reporting and for bashing the culprit owners over the head (just kidding…not).

Custom Test Runner

The test runner has to be intelligent enough to run all of the tests that exercise the code included in a commit. The custom test runner has to look for testable code files in the commit change log, in our case .cs files. When it finds a code file, it will identify the class in the file and find the test class that targets it. We are assuming one class per code file and one unit test class per code file class; if this convention isn’t enforced, some tests may be missed or we have to do a more complex search. Once all of the test classes are found for the commit’s code files, we run the tests. If a test fails, we save the test name and maybe the failure results, exception, stack trace, and so on, so it can be associated with the culprit commit. Once all of the tests have run, if any of them failed, we mark the commit as a culprit. After the test and culprit identification is complete, we tell bisect to move to the next commit. As I said before, this results in a list of culprits and failing-test info that we can use in our feedback to the developers.
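To make that concrete, here is a rough C# sketch of a runner that git bisect could drive. A zero exit code tells bisect the commit is good, non-zero tells it the commit is bad; the naming convention and the RunTests stand-in are assumptions:

using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

public static class BisectTestRunner
{
    public static int Main()
    {
        // Code files changed by the commit bisect currently has checked out.
        string[] changedFiles = RunGit("diff --name-only HEAD~1 HEAD")
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        // Assumes one class per .cs file and a FooTests class for every Foo class.
        var testClasses = changedFiles
            .Where(f => f.EndsWith(".cs") && !f.EndsWith("Tests.cs"))
            .Select(f => Path.GetFileNameWithoutExtension(f) + "Tests")
            .Distinct()
            .ToList();

        if (!testClasses.Any())
            return 0; // nothing testable changed; treat the commit as good

        bool allPassed = testClasses.All(RunTests);
        return allPassed ? 0 : 1;
    }

    private static string RunGit(string arguments)
    {
        var startInfo = new ProcessStartInfo("git", arguments)
        {
            RedirectStandardOutput = true,
            UseShellExecute = false
        };
        using (var process = Process.Start(startInfo))
        {
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return output;
        }
    }

    private static bool RunTests(string testClass)
    {
        // Stand-in: shell out to NUnit/xUnit/etc. with a filter for testClass,
        // capture the failure details for reporting, and return pass/fail.
        throw new NotImplementedException();
    }
}

Wired up, that would look something like git bisect start, git bisect bad on the failing nightly commit, git bisect good on the last green commit, then git bisect run against this tool and let git walk the commits.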

Make It Faster

We could make this fancy and look for the specific methods that were changed in the commit’s code file classes. We would then only find tests that test the methods that were changed. This would make testing laser focused and even faster. We could probably employ Roslyn to handle the code analysis and make finding tests easier. I suspect tools like ContinuousTests (MightyMoose) do something like this, so it’s not that far-fetched an idea, but there is definitely a mountain of things to think about.
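On the Roslyn point, the parsing side is fairly approachable. A minimal sketch that lists the class and method names declared in a code file, which could then be matched against the commit’s diff to pick tests:

using System;
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class ChangedCodeInspector
{
    // Prints Class.Method for every method declared in the file.
    public static void PrintMethods(string codeFilePath)
    {
        var tree = CSharpSyntaxTree.ParseText(File.ReadAllText(codeFilePath));
        var root = tree.GetRoot();

        foreach (var cls in root.DescendantNodes().OfType<ClassDeclarationSyntax>())
        {
            foreach (var method in cls.Members.OfType<MethodDeclarationSyntax>())
            {
                Console.WriteLine("{0}.{1}", cls.Identifier.Text, method.Identifier.Text);
            }
        }
    }
}

Mapping a changed method back to the tests that exercise it is the hard part; this only handles the easy discovery half.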

Conclusion

Well, this is just a thought, a thesis if you will, and if it works, it will open up all kinds of possibilities to improve our Code Quality Pipeline. Thanks, Ayende, and please think about open sourcing that bisect.ps1 PowerShell script 🙂