Ctrl-S Rapid Feedback Loop

Kent Beck, inventor of Extreme Programming and TDD guru, did a short video on how he went about learning CoffeeScript. The beauty of what he described had less to do with what he learned than with how he learned it. He based the video on his “Making Making Manifesto” and some examples of Making Making that inspired him.

What he did was create a quick little test framework that gave him instant feedback on the quality of his code every time he saved it (Ctrl-S on Windows). In effect, this gave him feedback not only on the code he was writing but also on his Making Making thought process, all while he was learning CoffeeScript.
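To make the idea concrete, here is a rough sketch of a save-triggered test loop. It is written in Python and is not Kent's actual setup: the function names, the polling approach, and the half-second interval are all my own assumptions. It simply checks file modification times and reruns a command when one changes.

```python
import os
import subprocess
import time


def snapshot(paths):
    """Map each existing path to its last-modified time."""
    return {p: os.stat(p).st_mtime_ns for p in paths if os.path.exists(p)}


def changed(before, after):
    """Return the sorted paths that were modified, added, or removed."""
    return sorted(p for p in set(before) | set(after) if before.get(p) != after.get(p))


def watch(paths, command, interval=0.5):
    """Poll the files forever and rerun the command after every save."""
    seen = snapshot(paths)
    while True:
        time.sleep(interval)
        current = snapshot(paths)
        if changed(seen, current):
            subprocess.run(command)  # test output goes straight to the console
            seen = current


# Hypothetical usage (runs until interrupted):
# watch(["code.py", "test_code.py"], ["python", "-m", "unittest"])
```

Polling is the simplest thing that could work; a real tool would more likely hook into the operating system's file change notifications to cut the delay further.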

I have seen rapid test feedback with Mighty Moose for .Net, but that is slow in comparison to what he was able to achieve. It helps that JavaScript, even with CoffeeScript in the middle, doesn’t have a heavy compilation step as it is an interpreted language. I have also seen the benefits of file watchers when working with SASS and LESS for CSS to speed up feedback loops in UI development. I have played with rapid feedback with HTML changes in Chrome Developer Tools (very fast). Yet the idea of using it to learn a new language never dawned on me. I have used numerous scripting sites, like Codecademy, to learn the basics of Perl, Ruby and others by following a set guide to learning the language. I have never seen it done like this, with such ease, expressiveness, and ability to experiment and wander while maintaining a constant sense that you are on the right path.

Anyway, with my intense, somewhat obsessive, focus on improving feedback loops in software delivery, this was a great example of how automation can help increase efficiency. I wish we could do this in Visual Studio with similar speed, using three windows:

  1. A test window to write tests for new code or code changes I want to write.
  2. A code window to write the new code or code changes.
  3. A result window to view instant results of the tests after saving either the test or code window.

Does a solution with similar speed to Kent’s example, but for C#, reside somewhere in Roslyn? Maybe. It’s possible that Mighty Moose is the answer and is faster than when I first tried it years back. Will I find time to explore it? Probably not, but I would really like to.

Making Making Coffee

.Net Continuous Test

Chrome Developer Tools Live Editing

IIS 8 Configuration File

Note to self

The IIS 8 configuration file is located in %windir%\System32\inetsrv\config\applicationHost.config. It is just an XML file and the schema is well known. You can open it, edit it (if you are brave), and otherwise do configuration stuff with it. You can diff it from system to system to find inconsistencies or save it in a source code repository to standardize on a base configuration across web server nodes, if your project needs that kind of thing. Lastly, you can manage it with PowerShell… you can manage it with PowerShell… you can manage it with PowerShell DSC!
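As a small illustration of the diffing idea, here is a sketch in Python using the standard difflib module. The two XML fragments are made up for the example and only stand in for copies of applicationHost.config pulled from two servers; the function name and the node labels are my own.

```python
import difflib


def diff_configs(text_a, text_b, name_a="node1", name_b="node2"):
    """Return unified-diff lines between two configuration file contents."""
    return list(
        difflib.unified_diff(
            text_a.splitlines(),
            text_b.splitlines(),
            fromfile=name_a,
            tofile=name_b,
            lineterm="",
        )
    )


# Made-up fragments standing in for two copies of applicationHost.config.
NODE1 = """<applicationPools>
  <add name="SiteA" managedRuntimeVersion="v4.0" />
</applicationPools>"""
NODE2 = """<applicationPools>
  <add name="SiteA" managedRuntimeVersion="v2.0" />
</applicationPools>"""

for line in diff_configs(NODE1, NODE2):
    print(line)
```

A plain text diff is sensitive to whitespace and attribute order, so for noisy files it may be worth normalizing the XML first.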

The possibilities are endless so stop depending so much on the IIS Server Manager UI like you are in Dev preschool. You are a big boy now, remove the training wheels, but you might want to wear a helmet.

I don’t want to have this discussion again!

Blameless RCA

Let ye without failure cast the first stone.

I am involved in a workgroup at work that is exploring Root Cause Analysis in the hopes that we can come up with a way to help everyone improve their RCA process and procedures.

I believe it is important in our RCA recommendations to strive to build a culture around RCA. To borrow from a theme brought up by a workgroup member, culture building should be extended to retrospectives and all of our continuous improvement processes in general.

Just Culture

For RCA to be most effective we should instill the idea of the “blameless postmortem” into how we envision RCA. The blameless postmortem is an awesome concept, built on a culture around failure called a “Just Culture”, that was introduced to me in a blog post by John Allspaw, Web Operations guru at Etsy. It’s a way to encourage team members to own their failures without fear, in the hope that a less hostile environment towards failure will encourage fast, detailed feedback in active issue resolution and postmortems. We want team members to volunteer to report an issue as soon as they see it or cause it.

Owning Failure

In terms of RCA, this boils down to instilling the idea that finding who is at fault, or which team missed this or that, is not important. The only things that matter are how, when, and why an issue was leaked; “who” is not under investigation. Granted, who is at fault will most likely come out, and it should, but there should be no condemnation or negative side effect to owning a failure. We want “who” to come from the failure owners themselves, not from a lot of intricate detective work. We want the team to freely offer their actions that may have contributed to a failure, in the hope that we can compile a timeline of multiple narratives of the failure from various perspectives. When we can own failure without retribution, we are more apt to own up to it and share the details that led to it so that it can be corrected.

Remove Managerial Blockages on RCA

There are managers who want to know who to blame so that they can monitor who is causing issues. If there is a problem with someone continuously failing, it will be evident without having to expose personal failures in the RCA process, formally or as a part of team culture. Root cause is usually deeper than one person’s or one team’s failure. There are usually multiple stories that contribute to a failure. There are also managers who use hindsight to amplify the negative effect of failure to try to shame someone into being better. Highlighting what should have been done is not helpful, as it doesn’t lead to change. Oftentimes hindsight is disguised as a solution without anyone ever understanding why the actions that caused the failure were taken, or even how the manager’s mismanagement may have contributed to it. I only add this because I have seen many RCAs and postmortems fail because a manager tried to place blame and used their limited hindsight to declare the problem solved.

And More

There is a lot of good that comes from a Just Culture. Since I saw some things in the RCA practices at work that may lead to the blame game, I thought that a blameless postmortem should be explicitly built into our RCA process in the hope that it affects the culture. Just something to think about if you are going down this same road.

Orphaned Powershell PSDrive

I received this strange error while executing a script that creates a new PSDrive.

New-PSDrive : The local device name has a remembered connection to another network resource

I tried to use Remove-PSDrive, but

Remove-PSDrive : Cannot find drive. A drive with the name 'S:' does not exist.

I was able to fix this issue with the “net use” command.

First, I ran it to see if the drive was still mapped. I am still unsure how it persisted between PowerShell sessions; I must have missed something.

PS C:\> net use
New connections will be remembered.

Status       Local  Remote        Network
------------------------------------------------------------------------------
Unavailable  S:     \\node1\d$    Microsoft Windows Network
Unavailable  I:     \\node2\it    Microsoft Windows Network
OK           P:     \\public      Microsoft Windows Network
The command completed successfully.

Then I ran “net use” with the delete parameter to remove the orphaned drive.

PS C:\> net use /delete S:
S: was deleted successfully.

I love it when a plan comes together.

What is this CIM I keep running into in Powershell?

I keep having to use CIM in my scripts, but what is it? I understand how to use it, but where did it come from and what does it stand for? Like every developer I know, I find a search engine is the best tool to solve this kind of mystery.

There is an industry standards organization called DMTF (Distributed Management Task Force) that defined a standard named the Common Information Model. By the way, this is the same group that defined MOF (Managed Object Format), which is the standard under the covers of DSC. CIM classes are described in MOF, and CIM itself is a cross-platform common definition of management information for systems, networks, applications, and services that allows for vendor extensions. How was that for acronym soup?

Mystery Solved.

Update PSModulePath for Custom PowerShell Module Development

I am in the process of a deep dive into DSC, and I want to store my custom modules and DSC Resources in source control. To make it easy to run PowerShell modules, you have to import them or have them on the PSModulePath environment variable. Since I don’t want to point a source repository at the default PowerShell module path, I want to add my custom module path to PSModulePath. This will save me some time when it comes to importing modules and picking up module changes. It also means I will always be running the most recent version of my modules, even the buggy ones, so if you do this, understand the implications.

It’s actually pretty easy to automate this with PowerShell. Since I already have some experience updating environment variables with PowerShell I just created a new script to add my custom module path to PSModulePath.

$currentModulePath = [Environment]::GetEnvironmentVariable("PSModulePath", "Machine")
$customModulePath = "C:\_DSC\DSCResources"
$newModulePath = $currentModulePath + ";" + $customModulePath
[Environment]::SetEnvironmentVariable("PSModulePath", $newModulePath, "Machine")

I made this script a bit more verbose than it needs to be so that it is self-evident what is happening (code as documentation, no comments necessary).

I can envision someone needing to also remove a path from PSModulePath, but this is enough to get started so I will leave it up to you, until I have a need for that :).
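For anyone who does need removal, the string handling is the only interesting part, so here is a sketch of it. It is written in Python rather than PowerShell purely to keep it short and testable, and the helper names are my own invention. It also avoids a problem with the script above, which appends the custom path again on every run. The comparison ignores case, on the assumption that these are Windows paths.

```python
def split_paths(value, sep=";"):
    """Split a PSModulePath-style string, dropping empty entries."""
    return [part for part in value.split(sep) if part]


def add_path(value, new_path, sep=";"):
    """Append new_path unless it is already listed (case-insensitive)."""
    paths = split_paths(value, sep)
    if new_path.lower() not in {p.lower() for p in paths}:
        paths.append(new_path)
    return sep.join(paths)


def remove_path(value, old_path, sep=";"):
    """Drop every entry equal to old_path (case-insensitive)."""
    kept = [p for p in split_paths(value, sep) if p.lower() != old_path.lower()]
    return sep.join(kept)


current = r"C:\Modules;C:\_DSC\DSCResources"  # made-up starting value
print(add_path(current, r"c:\_dsc\dscresources"))  # already present, so unchanged
print(remove_path(current, r"C:\_DSC\DSCResources"))  # leaves only C:\Modules
```

The same split, filter, and join steps port directly to PowerShell with the -split and -join operators.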

UPDATES

When I ran this script with Invoke-Command on a remote session, modules in the new path weren’t immediately available. This is because the path is not updated in the existing remote session. A quick workaround for me was to remove the session and recreate it.

Get-PSSession | Remove-PSSession;

This removes all sessions, so you may not want to do this. Since I don’t care about sessions, I like it. This was just a one-line change in my workflow script and it didn’t add too much latency to the script execution. I know there are some other solutions that involve messing with the registry, but this is a one-time deal so resetting the remote session works for me.

Orlando Code Camp Call for Speakers

Static Property Bug Caused by a Code Tool

Take the following private static fields at the top of a class.

private static readonly string taskTimeStampToken = string.Format("{0}Timestamp: ", taskToken);
private static readonly string taskToken = "[echo] @@";

What will this method in the same class print to the console when called?

public void Print()
{
    Console.WriteLine(taskTimeStampToken);
}

For all you code geniuses that got this right away, I say shut up with your smug look and condescending tone as you say of course it prints

Timestamp:

So, why doesn’t it print

[echo] @@Timestamp:

Because taskToken hasn’t been initialized yet, of course. Static field initializers run in the order they appear in the file, so taskToken is still null when taskTimeStampToken is built. The order of static fields matters in your code. Don’t forget it, especially if you use a tool that reorganizes your code.

I really did know this ;), but this fact caused me about an hour of pain. I use a wonderful little tool called CodeMaid to help keep my code standardized. One of its functions is to reorganize my code in a consistent format (e.g. private members above public, constructor before everything…).

I had obviously never run CodeMaid on this particular file, with code similar to the above, because the unit tests had always passed. Well, I had a code change in said file, CodeMaid did its thing and flipped the order of the fields to what you see above, and the unit tests started failing. It took me at least an hour before I took a deep breath and noticed that the fields had flipped.

Lessons Learned

  • If you do any type of refactoring, even with a tool, make sure you have unit tests covering the file you refactor.
  • Use a diff tool to investigate mysteriously failing tests. It will give better visual clues on changes to investigate instead of relying on tired eyes.
  • Configure your code formatting tool to not reorganize your static fields, or request that option as a feature if it doesn’t exist.

GoCD: Agent Running Web Driver Test Hangs After Test Failure [SOLVED]

Problem

I had a nagging issue where some tests were failing the build on our GoCD server, but the agent was not reporting the failure to the server. We are using NAnt to run NUnit tests that in turn call Web Driver to exercise web pages. Some test failures correctly returned a non-zero value that failed the build in NAnt, and the failure was captured in the log and saved in a text file. Yet the agent didn’t report the build failure or send the artifacts to the server.

Issue

After a two-day search for answers and a deep dive into the bowels of GoCD, I discovered that a Web Driver process was kept open after the test failed the build. Specifically, the process is IEDriverServer.exe. This process was being orphaned by improper cleanup in the tests, which left Web Driver and the browsers open after the test failure.

When I ran the tests again, I watched for the failure and then manually killed Web Driver, and the agent magically reported to the server. I am still unsure why Web Driver would prevent the GoCD agent from reporting to the server. The agent is a Java process and IEDriverServer.exe is a native executable, so maybe the agent waits for every process the build started to exit before it reports… not sure.

Solution

My workaround at the moment is to run a task killer on failure in the test script. Here is the relevant portion of the NAnt script that drives the tests:

<property name="nant.onfailure" value="test.taskkiller" />
<target name="test.taskkiller">
  <exec program="taskkiller.bat" failonerror="false">
  </exec>
</target>

The taskkiller.bat is just a simple bat file that will kill Web Driver and open browsers.

taskkill /IM IEDriverServer.exe /F
taskkill /IM iexplore.exe /F

Now this is just a band-aid. We will be updating our test framework to handle this. Additionally, killing all the processes like this isn’t good if we happen to be running tests in parallel on the agent, which may be a possibility in the future.
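The framework-level fix is really just guaranteed cleanup: whatever creates the driver must also shut it down, even when the test body throws. Our tests are NUnit, so the following is only a language-neutral sketch of the pattern, written in Python. FakeDriver is a stand-in I made up so the example runs on its own, and managed_driver is a hypothetical helper name.

```python
from contextlib import contextmanager


class FakeDriver:
    """Stand-in for a real browser driver so the sketch runs on its own."""

    def __init__(self):
        self.closed = False

    def quit(self):
        # A real driver would close the browser and stop the driver process.
        self.closed = True


@contextmanager
def managed_driver(factory):
    """Create a driver and always quit it, even if the test body raises."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()


driver_seen = []
try:
    with managed_driver(FakeDriver) as driver:
        driver_seen.append(driver)
        raise AssertionError("simulated test failure")
except AssertionError:
    pass

print(driver_seen[0].closed)  # True
```

Because each test cleans up only the driver it created, this approach should also hold up if tests ever run in parallel on the same agent, which the blanket taskkill does not.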

Sev1 Incident

I read a book called The Phoenix Project, a surprisingly good book about a company establishing a DevOps culture. One of the terms in the book that I had no experience with was “Sev1 incident”. I have since heard it repeated and have come to find out that it is part of a common grading of incident severity. About a year after I read the book, I finally decided to research it and put more thought into a formalized incident reporting, triage, mitigation, and postmortem workflow, similar to the thoughts I had on triaging failing automated tests.

Severity Levels

So, first, to define the severity levels. Fortunately, David Lutz has a good breakdown on his blog – http://dlutzy.wordpress.com/2013/10/13/incident-severity-sev1-sev2-sev3-sev4-sev5/.

Severity Levels

  • Sev1 Complete outage
  • Sev2 Major functionality broken and revenue affected
  • Sev3 Minor problem, bug
  • Sev4 Redundant component failure
  • Sev5 False alarm or alert for something you can’t fix

Identify Levels

With that, I need to define how to identify the levels. IBM has a breakdown that simplifies it on their Java SDK site – http://publib.boulder.ibm.com/infocenter/javasdk/v1r4m2/index.jsp?topic=%2Fcom.ibm.java.doc.diagnostics.142%2Fhtml%2Fbugseverity.html:

Sev 1

  • In development: You cannot continue development.
  • In service: Customers cannot use your product.

Sev 2

  • In development: Major delays exist in your development.
  • In service: Users cannot access a major function of your product.

Sev 3

  • In development: Major delays exist in your development, but you have temporary workarounds, or can continue to work on other parts of your project.
  • In service: Users cannot access minor functions of your product.

Sev 4

  • In development: Minor delays and irritations exist, but good workarounds are available.
  • In service: Minor functions are affected or unavailable, but good workarounds are available.
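To see whether these criteria are precise enough to act on, here is a quick Python sketch that encodes only the “In service” rows above. The scope labels ("product", "major", "minor") and the function name are my own shorthand, not anything from IBM’s page.

```python
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1  # customers cannot use the product
    SEV2 = 2  # users cannot access a major function
    SEV3 = 3  # users cannot access minor functions
    SEV4 = 4  # minor functions affected, good workarounds available


def classify_in_service(scope, good_workaround=False):
    """Map an in-service impact to a severity level.

    scope is a hypothetical label: "product", "major", or "minor".
    """
    if scope == "product":
        return Severity.SEV1
    if scope == "major":
        return Severity.SEV2
    if scope == "minor":
        return Severity.SEV4 if good_workaround else Severity.SEV3
    raise ValueError(f"unknown scope: {scope!r}")


print(classify_in_service("minor", good_workaround=True).name)  # SEV4
```

Writing it down this way shows that the only judgment calls left are what counts as a “major” function and what counts as a “good” workaround, which is exactly what an analyst is for.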

Severity Analysis

Now that we have more guidance on identifying the severity of an incident, how should it be reported? I believe that anyone can report an incident, bug, something not working, but it is up to an analyst to determine the severity level of the report.

So, the first step is for the person who discovered the issue to open a ticket. Of course, if it is a customer and we don’t have a self-support system, they will probably report it to an employee in support or sales, and the employee will create the ticket for the customer. All tickets should be auto-routed to the analyst team, where each one is assigned to an analyst to triage. The analyst will assign the severity level and pass the ticket to engineering support, where it will be reviewed, discussed and prioritized. The analyst in this instance can be a QA, a BA, or even a developer assigned to the task, but the point is to have a dedicated team or person responsible.

During the analysis, a timeline of the failure should be established. What led up to the failure, the changes, the actions taken, and the people involved should all be laid out in chronological order. Also, during triage, a description of how to recreate the failure should be written if possible. The goal is to collect as much information about the failure as possible in one place so that the team can review it and help investigate. The level of detail required, and the speed at which feedback is given, should be established for each Sev level.
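Building the timeline is mostly a merge-and-sort of what different people remember. Here is a minimal Python sketch of that step; the Event fields and the sample accounts are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    when: datetime
    who: str
    what: str


def build_timeline(*narratives):
    """Merge several people's accounts into one chronological list."""
    events = [event for narrative in narratives for event in narrative]
    return sorted(events, key=lambda event: event.when)


# Made-up accounts from two people involved in the same failure.
dev = [Event(datetime(2014, 1, 6, 9, 15), "dev", "deployed config change")]
ops = [
    Event(datetime(2014, 1, 6, 9, 40), "ops", "rolled back the change"),
    Event(datetime(2014, 1, 6, 9, 20), "ops", "saw error rate climb"),
]

for event in build_timeline(dev, ops):
    print(event.when.strftime("%H:%M"), event.who, event.what)
```

Keeping who said what on each entry preserves the separate narratives while still giving the team one ordered view to review.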

Conclusion

This is turning out to be a lot deeper than I care to dive into right now, but it gives me food for thought. My takeaways so far are to

  • formalize severity levels
  • define how to identify the levels
  • assign someone to do the analysis and assign the levels