Category: DevOps

Observable Resilience with Envoy and Hystrix Works for .NET Teams

We had an interesting production issue where a service decided to stalk a Google API like a bad date and incurred a mountain of charges. The issue made me ponder the inadequate observability and resilience we had in the system. We had resource monitoring through some simple Kubernetes dashboards, but I always wanted to have something more robust for observability. We also didn’t have a standard policy on timeouts, rate limiting, circuit breaking, bulkheading… resilience engineering. Then my mind wandered back to a video that I thought was amazing. The video was from the Netflix team and it altered my view on observability and system resilience.

I was hypnotized when Netflix released a view of the Netflix API Hystrix dashboard. There is no sound in the video, but for some reason this dashboard was speaking loudly to me through the Matrix or something, because I wanted it badly. Like teenage me back in the day wanting a date with Janet Jackson bad meaning bad.

Netflix blogged about the dashboard. The simplicity of a circuit breaker monitoring dashboard blew me away. It had me dreaming of using the same type of monitoring to observe our software delivery process, marketing and sales programs, OKRs and our business in general. I saw more than microservices monitoring; I saw system-wide value stream monitoring (another topic that I spend too much time thinking about).

Unfortunately, when I learned about this Hystrix hotness I was under the impression that the dashboard required you to use Hystrix to instrument your code to send this telemetry to the dashboard. Being that Hystrix is Java based, I thought it was just another cool toy for the Java community that leaves me, a .NET dev, out in the cold looking in on the party. Then I got my invitation.

I read that Envoy (on my circa 2018 cool things board and the most awesome K8s tool IMHO) was able to send telemetry to the Hystrix dashboard. This meant we, the .NET development community, could get similar visual indicators and faster issue discovery and recovery, like Netflix experienced, without the need to instrument code in any container workloads we have running in Kubernetes.

Install the Envoy sidecar, configure it on a pod, send sidecar metrics to Hystrix Dashboard and we have deep observability and a resilience boost without changing one line of .NET Core code. That may not be a good “getting started” explanation, but the point is, it isn’t a heavy lift to get the gist and be excited about this. I feel like if we had this on the system, we would have caught our Google API issue a lot sooner than we did and incurred fewer charges (even though Google is willing to give one-time forgiveness, thanks Google).

In hindsight, it is easy to identify how we failed with the Google API fiasco, umm.. my bad code. We’re a blameless team, but I can blame myself. I’d also argue that better observability into the system and improved resilience mechanisms have been a high priority of mine for this system. We haven’t been able to fully explore and operationalize system monitoring and alerts because of jumping through made-up hoops to build unnecessary premature features. If we spent that precious time building out monitoring and alerts that let us know when request/response count has gone off the rails, if we implemented circuit breakers to prevent repeated requests when all we get in response are errors, if we were able to focus on scale and resilience instead of low priority vanity functionality, I think we’d have what we need to better operate in production (but this is also biased by hindsight). Real root cause – our poor product management and inability to raise the priority of observability and resilience.
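To make the circuit breaker idea concrete, here is a minimal sketch in Python. It is not Envoy or Hystrix code, just a language-neutral illustration; the class name, the thresholds and the injected clock are all hypothetical. After a set number of consecutive failures the breaker opens and rejects calls locally, so the upstream API is not hit again until a cool-down has passed.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected locally."""


class CircuitBreaker:
    # All names and defaults here are illustrative assumptions.
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so the sketch is testable
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; upstream call skipped")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit again
        return result
```

With something like this wrapped around the outbound call, a service that only gets errors back stops hammering the API after a few attempts instead of retrying forever.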

Anyway, if you are going to scale in Kubernetes and are looking for a path to better observability and resilience, check out Envoy, Istio, Ambassador and Hystrix; they could change your production life. Hopefully, I will blog one day about how we use each of these.

Visual Studio, DotNet Core, Windows and Docker, a Match Made in Heaven


If you haven’t heard about Docker, catch up. Out the gate, the best reasons for me to use Docker are being able to run a production-like environment locally and being able to instantly create new test environments without having to go through the tiring dance of manual configuration. So, each test deployment gets a brand new fresh environment and the environment can be thrown away after testing is done. So, Docker delivers on “infrastructure as code” and allows you to save your Docker container configuration in source control and iterate it along with the source code. There are other reasons, but they had me at “create new test environments without having to go through the tiring dance…”

With Windows Server 2016 having Docker support, it’s time to get on board. It has been possible to run ASP.NET/.NET Core on Linux, but I have been waiting to be able to do this on Windows. So, here we go.


Create New .NET Core Solution

  • Create ASP.NET Core Web Project (.NET Core)
  • Use the ASP.NET Core 1.1 Template for WebAPI
  • Opt to Enable Docker Support

Use Existing .NET Core Solution

  • Open an existing .NET Core solution
  • Right click on the Web Project in the Solution Explorer and click “Add Docker Support”

Docker Support

When you Enable Docker Support a Docker Compose project is created. The Docker Compose project is Visual Studio’s tooling to manage the creation of Docker containers. It has two YAML files:

  • – configures how the image is built and run.
  • docker-compose.yml – configures the image to be built and run. It also has nested files that allow you to override the configuration for debug and release (similar to web.config transforms).


Debugging is as simple as debugging traditional web projects. To give a little background on what’s happening behind the scenes, when you F5/debug, Docker containerizes the application (is that a word?):

  • Visual Studio runs docker compose
  • The image defined in the is downloaded if not in cache
  • ASPNETCORE_ENVIRONMENT is set to Development inside the container
  • Port 80 is exposed and mapped to a dynamically assigned port for localhost. The Docker host controls the dynamic port. You can see the container port with the Docker CLI command to list containers:
docker ps
  • The application is copied from the build output folder into the container
  • The default browser is launched with the debugger attached to the container.

The application is now running and you can run docker ps to see some of the properties of the running container. The dev image that was built does not contain the application; rather, it is mounted from the drive we shared with the containers during install. This is to allow you to iterate and make changes while developing without having to go through the expense of writing the application into the image.

If you make changes to a static file, they are automatically updated without having to recompile. You do have to refresh the browser, so it’s not as lovely as client side development, but better than stopping and starting the container.

If you make changes to a compiled file, hit Ctrl-F5 and the application is compiled and the Kestrel web server is restarted in the container. The container doesn’t have to be rebuilt or stopped, because we don’t have to rebuild the application in the image (hence the reason for the empty dev image).


When you are ready to release the application to Docker Hub or your private hub, you create a production image of the application. When you build in release mode the application is actually copied into the image and unlike dev images, future changes have to be re-imaged.


This was a very pleasing development experience. I had no issues at all (knock on wood) and I was debugging a running .NET Core application in a Docker container locally on Windows 10 in less than 15 minutes (not including install time). I’m still very new at this new tooling and Windows support, so I hope I will get to write more about it as I hit road blocks. There are always road blocks and I’m sure that I will hit them when I answer some of my current questions:

  • How to automate and integrate in continuous delivery pipeline
    • Container build and publish
    • Deploying container and running application to Windows Server 2016 on premise and in Azure
  • How to run multiple load balanced containers
  • How to monitor containers
  • How to deploy more containers to handle increased load or failover
  • How to use as base of Microservices architecture
  • And the questions keep coming…


How do you do to spell relief?

How do you handle Sev 1 critical outages?

Stay calm

Sev 1 outages are stressful. When production is down and customers are affected and everyone is looking at your team for an answer, first and foremost stay calm… breathe. This is easier said than done when you are affecting millions of dollars per hour in transactions (which is a thing in large scale payment systems, not fun), but regardless of the impact of the outage, if you lose your cool, the solution can be sitting right in front of your face and you won’t see it. Shit happens, there will always be bugs, sites will always go down at some point, accept it, find the solution and focus on not letting the same shit happen twice.

Don’t focus on blame

Establishing who caused the issue is not important. Knowing who was involved in the changes that led up to an issue, and who made changes after the issue, is important to understand. Even if there may have only been one person involved, you can’t assume that they are the cause and it does no good to blame anyone when production is down. Focus on the solution.

Create time line of events

You should document all relevant changes that led up to the outage and all changes that occurred after the outage. This not only helps to discover possible causes, it also provides documentation that can be used during root cause analysis and investigations during similar outages.

The time line can be kept on an internal team wiki so the team has visibility and can add to it as necessary. During an outage, someone should be assigned to record all of the facts in the time line. Without facts you’re poking at the problem in the dark.

Never theorize before you have data. Invariably, you end up twisting facts to suit theories, instead of theories to suit facts.
Sherlock Holmes to Watson (Movie – Sherlock Holmes 2009)

Investigate logs

I can’t tell you how many times checking the logs first would have saved a lot of time, as opposed to just poking around looking for file changes, config changes… everything except simply looking at the logs. The first step in investigating the issue should be looking at all of the relevant logs: event logs, custom application logs…

Communication is key

Keep a bridge line open with team

Keeping a bridge line open, even if there is nothing to discuss, keeps a real-time line of communication open and ready when someone has questions, ideas, and possible solutions.

Send regular status updates to team and stakeholders

Sending a message to announce the issue and what is known right now is good form. It lets everyone know that you are on top of the issue and working hard to solve it. If you haven’t found a resolution in a certain amount of time, sending another update explaining what has been done and any new findings lets everyone know that although you haven’t found the issue, you are still working hard on it. It may be a good idea to even post the status updates to a blog or Twitter, syndicate the updates to as many channels as you can, especially if you have a large application with many users.

Staying proactive with communication is much better than constantly having to field random calls and emails looking for information you should be readily sharing. Keep communications open and don’t try to hide, spin, or lie about the mistake.

No one makes any changes without discussing the change

While everyone is trying to solve the issue, no one should be making changes in production, even if the fix is blatantly obvious. A Sev 1 is serious and everything changed to fix it should be discussed with the team first so it can be documented and controls put in place to prevent it in the future.

If the team agrees on the change then the change should be documented on the timeline and a notification should be sent when the change is starting and when the change is finished. The change discussion and notifications can be simply talking it out over the bridge line or an IM or email. The point is don’t allow the issue to get worse or be repeated by making undocumented changes that the team can’t learn from.



These are just some tips that I have learned over the years. I have seen many more sound practices, but the gist is:

  • Stay calm
  • Document changes to production
  • Work as a team
  • Learn from failure

Extending the Reach of QA to Production

I have multiple lingering tasks for improving monitoring for our applications. I believe this is a very important step we need to take to assess the quality of our applications and measure the value that we are delivering to customers. If I had my way, I would hire another me just so I can concentrate on this.


We need to monitor usage to better understand how our customers actually use the application in production. This will allow us to make better product design decisions and optimizations, prioritize testing effort in terms of regression coverage, and provide a signal for potential issues when trends are off.


We need a better way to monitor and analyze errors. We currently get an email when certain exceptions occur. We also log exceptions to a database. What we don’t have is a way to analyze exceptions. How often do they occur? What is the most thrown type of exception? What was system health when the exception was thrown?
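As a rough illustration of the kind of analysis I mean, here is a small Python sketch that answers those three questions from logged exception records. The record shape (a "type" and a "cpu_pct" health reading captured at throw time) is an assumption for the example, not what our exception table actually looks like.

```python
from collections import Counter, defaultdict


def summarize(records):
    """Count exceptions per type and average a health metric per type."""
    counts = Counter(r["type"] for r in records)
    cpu_by_type = defaultdict(list)
    for r in records:
        cpu_by_type[r["type"]].append(r["cpu_pct"])
    avg_cpu = {t: sum(v) / len(v) for t, v in cpu_by_type.items()}
    return counts, avg_cpu


# Hypothetical rows standing in for an exception log table.
SAMPLE = [
    {"type": "TimeoutException", "cpu_pct": 90},
    {"type": "TimeoutException", "cpu_pct": 96},
    {"type": "NullReferenceException", "cpu_pct": 35},
]

counts, avg_cpu = summarize(SAMPLE)
print(counts.most_common(1))        # the most thrown exception type
print(avg_cpu["TimeoutException"])  # average CPU when timeouts were thrown
```

Even something this small would turn a pile of exception emails into a trend we can watch.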


We need a way to monitor and be alerted of health issues (e.g. current utilization of memory, CPU, disk space; open sessions; processing throughput…). Ops has a good handle on monitoring, but we need to be able to surface more health data and make it available outside of the private Ops monitoring systems. It’s the old “it takes a village to raise an app” thing being touted by the DevOps movement.


Everyone on the delivery team needs some access to a dashboard where they can see the usability, exceptions, and health of the app and create and subscribe to alerts for various condition thresholds that interest them. This should even be shared with certain people outside of delivery just to keep things transparent.


This can all be started in preproduction and, once we are comfortable with it, pushed to production. The point of having it is that QA is a responsibility of the entire team. Having these types of insight into production is necessary to ensure that our customers are getting the quality they signed up for. When the entire team can monitor production it allows us to extend QA because we can be proactive and not just reactive to issues in production. Monitoring production gives us the ammo we need to take preemptive action to avert issues in production while giving us the data we need to improve the application.

Monitoring Change Tickets in Delivery Pipelines

DevOps sounds cool like some covert special operations IT combat team, but it is missing the boat in many implementations because it only focuses on the relationship between Dev and Ops and is usually only championed by Ops. The name alienates important contributors on the software delivery team. The team is responsible for software delivery including analysis, design, development, build, test, deploy, monitoring, and support. The entire team needs to be included in DevOps and needs visibility into delivery pipelines from end-to-end. This is an unrelated rant, but this led me to thinking about how a delivery team can monitor changes in delivery pipelines.

Monitor Change

I believe it is important that the entire team be able to monitor changes as they flow through delivery pipelines. There are ticket management systems that help capture some of the various stages that a change goes through, but it’s mostly various project management related workflow stages and they have to be changed manually. I’d like a way to automatically monitor a change as it flows from change request all the way to production and monitor actions that take place outside of the ticket or project management system.

Normally, change is captured in some type of ticket, maybe in a project management system or bug database (e.g. Jira, Bugzilla). We should be able to track various activities that take place as tickets make their way to production. We need a way to trace various actions on a change request back to the change request ticket. I’d like a system where activities involved in getting a ticket to production automatically generate events that are related to ticket numbers and stored in a central repository.

If a ticket is created in Jira, a ticket created event is created. A developer logs time on a ticket, a time logged activity event is created that links back to the time log or maybe holds data from the time log for the ticket number.

When an automated build that includes the ticket happens, a build started activity event is triggered with the build data. As various jobs and tasks happen in the automated build, a build changed activity event is triggered with log data for the activity. When the build completes, a build finished activity event is triggered. There may be more than one ticket involved in a build so there would be multiple events with similar data captured, but hopefully changes are small and constrained to one or a few tickets… that’s the goal right, small batches failing fast and early.

We may want to capture the build events and include every ticket involved instead of relating the event directly to the ticket, not sure; I am brainstorming here. The point is I want full traceability across my software delivery pipelines from change request to production and I’d like these events stored in a distributed event store that I can project reports from. Does this already exist? Who knows, but I felt like thinking about it a little before I search for it.

Ticket Events

  1. Ticket Created Event
  2. Ticket Activity Event
  3. Ticket Completed Event

A ticket event will always include the ticket number and a date time stamp for the event, think Event Sourcing. Ticket created occurs after the ticket is created in the ticket system. Ticket completed occurs once the ticket is closed in the ticket system. The ticket activities are captured based on the activities that are configured in the event system.

Ticket Activity Events

A ticket activity is an action that occurs on a change request ticket as it makes its way to production. Ticket activities will have an event for started, changed, and finished. Ticket activity events can include relevant data associated with the event for the particular type of activity. There may be other statuses included in each of these ticket activity events. For example a finish event could include a status of error or failed to indicate that the activity finished but it had an error or failed.

  • {Ticket Activity} Started
  • {Ticket Activity} Changed
  • {Ticket Activity} Finished

For example: Deploy Started that has the deploy log, Build Finished that has the build log, Test Changed that has new test results from an ongoing test run.
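Since I am brainstorming anyway, here is a minimal Python sketch of how these ticket events might be shaped. Everything in it is hypothetical (the TicketEvent and EventStore names, the field names, the sample ticket numbers); the only things taken from the description above are that every event carries a ticket number and a timestamp, and that activity events have started, changed and finished phases with an optional status and data.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class TicketEvent:
    ticket: str                  # ticket number, e.g. "PROJ-42"
    name: str                    # "Ticket Created", "Build", "Deploy", ...
    phase: Optional[str] = None  # "started" | "changed" | "finished" for activities
    status: str = "ok"           # e.g. "ok", "error", "failed"
    data: dict = field(default_factory=dict)  # build log, test results, ...
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class EventStore:
    """Append-only, in-memory stand-in for a central event repository."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def for_ticket(self, ticket):
        return [e for e in self._events if e.ticket == ticket]


store = EventStore()
store.append(TicketEvent("PROJ-42", "Ticket Created"))
store.append(TicketEvent("PROJ-42", "Build", phase="started", data={"build": 118}))
store.append(TicketEvent("PROJ-42", "Build", phase="finished", status="failed",
                         data={"log": "compile error in ProjA"}))
store.append(TicketEvent("PROJ-7", "Ticket Created"))

for event in store.for_ticket("PROJ-42"):
    print(event.name, event.phase, event.status)
```

Events are immutable and only ever appended, which is the Event Sourcing flavor I have in mind: the history of a ticket is whatever events were recorded against its number.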

Maybe this is overkill? Maybe this should be simplified where we only need one activity event per activity and it includes data for started, changed, finished, and other statuses like error and fail. I guess it depends on if we want to stream activity event statuses or ship them in bulk when an activity completes; again I’m brainstorming.


Every ticket won’t have ticket activity events triggered for every activity that the system can capture. Tickets may not include every event that can occur on a ticket. Activity events are triggered on a ticket when the ticket matches the scope of the activity. Scope is determined by the delivery team.

Below are some of the types of activity events that I could see modeling for events on my project, but there can be different types depending on the team. So, ticket activity events have to be configurable. Every team has to be able to add and remove the types of ticket activity events they want to capture.

  1. Analysis
    1. Business Analysis
    2. Design Analysis
      1. User Experience
      2. Architecture
    3. Technical Analysis
      1. Development
      2. DBA
      3. Build
      4. Infrastructure
    4. Risk Analysis
      1. Quality
      2. Security
      3. Legal
  2. Design
  3. Development
  4. Build
  5. Test
    1. Unit
    2. Integration
    3. End-to-end
    4. Performance
    5. Scalability
    6. Load
    7. Stress
  6. Deploy
  7. Monitor
  8. Maintain

Reporting and Dashboards

Once we have the events captured we can make various projections to create reports and dashboards to monitor and analyze our delivery pipelines. With the ticket event data we can also create reports at other scopes. Say we want to report on a particular sprint or project. With the ticket Id we should be able to gather this and relate other tickets in the same project or sprint. It would take some thought as to whether we would want to capture project and sprint in the event data or leave this until the time when we make the actual projection, but with ticket Id we can expand our scope of understanding and traceability.
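A projection can be as simple as folding over the stored events. The Python sketch below assumes events are plain dicts with hypothetical "ticket", "name" and "at" keys, and projects them into hours from ticket created to ticket completed; tickets that are still open are simply left out.

```python
from datetime import datetime


def lead_times(events):
    """Project created/completed events into hours from creation to completion."""
    created, completed = {}, {}
    for e in events:
        if e["name"] == "Ticket Created":
            created[e["ticket"]] = e["at"]
        elif e["name"] == "Ticket Completed":
            completed[e["ticket"]] = e["at"]
    return {
        ticket: (completed[ticket] - created[ticket]).total_seconds() / 3600
        for ticket in created
        if ticket in completed
    }


# Hypothetical event data for two tickets; PROJ-2 is still open.
EVENTS = [
    {"ticket": "PROJ-1", "name": "Ticket Created", "at": datetime(2015, 1, 5, 9, 0)},
    {"ticket": "PROJ-2", "name": "Ticket Created", "at": datetime(2015, 1, 5, 10, 0)},
    {"ticket": "PROJ-1", "name": "Build", "at": datetime(2015, 1, 5, 15, 0)},
    {"ticket": "PROJ-1", "name": "Ticket Completed", "at": datetime(2015, 1, 6, 9, 0)},
]

print(lead_times(EVENTS))
```

Scoping the same report to a sprint or project would just mean filtering the ticket Ids before projecting.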


The main goal with this exploration into my thoughts on a possible application is to explore a way to monitor change as it flows through our delivery pipelines. We need a system that can capture the raw data for ticket create and completed events and all of the configured ticket activity events that occur in between. As I look for this app, I can refer to this to see if it meets what I envisioned or if there may be a need for this.

Why Would You Have a Pipeline with No Materials?

So, I need to run a Go pipeline that isn’t dependent on a material, meaning Go’s out the box implementation of a material. What I’m talking about is a dumb hack to get a production deployment working in an environment with stringent security policies. We have two Go servers. One orchestrates pipelines in our pre-production environments. Because of the security concerns we have a second Go server just to deploy production. When a package has gone through all of the quality gates in pre-production it is ready to deploy to production and the deployable packages are placed in an artifact repository (right now a file share) that is accessible by the production Go server.

When we want to deploy production, we manually trigger a pipeline on the production Go server that kicks off a job to get the deploy package from the package repository and place it in the production Go server’s artifact repository. With the package in the production server’s artifact repository, all of the production deploy agents have access to deploy it. Normally, this “get files from a share” business would be handled by a material set up in Go, but I couldn’t find one that could poll a file share, and sticking these binaries in a source repo that was accessible to the preproduction and production domains sounded like overkill at the time.

What would be even better and make me fall in love with Go even more would be if two servers could talk to each other; then I could just have the production Go server poll the preprod server to see if it has a new package in some pipeline. I guess I could:

  • do a little Java code and do some type of file share repository poller
  • set up Artifactory or NuGet and use the pollers already built for them
  • explore the Go API and see if I could do a service to have the servers talk to each other through the API
  • spend a little more time on Google and Github to see if this has already been solved

Because I’m not fluent in Java, much of this felt like a trip down the rabbit hole and I was already shaving a Yak with trying to get this supposedly quick hack up. Yet, what I did was the simplest thing I could think of to get going.

Anyway, I finally set up a fake Git repo and had Go ignore the material I set up to point to it. First, setting up Git: initializing a repo, creating a bare repo, cloning the initialized repo into the bare repo and configuring the bare repo as a material in Go. Then on the Stage config there is an option for “Fetch Materials”. Go will only do material updates and checkouts if this is checked, so I unchecked it. On the Material config there is an option for “Poll for new changes”. Unchecking this will stop Go from polling the material for changes, but you still have to set one up. It has to be a valid material. You can’t just put “myfakegitrepo” in the URL; I tried… it failed.

So, although you can’t get away with not using a material, you can make the material insignificant after it is set up. I hope this doesn’t help anyone; if it does, you are probably doing something complex and wrong like me.

Confirming MSBuild Project Dependency Build Behavior

So we have one giant Visual Studio solution that builds every application project for our product. It is our goal to one day be able to build and deploy each application independently. Today, we have a need to build one project for x86 and all others as x64. This requirement also provides a reason to explore per application build and deploy. The x64 projects and the x86 project share some of the same dependencies, which provides a good test for per application build. The purpose of this exploration is to determine the best way to automate the separate platform builds and lay groundwork for per application build. These are just my notes and not meant to provide much meat.

First, I will do some setup to provide a test solution and projects to experiment with. I created 4 projects in a new solution. Each project is a C# class library with only the default files in them.

  • ProjA
  • ProjB
  • ProjC
  • ProjD

Add project dependencies

  • ProjA > ProjB
  • ProjB > ProjC, ProjD
  • ProjC > ProjD

Set the platform target (project properties build tab) for each project like so

  • ProjA x64
  • ProjB x64
  • ProjC x64
  • ProjD x86

Behavior when a Dependent Project is Built

Do a Debug build of the solution and inspect the bin folders.

  • BuildTest\ProjA\bin\Debug
    • ProjA.dll    10:03 AM    4KB    B634F390-949F-4809-B937-66069C5F058E    v4.0.30319 / x64
    • ProjA.pdb    10:03 AM    8KB
    • ProjB.dll    10:03 AM    4KB    B0C5B475-576D-44D2-BD41-135BDA69225E    v4.0.30319 / x64
    • ProjB.pdb    10:03 AM    8KB
  • BuildTest\ProjB\bin\Debug
    • ProjB.dll    10:03 AM    4KB    B0C5B475-576D-44D2-BD41-135BDA69225E    v4.0.30319 / x64
    • ProjB.pdb    10:03 AM    8KB
    • ProjC.dll    10:03 AM    4KB    DBB9482F-6609-4CA5-AB00-009473E27CDA    v4.0.30319 / x64
    • ProjC.pdb    10:03 AM    8KB
    • ProjD.dll    10:03 AM    4KB    4F0F7877-5046-4A32-8B8E-FAD8E2660CE6    v4.0.30319 / x86
    • ProjD.pdb    10:03 AM    8KB
  • BuildTest\ProjC\bin\Debug
    • ProjC.dll    10:03 AM    4KB    DBB9482F-6609-4CA5-AB00-009473E27CDA    v4.0.30319 / x64
    • ProjC.pdb    10:03 AM    8KB
    • ProjD.dll    10:03 AM    4KB    4F0F7877-5046-4A32-8B8E-FAD8E2660CE6    v4.0.30319 / x86
    • ProjD.pdb    10:03 AM    8KB
  • BuildTest\ProjD\bin\Debug
    • ProjD.dll    10:03 AM    4KB    4F0F7877-5046-4A32-8B8E-FAD8E2660CE6    v4.0.30319 / x86
    • ProjD.pdb    10:03 AM    8KB

Do a Debug rebuild of ProjA

  • ProjA all DLLs have new date modified.
  • ProjB all DLLs have new date modified.
  • ProjC all DLLs have new date modified.
  • ProjD all DLLs have new date modified.

Do a Debug build of ProjA

  • ProjA no DLLs have new date modified.
  • ProjB no DLLs have new date modified.
  • ProjC no DLLs have new date modified.
  • ProjD no DLLs have new date modified.

Change Class1.cs in ProjA and do a Debug build of ProjA

  • ProjA: ProjA.dll has new date modified, ProjB.dll does not.
  • ProjB no DLLs have new date modified.
  • ProjC no DLLs have new date modified.
  • ProjD no DLLs have new date modified.

Change Class1.cs in ProjB and do a Debug Build of ProjA

  • ProjA all DLLs have new date modified.
  • ProjB: ProjB.dll has new date modified, ProjC.dll and ProjD.dll do not.
  • ProjC no DLLs have new date modified.
  • ProjD no DLLs have new date modified.

We change Class1.cs in ProjC and do a Debug Build of ProjA

  • ProjA all DLLs have new date modified.
  • ProjB: ProjB.dll and ProjC.dll have new date modified, ProjD.dll does not.
  • ProjC: ProjC.dll has new date modified, ProjD.dll does not.
  • ProjD no DLLs have new date modified.

We change Class1.cs in ProjD and do a Debug Build of ProjA

  • ProjA all DLLs have new date modified.
  • ProjB all DLLs have new date modified.
  • ProjC all DLLs have new date modified.
  • ProjD all DLLs have new date modified.


  • If a dependency has changes it will be built when the dependent project is built.

Behavior When Project with Dependencies is Built

Next, I want to verify the behavior when a project that has dependencies is built.

Clean the solution and do a debug build of the solution.

Do a Debug build of ProjD

  • ProjA no DLLs have new date modified.
  • ProjB no DLLs have new date modified.
  • ProjC no DLLs have new date modified.
  • ProjD no DLLs have new date modified.

We change Class1.cs in ProjD and do a Debug build of ProjD

  • ProjA no DLLs have new date modified.
  • ProjB no DLLs have new date modified.
  • ProjC no DLLs have new date modified.
  • ProjD all DLLs have new date modified.

We change Class1.cs in ProjC and do a Debug Build of ProjD

  • ProjA no DLLs have new date modified.
  • ProjB no DLLs have new date modified.
  • ProjC no DLLs have new date modified.
  • ProjD no DLLs have new date modified.


  • If a project with dependents is built, any projects that depend on it will not be built.
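Both sets of observations above are consistent with a simple model, sketched here in Python. This is only my reading of the test results, not how MSBuild decides what to compile internally; the function names are made up for the example. Building a target looks only at the target and the projects it depends on, and within that scope a project gets a new DLL if it changed or if it depends on something that changed.

```python
# Project references from the test solution: project -> what it depends on.
DEPS = {
    "ProjA": ["ProjB"],
    "ProjB": ["ProjC", "ProjD"],
    "ProjC": ["ProjD"],
    "ProjD": [],
}


def closure(target, deps):
    """Return target plus everything it depends on, directly or transitively."""
    seen = set()
    stack = [target]
    while stack:
        project = stack.pop()
        if project not in seen:
            seen.add(project)
            stack.extend(deps[project])
    return seen


def rebuilt(target, changed, deps=DEPS):
    """Projects expected to get a new DLL when target is built after changed was edited."""
    in_scope = closure(target, deps)
    return {p for p in in_scope if changed in closure(p, deps)}


print(sorted(rebuilt("ProjA", "ProjC")))  # ProjD is untouched
print(sorted(rebuilt("ProjA", "ProjD")))  # everything rebuilds
print(sorted(rebuilt("ProjD", "ProjC")))  # nothing: ProjC is not a dependency of ProjD
```

The last call is the case from this section: changing ProjC and building ProjD touches nothing, because a build never walks up to dependents.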

Behavior When Bin is Cleaned

I manually deleted the DLLs in ProjD and built ProjA, and the DLLs reappeared with the same date modified. Maybe they were fetched from the obj folder.

I do a clean on ProjD (this cleans obj) and build ProjA and new DLLs are added to ProjD.


  • Obj folder acts like a cache for builds.

Behavior when External Dependencies are Part of Build

Add two new projects to solution

  • ExtA x64 > ExtB
  • ExtB x86

Updated these projects so they output to the solution Output/Debug folder.

Added references to the ExtA and ExtB output DLLs

  • ProjA > ExtA
  • ProjB > ExtB

I did a solution rebuild and I noticed something that may also be a problem in other tests. When building ProjC, ProjD, and ExtA we get a warning:

warning MSB3270: There was a mismatch between the processor architecture of the project being built “AMD64” and the processor architecture of the reference. This mismatch may cause runtime failures. Please consider changing the targeted processor architecture of your project through the Configuration Manager so as to align the processor architectures between your project and references, or take a dependency on references with a processor architecture that matches the targeted processor architecture of your project.

Also, ProjA and ProjB are complaining about reference resolution:

warning MSB3245: Could not resolve this reference. Could not locate the assembly “ExtA”. Check to make sure the assembly exists on disk.

In Visual Studio I update the dependencies for ProjA and ProjB to include the Ext projects. This fixes the MSB3245 warning.


  • We need to build all dependencies with the same platform target as the dependent.
  • We need to build external references before building any dependents of the external references (e.g. get NuGet dependencies).
  • When a solution contains a project that is dependent on another project, but does not have a project reference, update the dependency to force the dependency to build first.
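
To spot a platform mismatch before MSBuild warns about it, the project files can be scanned for their declared platform target. This is only a sketch: Get-ProjectPlatformTarget is a name I made up, and it assumes each project declares a PlatformTarget element somewhere in its project file. A project that never sets one shows up as "(not set)".

```powershell
# Hypothetical helper: report the PlatformTarget values declared in project files.
function Get-ProjectPlatformTarget {
    param([string[]]$ProjectPath)

    foreach ($path in $ProjectPath) {
        # local-name() sidesteps the MSBuild XML namespace used by older project files.
        $targets = @(Select-Xml -Path $path -XPath "//*[local-name()='PlatformTarget']" |
            ForEach-Object { $_.Node.InnerText.Trim() } |
            Sort-Object -Unique)

        [pscustomobject]@{
            Project        = Split-Path $path -Leaf
            PlatformTarget = if ($targets.Count -gt 0) { $targets -join ',' } else { '(not set)' }
        }
    }
}

# Example (paths are placeholders):
#   Get-ProjectPlatformTarget -ProjectPath .\ProjA\ProjA.csproj, .\ExtA\ExtA.csproj
```

If a dependent and its dependency report different values, that is the combination MSB3270 complains about.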

Separating Platform Builds

Add new Platforms for x64 and x86. Update configuration so each project can do an x86 and x64 build. Have the Ext projects output to x86 and x64 folders for release and debug builds.

Add new projects for ExtC and ExtD and have respective Proj reference their release output. ProjC should ref ExtC x64 release and ProjD should ref ExtD x86 release.

Issue Changing Platform on Newly Added Solution Projects

So, I am unable to change the platform target for ExtC/D because x86 and x64 do not appear in the drop down, and I can’t add them because the UI says they are already created. I manually add them to the project files.


<PropertyGroup Condition="'$(Configuration)|$(Platform)' == 'Debug|x64'">
<PropertyGroup Condition="'$(Configuration)|$(Platform)' == 'Release|x64'">
<PropertyGroup Condition="'$(Configuration)|$(Platform)' == 'Debug|x86'">
<PropertyGroup Condition="'$(Configuration)|$(Platform)' == 'Release|x86'">

Now I can update the output path for ExtC/D and update Configuration Manager to the proper platform.

Since ProjD is exclusively an x86 project, I removed it from the x64 build. I have ExtD building both x86 and x64. I updated dependencies so ExtC/D build before ProjC/D.

Final Conclusion

This was a bit much to verify what may be common knowledge on the inter-webs, but I wanted to see for myself. There is more that I want to experiment with like NuGet, build performance and optimization, but this gave me enough to move forward with an initial revamp of our automated build. I am going to proceed with a separate pipeline for an x86 build of the entire solution and a separate deploy for the x86 application. I really believe that, going forward, NuGet can become a key tool in standardizing and optimizing per application build and deployment.


  • If a dependency has changes it will be built when the dependent project is built. Otherwise it will not be rebuilt. This is all a feature of MSBuild. When we move to per application build we will have to build separate solutions for each application and build in isolated pipelines. To prevent conflicts with other applications, we should build dependencies that are shared across applications in a separate pipeline.
  • If a project with dependents is built, any projects that depend on it will not be built. If we
  • Obj folder acts like a cache for builds. We can extend this concept to a common DLL repo where all builds send their DLLs, but we would need a confident way of versioning the DLLs so that we always use the most recent or specific version… sounds like I am proposing NuGet (smile).
  • We need to build all dependencies with the same platform target as the dependent. We may build the main solution as x64 and build other projects that need x86 separately. I believe this would be the most efficient since the current x86 projects will not change often.
  • We need to build external references before building any dependents of the external references (e.g. get NuGet dependencies). We do this now with NuGet, we fetch packages first, but when we move to per application build and deploy this will automatically be handled by GoCD’s fan-in feature. We will have application builds have pipeline dependencies on any other application builds they need. This will cause the apps to always use the most recent successful build of an application. We can make this stronger by having applications depend on test pipelines to ensure the builds have been properly tested before integration.
  • When a solution contains a project that is dependent on another project, but does not have a project reference, update the dependency to force the dependency to build first.


This is actually a re-post of a post I did on our internal team blog. One comment there was that we should also do an Any CPU build so we can have one NuGet package with all versions.

Thoughts on DevOps

I am not a DevOps guru. I have been learning DevOps and Continuous Improvement for about 6 years now. I wanted to blog about some of what I have learned because I see companies doing it wrong. I wanted to start internalizing some of the lessons I have learned and the grand thoughts I have had just in case someone asks me about DevOps one day.

DevOps is a Religion

I’m not going to define DevOps because there is enough of that going on. I will say that you can’t hire your way to DevOps because it isn’t a job title. You can’t have one team named DevOps and declare you are doing DevOps. Everyone on your application delivery teams has to convert to DevOps. When you only have one team enabling some DevOps practices through tools and infrastructure you are only getting a piece of the DevOps pie. Until you have broken down the silos and increased communication you haven’t realized DevOps.

Do not focus on implementing DevOps by creating another silo in a “DevOps” team. You can create an implementation team that focuses on DevOps processes, tools, and infrastructure, but if this will be a long lived team call them a Delivery Systems team or Delivery Acceleration team and make sure they are embedded in sprint teams and not off in some room guarded by a ticket system. As with some religions, you have to congregate. Your delivery team has to communicate with each other outside of tickets and email.

When you name the team DevOps it pushes responsibility for DevOps to that team, but the byproduct of DevOps is the responsibility of the entire delivery team. This is the same problem with a QA team: your QA team is not responsible for quality, the entire delivery team is responsible for quality. When you have silos like these, it is hard to get a “One Delivery Team” mindset. Find ways to break down silos, then you won’t be one of those companies that missed the DevOps boat because you couldn’t get your new silo’d DevOps team to deliver on the promises of DevOps.

Fast Feedback is a Main Byproduct

One of the main benefits of doing continuous anything (DevOps includes continuous improvement processes) is that you get fast feedback. The tighter and faster your feedback loops, the faster you can iterate. Take a small step, get feedback, adjust based on the feedback, and iterate. It’s not rocket science, it’s simplification. Work in smaller batches, talk about how to make the next batch better; watch your automation pipelines and KPIs, talk about how to make your pipelines and KPIs better… TALK.

Collaboration is the Key that Unlocks the Good Stuff

Having the entire delivery team involved and talking is key. The Business, QA, Security, IT, Operations, Development… everyone must communicate to ensure the team delivers the value that end users are looking for. Give end users value, they give the business value, loop. Having a delivery team that huddles in their silos with minimum communication with other teams is a good way to short circuit this loop. DevOps is a way of breaking down the silos and improving collaboration. DevOps is not the best name to convey what it can deliver. Just remember that the DevOps way should extend beyond the development and operations team.

Automation is the Glue that Binds Everything

Having an automated delivery pipeline from source check-in to production enables you to have a repeatable delivery process that is capable of automatically providing fast feedback. It gives the entire team a way to start and stop the pipeline and monitor the pipeline to adjust based on feedback from the pipeline. It also aids in collaboration by providing dashboards and communication mechanisms accessible by the entire delivery team.

If you have no automation, start with automating your build on each check-in. Then automate running of unit tests, then deployment to a test environment, running automated functional tests, deploy to the next environment. Don’t forget virtualization. Figure out how you can virtualize your environments and automate the provisioning of an environment to run your apps in. Start where you are and focus on adding the next piece until you can automatically build once and deploy and test all the way to production. Iterate your way to continuous delivery.

Virtualization is Magic Pixie Dust

Many people I have asked think of DevOps as virtualization and automated server configuration and provisioning. Even though this isn’t everything in DevOps, it’s a big part of it. Being able to spin up a virtual environment to run a test removes environments as a hindrance to more testing. Being able to spin up a virtualized mock environment for a third party service that is not ready allows us to test in spite of the missing dependency. Virtualization in production allows us to hot swap the current environment with a new one when we are ready for the next release or when production nodes are being hammered or being otherwise unruly. Codifying all of this virtualization allows us to treat our infrastructure just like we do product code. We can manage changes in a source control repository and automatically run the infrastructure code as part of our delivery process.

Quality, Security and Health Come First

Before one line of code is written on a change, an analysis of the desired change must be done before delivering it. I’m not saying a large planning document has to be done. The team has to talk through the potential effect on quality, security and health (QSH) and it makes sense to record these discussions somewhere to be used during the iteration. You can create a doc or record it in a ticket, but QSH must be discussed and addressed during the iteration.

QSH is not something that happens after development has declared code complete. It should happen in parallel with development. There should be automated unit, integration and end-to-end checks. There should be automated static analysis and security checks. A load test and analysis of health monitors should be measuring how the application is responding to changes. This all should happen during development iterations or as close to development as possible.

On a side note, in Health I am lumping performance, scale, stress and any type of test where a simulated load is tested against the application. This could be spinning up a new virtualized environment, running automated tests then turning off the database or a service to see what happens. Health is attempting to introduce scenarios that will give insight into how the application will respond to changes. It may take a lot to get to the level of Netflix and its chaos monkey in production, but having infrastructure and tests in preproduction to measure health will give you something instead of being totally blind to health issues.


I know there is no real meat here or guidance on how to do these things, but that’s what Google is for, or read Gene Kim’s The Phoenix Project. Anyway, I may be a little naive on a few points, but the gist is DevOps is more than a job or team title, it’s more than development and operations signing a peace treaty, more than automated server configuration. Think of it as another step in improving your continuous improvement process with a focus on cross team collaboration where you break down the silos separating all of the teams that deliver your application.

Cross Domain PowerShell Remoting [Fail]

I tried to run our PowerShell environment configuration scripts today and got hit with a nasty error. I double checked my credentials so I know that wasn’t the issue. The scripts worked just a month ago, but we did have some stupid security software installed on our workstations that may be adjusting how remoting works. Let’s see if I can get around it before I open a ticket and start complaining.

Here is the error. This results from a simple call to New-PSSession. The other server is in another domain, but like I said this has been working just fine.

New-PSSession : [agpjaxd1pciapp1] Connecting to remote server agpjaxd1pciapp1 failed with the following error message : WinRM cannot process the request. The following error with errorcode 0x80090311 occurred while using Kerberos authentication: There are currently no logon servers available to service the logon request.
Possible causes are:
  -The user name or password specified are invalid.
  -Kerberos is used when no authentication method and no user name are specified.
  -Kerberos accepts domain user names, but not local user names.
  -The Service Principal Name (SPN) for the remote computer name and port does not exist.
  -The client and remote computers are in different domains and there is no trust between the two domains.
After checking for the above issues, try the following:
  -Check the Event Viewer for events related to authentication.
  -Change the authentication method; add the destination computer to the WinRM TrustedHosts configuration setting or use HTTPS transport.
Note that computers in the TrustedHosts list might not be authenticated.
  -For more information about WinRM configuration, run the following command: winrm help config. For more information, see the about_Remote_Troubleshooting Help topic.

After I read this, I just stared at it for about 5 minutes; deer in the headlights.

I found some hope on the PowerShell scripter’s friend, the “Hey Scripting Guy” blog.

Anyway, the solution from Honorary Scripting Guy Richard Siddaway was to add the computer I am connecting to to the trusted hosts list. The trusted hosts list basically tells your computer, “Hey, you can trust this computer, go ahead and share my sensitive and private credentials with it.” So, be careful with this.

You can view the trusted host list with this PowerShell command.

Get-Item -Path WSMan:\localhost\Client\TrustedHosts

You can add a computer to the trusted list with this command.

Set-Item -Path WSMan:\localhost\Client\TrustedHosts -Value 'computerNameOfRemoteComputer'
[Y] Yes  [N] No  [S] Suspend  [?] Help (default is "Y"): Y
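
One thing to watch: Set-Item with -Value replaces whatever was already in the list. The WSMan provider has a -Concatenate switch for Set-Item that appends instead. If you would rather build the list yourself and skip duplicates, here is a small sketch. Merge-TrustedHost is a name I made up, and it assumes the list is the usual comma separated string.

```powershell
# Hypothetical helper: append computer names to a comma separated
# TrustedHosts string without dropping existing entries or adding duplicates.
function Merge-TrustedHost {
    param(
        [string]$Current,
        [string[]]$ComputerName
    )

    # Split the existing list, trimming whitespace and dropping empty entries.
    $hosts = @($Current -split ',' | ForEach-Object { $_.Trim() } | Where-Object { $_ })

    foreach ($name in $ComputerName) {
        if ($hosts -notcontains $name) {
            $hosts += $name
        }
    }

    $hosts -join ','
}

# Example (needs an elevated prompt on a Windows machine with WinRM):
#   $current = (Get-Item -Path WSMan:\localhost\Client\TrustedHosts).Value
#   $merged  = Merge-TrustedHost -Current $current -ComputerName 'computerNameOfRemoteComputer'
#   Set-Item -Path WSMan:\localhost\Client\TrustedHosts -Value $merged
```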

Now, I run the configuration script and I am deer in the headlights again.

New-PSSession : Opening the remote session failed with an unexpected state. State Broken.

Such a helpful error message. According to Stack Overflow, it looks like it may be a timeout, and I’m feeling that because the script sat on “Creating Session” forever (why so long is probably the next question). I update my script to increase the timeout.

$so = New-PSSessionOption -IdleTimeout 600000
$Session = New-PSSession -ComputerName $node.ComputerName -Credential $credential -SessionOption $so;

10 minute timeout is good right? So, I try again and State is still Broken. Not mission critical at the moment so I will investigate further later.
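
Something to try next time: as far as I can tell from the New-PSSessionOption help, -IdleTimeout controls how long an established session stays open without activity, while -OpenTimeout and -OperationTimeout control how long the client waits for the connection and for operations. All of them take milliseconds. Below is a sketch that sets those two instead. New-PatientSessionOption is a made up name, the default values are arbitrary, and I have not confirmed that this fixes the Broken state.

```powershell
# Hypothetical helper: build session options with longer connection and
# operation timeouts. New-PSSessionOption expects milliseconds.
function New-PatientSessionOption {
    param(
        [int]$OpenTimeoutSeconds = 180,
        [int]$OperationTimeoutSeconds = 600
    )

    New-PSSessionOption `
        -OpenTimeout ($OpenTimeoutSeconds * 1000) `
        -OperationTimeout ($OperationTimeoutSeconds * 1000)
}

# Example, reusing the variables from the script above:
#   $so = New-PatientSessionOption -OpenTimeoutSeconds 300
#   $Session = New-PSSession -ComputerName $node.ComputerName -Credential $credential -SessionOption $so
```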

You can read more about possible solutions at the links above.

GoCD: Install Multiple Agents with Powershell, Take 2

I wrote about how to Automate Agent Install with PowerShell and thought I would provide the script I am using now since I recently had to deploy some new agents. The script is below and it is pretty self explanatory and generally follows my previous blog post and the documentation.

We basically copy an existing agent to a new location, remove some files that are agent specific, and create a Windows service to run the agent. Until I feel the pain of having to do it, I set the service account/password and start the service manually. Also, I configure the agent on the server manually through the UI. When I have to install more agents I will probably automate it then.

# Existing agent to copy, plus the name and location of the new agent.
$currentAgentPath = "D:\Go Agents\Internal\1";
$newAgentName = "Go Agent Internal 3";
$newAgentPath = "D:\Go Agents\Internal\3";

Write-Host "Copying Files"
Copy-Item "$currentAgentPath\" -Destination $newAgentPath -Recurse;

Write-Host "Deleting Agent Specific Files"
# guid.txt is agent specific, so the copy must not keep it.
$guidText = "$newAgentPath\config\guid.txt";

if (Test-Path $guidText) {
    Remove-Item $guidText;
}

# Remove the bootstrapper marker file only if it was copied over.
$runningFile = "$newAgentPath\.agent-bootstrapper.running";

if (Test-Path $runningFile) {
    Remove-Item $runningFile -Force;
}

Write-Host "Create Agent Service"
New-Service -Name $newAgentName -Description $newAgentName -BinaryPathName "`"$newAgentPath\cruisewrapper.exe`" -s `"$newAgentPath\config\wrapper-agent.conf`"";

#$credential = Get-Credential;
# Eventually, we will write a function to set the service account and password and start the service.
# It would be nice to have a way to automatically configure the agent on the server too.

I guess I decided to do the work for you 🙂
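
Since the title says multiple agents, here is a sketch of the copy and clean steps wrapped in a function so they can be called in a loop. Copy-GoAgent is a name I made up, it only covers the file steps from the script above, and it assumes the destination folder does not exist yet. Creating the service still needs New-Service as shown earlier.

```powershell
# Hypothetical helper: copy an existing agent folder and remove the
# agent specific files from the copy.
function Copy-GoAgent {
    param(
        [string]$SourcePath,
        [string]$DestinationPath
    )

    # Copy-Item nests the source inside an existing folder, so refuse to continue.
    if (Test-Path -LiteralPath $DestinationPath) {
        throw "Destination already exists: $DestinationPath"
    }

    Copy-Item -Path $SourcePath -Destination $DestinationPath -Recurse

    $agentSpecificFiles = @(
        (Join-Path (Join-Path $DestinationPath 'config') 'guid.txt'),
        (Join-Path $DestinationPath '.agent-bootstrapper.running')
    )

    foreach ($file in $agentSpecificFiles) {
        if (Test-Path -LiteralPath $file) {
            Remove-Item -LiteralPath $file -Force
        }
    }
}

# Example (paths are placeholders): create agents 3 through 5 from agent 1.
#   3..5 | ForEach-Object {
#       Copy-GoAgent -SourcePath 'D:\Go Agents\Internal\1' -DestinationPath "D:\Go Agents\Internal\$_"
#   }
```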