
In SQL, NULL is not a value… not a value!

I have been spending a lot of time fixing SQL Server database errors caused by stored procedures attempting to compare NULL with = and <>. If you don’t know, in SQL:

NULL = NULL is not true (it evaluates to UNKNOWN, not true or false)

NULL <> NULL is not true either (same deal, UNKNOWN)

Null is not a value. Null is nothing. You can’t compare nothing to nothing because there is nothing to compare. I know you can do a select and see the word NULL in the results in SQL Server Management Studio, but that is just a marker so you don’t confuse empty strings with NULL or something.

If you need to do a comparison on a nullable value please check that shit for null first:

t2.column2 IS NULL OR t2.column2 = t1.column2

t2.column2 IS NOT NULL

Also, if you try to be smart and turn ANSI_NULLS off, you are going to be hurt when you have to upgrade your SQL Server to a version that forces ANSI_NULLS on (it’s coming).

I have been guilty of comparing NULL and saying, “it has a NULL value.” Now that I am having to fix scripts written by someone who didn’t think about NULL, I wanted to rant and hammer this point home for myself so I don’t cause anyone the pain I am feeling right now. NULL is not a value… not a value!


Where is your logic?

RANT

I hate logic in the database. It’s hard to automate testing, hard to debug, and hard to have visibility into logic that may be core to the success or failure of an application or business. Some of the worst problems I have had to deal with are database related; actually, almost all of the worst problems have been linked to the database.

I am in love with the new movement to smaller services doing exactly one small thing very well. I think the database should persist data… period. Yes, there are times when it just makes sense to have logic closer to the data, but I can always think of a reason not to do it, and it always goes back to my experiences with database problems. It’s been a love-hate relationship, me and databases.

I’m not a DBA and I don’t have the reserve brain power to become one. So, to help my limited understanding, I shy away from anything that looks like logic in my data layer. Call it lazy, naivete, or not wanting to use the right tool for the job, I don’t care. If I’m in charge, get your shitty logic out of the database, including you, evil MERGE statement, the current bane of my existence :).

KISS Your Big Data UI

It’s been so hard to blog lately, mostly because I don’t have time to edit my posts. I guess if I wait until I have time to wordsmith better posts I’ll never post, so here is one that has been sitting on the shelf in all of its unedited glory.

What Qualifies A Developer To Talk About UI Design

I’m not qualified; I am not a designer. I haven’t done a lot of posts on the subject of UI design, but with the big push to big data, real-time streaming analytics, and IoT, I thought I’d share a few things I would consider when designing a UI for them.

I started my tech career designing websites and desktop applications for a few years. Although my customers were happy with my UI designs, it’s not my thing and I don’t think I am good at it. Yet, I have been on many application teams and have worked with many awesome designers. I believe I can speak on UI design considerations from my 16 years of doing this. What I have to say is not gospel. I haven’t searched this stuff out on Bing like I do with engineering problems. Many UI and usability gurus will probably crush me if they read this, but I’m not totally clueless when it comes to UIs.

A Dashboard for Big Data

When designing an administrator’s dashboard for IoT device sensor data, or really any big data application, you should probably focus on making the right exceptional conditions highly visible and on showing actionable data trends, with the ability to drill down for more information and take the necessary actions. The most important thing is to alert users to anomalies that may indicate pending problems, while providing some facility for taking action to investigate and mitigate them. Just as important is being able to identify when something is going well, because you want to learn from the successes and possibly apply that knowledge to other areas.

The UI should help the user be proactive in addressing problems. This is true whether you have one device sending sensor data or a fleet of them. Granted, there are differences in design considerations when you start scaling to 100s or 1000s of devices, but depending on the goal of the device, the basic premise is that you want the important conditions identified by sensors up front and in your face.

KISS Your UI

When you have 100s of messages flowing from sensors compounded by multiple devices, a pageable grid of a hundred recent sensor messages on the first screen of the UI is useless, unless you are the type that enjoys trying to spot changes while scrolling the Matrix. The dashboard should lead you to taking action when problems exist, help you learn from successes, and give you peace of mind that everything is OK. The UI should help you identify potential areas to make improvements by uncovering weaknesses. This should be done without all of the noise from the mountain of data being held by the system.

If you could only show one thing on the UI, what would it be? Maybe an alert box showing the number of critical exceptions triggered, with a link to view more information? Start with that one thing and expand on it to provide the user with what they need. A big data UI is not a CRUD or basic application UI. It is more closely related to what one might do for a reporting engine UI, but even that is a stretch. I am sure there are awesome blogs and books out there that speak on this subject, but many of the UIs I have been seeing were designed by people that didn’t get a subscription or something.

Keep It Simple Stupid (KISS) is as much a UI design principle as it is a software engineering one. Stop making people work to understand thousands of data points when it should be the job of the UI designer to simplify them.

Example

Say you have a storage company. You have hundreds of garages, and you want to deploy sensors to each garage and allow your customers to monitor them. The sensor will give you data on the door being open or closed, the temperature, and the relative humidity in the unit. Each device sends a message every 5 minutes, which works out to 288 messages a day per device and tens of thousands a day across a few hundred units.

Your customers can opt in to SMS and email alerts on each data point. Some will only want open/close alerts. Some may have items that are sensitive to heat and humidity and they will want alerts when temperature or humidity cross some threshold.

Your customers also get access to a website that allows them to modify alerts and view a dashboard where they can investigate and query all of the sensor data. What good would it be to have a grid on the dashboard showing sensor data streaming into the UI every 5 minutes, hundreds of times a day per unit? Why burden them with even having to see the data? Most customers will never have a breach. Temp and humidity sensitive customers will probably have an environment-controlled unit that rarely triggers an alert. What does streaming data on the initial dashboard screen give them… nothing.

The only things most customers want to know are whether someone has opened their unit and whether the temp is fluctuating. The customer wants us to simplify all of that data into a simple, digestible UI that addresses their concerns and helps them cure any pain they may experience when an alert is triggered.
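As a rough sketch of that simplification (TypeScript, with made-up message and threshold shapes, not tied to any real product), the dashboard would surface only a per-unit summary instead of the raw stream:

type SensorMsg = { unitId: string; doorOpen: boolean; tempF: number; humidity: number };
type UnitSummary = { unitId: string; doorOpenAlerts: number; tempAlerts: number; humidityAlerts: number };

// Collapse a day's worth of messages into the handful of numbers a customer actually cares about.
function summarize(messages: SensorMsg[], maxTempF: number, maxHumidity: number): Map<string, UnitSummary> {
  const byUnit = new Map<string, UnitSummary>();
  for (const m of messages) {
    const s = byUnit.get(m.unitId) ?? { unitId: m.unitId, doorOpenAlerts: 0, tempAlerts: 0, humidityAlerts: 0 };
    if (m.doorOpen) s.doorOpenAlerts++;
    if (m.tempF > maxTempF) s.tempAlerts++;
    if (m.humidity > maxHumidity) s.humidityAlerts++;
    byUnit.set(m.unitId, s);
  }
  return byUnit;
}

The raw messages still exist behind a drill-down link; the first screen only needs the summary.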

There may be users that need to dig into the data to investigate, but normal daily usage is focused more on alerts and trends. Seeing all of the data is not the main concern and the data is only available if they click a link to dig into it. The UI is kept clean, not overwhelming, and focused on the needs of the majority of customers.

This is fictional, and if no one is doing it, I probably just gave away a new app idea. This is the reason that IoT is hot; it’s wide open for dreamers. The gist of the example is that there is so much data to contend with on these types of projects that you have to hide it and simplify the UI so that you aren’t overwhelming the user.

Consumer Apps Are Not the Only Game In Town

Additionally, you have to take into account whether the UI is for consumers or businesses. Making things pretty for consumers can help differentiate an app in the consumer market, but the time spent making things pretty for B2B or enterprise apps is better spent improving usability and the feature set. I am not saying that businesses don’t care about aesthetics, but unless they are reselling your UI to consumers, it’s usually not the most important thing. For both audiences usability is very important, but usability doesn’t mean using the latest UI tricks, fancy graphics, or fussing over fonts and colors just for the sake of having them. Everything should be strategically implemented to help usability.

This is especially true for large enterprises. I have seen many very successful and useful apps in the enterprise that were nothing more than a set of simple colored boxes with links to take certain actions and drill down into more information. So, you not only have to take into account the amount of data being managed in the UI, but also the user doing the managing. Know your audience and build the UI to their needs. Leave out all the gradients and curves and UI tricks until you have a functional UI that serves the core needs of the business, and save polish for later iterations. Focus first on how to reduce the mountain of data into bite-sized actionable chunks.

Conclusion

So, Keep It Simple Stupid! Hide the data, show alerts and trends, allow drill down into the data for investigation, provide a way to take action, leave polish for later iterations.

Event Sourcing: Stream Processing with Real-time Snapshots

I began writing this a long time ago after I viewed a talk on Event Sourcing by Greg Young. It was just a simple thought on maintaining the current snapshot of a projection of an event stream as events are generated. I eventually heard a talk called “Turning the database inside-out with Apache Samza” by Martin Kleppmann, http://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/. The talk was awesome, as I mentioned in a previous post. It provided structure, understanding and coherence to the thoughts I had.

It still took a while to finish this post after seeing the talk (because I have too much going on), but I would probably still be stuck on this post for a long time if I hadn’t heard the talk and looked further into stream processing.

Event Sourcing

Event sourcing is the storing of a stream of facts that have occurred in a system. Facts are immutable. Once a fact is stored it can’t be changed or removed. Facts are captured as events. An event is a representation of an action that occurred in the past. An event is usually an abstraction of some business intent and can have other properties or related data. To get the state at a point in time, we process all of the events up to that point to build up the state.

State Projections

In our system we want to store events that happen in the system. We will use these events to figure out the current state of the system. When we need to know the current state of the system we calculate a left fold of the previous facts we have stored from the beginning of time to the last fact stored. We iterate over each fact, starting with the first one, calculating the change in state at each iteration. This produces a projection of the current transient state of the system. Projections of current state are transient because they don’t last long. As new facts are stored new projections have to be produced to get the new current state.
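To make the left fold concrete, here is a minimal sketch in TypeScript (the event shape and names are illustrative, not from any particular framework):

type DomainEvent = { type: string; data: Record<string, unknown> };

// A projection is just a left fold (reduce) over the ordered stream of stored facts.
function project<S>(events: DomainEvent[], initial: S, apply: (state: S, e: DomainEvent) => S): S {
  return events.reduce(apply, initial);
}

Every new fact appended to the stream means folding again to get the new transient projection, which is exactly why the snapshots below become interesting.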

Projection Snapshots

Snapshots are sometimes shown in examples of event sourcing. Snapshots are a type of memoization used to help optimize rebuilding state. If we have to rebuild state from a large stream of facts, it can be cumbersome and slow. This is a problem when you want your system to be fast and responsive. So we take snapshots of a projection at various points in the stream so that we can begin rebuilding state from a snapshot instead of having to replay the entire stream. So, a snapshot is a cache of a projection of state at some point in time.
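A sketch of that optimization, assuming the event store can hand back only the events recorded after a snapshot’s position (the shapes here are hypothetical):

type DomainEvent = { type: string; data: Record<string, unknown> };
type Snapshot<S> = { state: S; lastEventSeq: number };

// Start from the cached snapshot and fold in only the facts stored after it,
// instead of replaying the stream from the beginning of time.
function rebuildFromSnapshot<S>(snapshot: Snapshot<S>, eventsSince: DomainEvent[], apply: (state: S, e: DomainEvent) => S): Snapshot<S> {
  return {
    state: eventsSince.reduce(apply, snapshot.state),
    lastEventSeq: snapshot.lastEventSeq + eventsSince.length,
  };
}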

Snapshots in traditional event sourcing examples have a problem because the snapshot is a version of state for some version of a projection. A projection is a representation of state based on current understanding. When the understanding changes there are problems.

Snapshot Issues

Let’s say we have an application and we understand the state as a Contact object containing a name and address property. Let’s also say we have a couple facts that alter state. One fact is a new contact was created and is captured by a “Create Contact” event containing “Name” and “Address” data that is the name and address of a contact. Another fact is a contact changed their address and is captured by a “Change Contact Address” event containing “Address” data with the new address for a contact.

When a new contact is added a “Create Contact” event is stored. When a contact’s address is changed a “Change Contact Address” event is stored. To project the current state of a contact that has a “Create Contact” and “Change Contact Address” event stored, we first create a new Contact object, then get the first event, “Create Contact”, from the event store and update the Contact object from the event data. Then we get the “Change Contact Address” event and update the Contact object with the new address from the event.
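In code, that walkthrough might look something like this sketch (the event and property names come from the description above; everything else is illustrative):

type ContactEvent =
  | { type: 'CreateContact'; data: { name: string; address: string } }
  | { type: 'ChangeContactAddress'; data: { address: string } };

type Contact = { name: string; address: string };

// Apply one stored fact to the Contact projection.
function applyContactEvent(state: Contact, e: ContactEvent): Contact {
  switch (e.type) {
    case 'CreateContact':
      return { name: e.data.name, address: e.data.address };
    case 'ChangeContactAddress':
      return { ...state, address: e.data.address };
  }
}

// New Contact object, then fold the stored events over it to get current state.
const events: ContactEvent[] = [
  { type: 'CreateContact', data: { name: 'Jane Doe', address: '1 Old Street' } },
  { type: 'ChangeContactAddress', data: { address: '2 New Street' } },
];
const currentContact = events.reduce(applyContactEvent, { name: '', address: '' });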

That was a lot of words for a very simple concept. We created a projection of state in the form of a Contact object and changed the state of the projection from stored events. What happens when we change the structure of the projection? Instead of a Contact object with Name and Address, we now have a Contact object with Name, Address1, Address2, City, State, and Zip. We now have a new version of the projection, and previous snapshots made with other versions of the projection are invalid. To get a valid projection of the current state with the new projection, we have to recalculate from the beginning of time.

Sometimes we don’t want the current state. What if we want to see state at some other point in time instead of at the head of our event stream? We could optimize rebuilding a new projection by using some clever mapping to transform an old snapshot version to the new version of the projection. If there are many versions, we would have to make sure all supported versions are accounted for.

CQRS

We could use a CQRS architecture with event sourcing. Commands write events to the event store and queries read state from a projection snapshot. Queries would target a specific version of a projection. The application would be as consistent as the time it takes to take a new snapshot from the previous snapshot, which was only one event earlier (fast).
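A rough sketch of that split (the store and snapshot interfaces are made up for illustration):

// Command side: capture the intent as a fact and append it to the event store.
function changeContactAddress(eventStore: { append(e: object): void }, contactId: string, address: string): void {
  eventStore.append({ type: 'ChangeContactAddress', contactId, data: { address } });
}

// Query side: never replays events; it just reads the latest snapshot for a specific projection version.
function getContact(snapshots: { read(key: string): unknown }, contactId: string, projectionVersion: string): unknown {
  return snapshots.read(`contact:${projectionVersion}:${contactId}`);
}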

Real-time Snapshots

A snapshot is a cache of state, and you know how hard cache invalidation is. If we instead create a real-time snapshot as facts are produced, we always have the current snapshot for a version of the projection. To maintain backwards compatibility we can keep real-time snapshots for each version of a projection that we want to support. When we have a new version of a projection we start rebuilding its state from the beginning of time, and when the rebuild has caught up with current state we start taking real-time snapshots. So, there will be a period of time where a new version of a projection isn’t ready for consumption because it is still being built. With real-time snapshots we don’t have to worry about running funky code to invalidate or rebuild state; we just read the snapshot for the version of the projection that we want. When we no longer want to support a version of a projection, we take the endpoint that points to it offline. When we have a new version that is ready for consumption, we bring a new endpoint online. When we want to upgrade or downgrade, we just point to the endpoint we want.
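One way to picture a real-time snapshot is a subscriber that folds every new fact into the stored snapshot for each projection version it maintains (a sketch only; the snapshot store and names are imaginary):

type ProjectionVersion = { version: string; apply: (state: unknown, e: unknown) => unknown };

// Called once per new fact as it is produced, so the snapshot for every
// supported projection version is always the current state.
function onNewEvent(e: unknown, snapshots: Map<string, unknown>, projections: ProjectionVersion[]): void {
  for (const p of projections) {
    const previous = snapshots.get(p.version);      // last snapshot for this version
    snapshots.set(p.version, p.apply(previous, e)); // fold the new fact into it
  }
}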

Storage may be a concern if we are storing every snapshot of state. We could have a strategy to purge older snapshots. Deleting a snapshot is not a bad thing. We can always rebuild a projection from the event store. As long as we keep the events stored we can always create new projections or rebuild projections.

Conclusion

Well, this was just me trying to clean out a backlog of old posts and finish some thoughts I had on real-time state snapshots from an event stream. If you want to read or see a much better examination of this subject, visit “Turning the database inside-out with Apache Samza” by Martin Kleppmann, http://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/. You can also check out implementations of the concepts with Apache Samza, or something like it with Azure Stream Analytics.

Recent Talks with Big Effect on My Development

I have seen many, many development talks, actually an embarrassing amount. There have been many talks that have altered how I develop or think about development. Since 2013 in particular, some talks have caused a major shift in how I think about full stack development; my mind has been in a major evolutionary cycle.

CQRS and Event Sourcing

My thought process about full stack development started down a new path when I attended Code on the Beach 2013. I watched a talk by Greg Young that included Event Sourcing, mind ignited.

Greg Young – Polyglot Data – Code on the Beach 2013

He actually followed this up at Code on the Beach 2014 with a talk on CQRS and Event Sourcing.


Reactive Functional Programming

Then I was introduced to React.js, and I experienced a fundamental paradigm shift in how I believed application UIs should be built and began exploring reactive functional programming. I watched a talk by Pete Hunt where he introduced the design decisions behind React, mind blown.

Pete Hunt – React, Rethinking Best Practices – JSConf Asia 2013


Stream Processing

Then my thoughts on how to manage data flow through my applications were significantly altered. I watched a talk by Martin Kleppmann that promoted subscribe/notify models and stream processing, and I learned more about databases than from any talk before it, mind reconfigured.

Martin Kleppmann – Turning the database inside-out with Apache Samza – Strange Loop 2014


Immutable Data Structures

Then my thoughts on immutable state were refined, and I went in search of knowledge on immutable data structures and their uses. I watched a talk by Lee Byron on immutable state, mind rebuilt.

Lee Byron – Immutable Data in React – React.js Conf 2015


Conclusion

I have been doing professional application development since 2000. There was a lot burned into my mind in terms of development, but these talks were able to cut through the internal program of my mind to teach this old dog some new tricks. The funny thing is that all of these talks are based on old concepts from computer science and our profession that I never had the opportunity to learn.

The point is, keep learning and don’t accept best practices as the only or best truth.


Testing GoCD on Azure Linux VM

I need a GoCD server to test some configuration and API work. So I thought, “Maybe I can get a GoCD server running on Azure.” There are no examples of doing this with GoCD that I could find on the interwebs, but I didn’t do a deep search. Anyway, it can’t be that hard… right?

Create VM

On the Azure portal I set up an Ubuntu VM with the lowest size, A0. I am using the Resource Manager version of the docs to walk me through this, https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-tutorial-portal-rm/.

I configured a username/password instead of the more secure SSH key, because this is a throwaway.

Connect to VM

I install PuTTY on my laptop. Actually, there is no install; I just download the exe and run it (http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html). PuTTY will allow me to SSH into the VM to manage it.
Then I just connect to the IP of my new VM, enter my username and password, and I have access to a command window to manage my new VM.

Package Management

I am a Windows groupie so my first thought was I wonder if I can use Chocolatey to install GoCD. Chocolatey NuGet is a Machine Package Manager, somewhat like apt-get (clue), but built with Windows in mind.
Nope, no Chocolatey on Ubuntu, but the apt-get clue above was right on target (https://help.ubuntu.com/community/AptGet/Howto). I can use apt-get to easily install applications.

Sudo

Before I begin, I want to learn about sudo. I see sudo in a lot of Linux commands so it must be important. Now is as good a time as any to get some context because I have virtually no Linux experience (no VM pun intended).
sudo (/ˈsuːduː/ or /ˈsuːdoʊ/) is a program for Unix-like computer operating systems that allows users to run programs with the security privileges of another user, by default the superuser.
sudo -s
This allows the current session to run as root (superuser); I’m not sure of the significance of this or if it is 100% correct or best practice. All of the resources I read seem to indicate it is a way to elevate the user in the session to fix certain permission issues while running commands.
So sudo is kind of like a better, more friendly runas command.

Install GoCD Server

I use apt-get to install the server.
echo "deb http://dl.bintray.com/gocd/gocd-deb/ /" > /etc/apt/sources.list.d/gocd.list
wget --quiet -O - "https://bintray.com/user/downloadSubjectPublicKey?username=gocd" | sudo apt-key add -
apt-get update
apt-get install go-server

The first command adds the GoCD Debian package repo to the apt package source list. Then a key is added for the new repo (not sure how this is used yet, but presumably it verifies packages from the repo). Next, apt-get update is run and it goes through the source list and refreshes the package lists. Lastly, apt-get install is run to install the bits. This workflow is very familiar because it’s like Chocolatey. apt-get handles all dependencies, so this is as easy as can be.
I open PuTTY, connect to the server, and run sudo -s to run as root (I probably should try to install without this step because I don’t know if it is necessary). Then I run the commands to install the GoCD server, and it failed on the first command with “command not found.” I updated the command to use sudo sh -c to run the command as superuser, and it ran.
sudo sh -c 'echo "deb http://dl.bintray.com/gocd/gocd-deb/ /" > /etc/apt/sources.list.d/gocd.list'
I run the rest of the commands and the magic started happening… finally.
Then the magic fades away on the last command to install go-server: “Unable to locate package go-server.” After poking around, I figured out how to open a file so I could inspect the .list file to see if it has the URL from the first command.
sudo vim /etc/apt/sources.list.d/gocd.list
This opens Vim; good thing I have been learning Vim. Anyway, the file has nothing in it. OK, the first command didn’t work for some reason. OK, we’ll fix that later. Right now I decide to manually add the URL to the deb repo, http://dl.bintray.com/gocd/gocd-deb/ /.
I naively try to ctrl-c (copy) the URL into Vim. Of course I made another mistake, because Vim is a different animal (I have to learn how to say sit in German or something). I couldn’t figure out how to copy from the clipboard to Vim (another day), so I just typed the repo URL. As with many manual tasks, I messed up and entered 1 instead of l, and when I ran the command again I got 404 errors (d1.bintray.com instead of dl.bintray.com). After fixing this I was able to apt-get update and apt-get install… Yah!
Now, the GoCD dashboard is accessible at http://MyUbuntuVM.cloudapp.net:8153/go/pipelines.

Install GoCD Agent

Feeling accomplished, I install a GoCD agent:
apt-get install go-agent
Then, I edit the agent config to update the IP to the IP of my VM.
sudo vim /etc/default/go-agent
Then, I start the agent
/etc/init.d/go-agent start
Lastly, I check the dashboard to make sure the agent exists, http://MyUbuntuVM.cloudapp.net:8153/go/agents, and it does.
Mission Accomplished!

Conclusion

This was a lot easier than I expected, even with my goofs. Creating the VM was so simple I thought I missed something (nice job Azure team).
Now, how do I automate it? Can I use a Docker container to make this process even easier and faster? How do I move storage to a more durable, persistent location? How do I install Git (apt-get, maybe)?

GTP for BDD

Graphical Test Plan

I read a little about graphical test planning (GTP), created by Hardeep Sharma and championed by David Bradley, both from Citrix. It’s a novel idea and sort of similar to the mind map test planning I have played around with. The difference is you’re not capturing features or various heuristics and test strategies in a mind map; you are mapping expected behavior only. Then you derive a test plan from the graphical understanding of the expected behavior of the system. I don’t know a lot about GTP, so this is a very watered down explanation. I won’t attempt to explain it fully, but you can read all about it:

Plan Business Driven Development with GTP

What interested me was the fact that I could abstract how we currently spec features into a GTP type model. I know the point of GTP is not to model features, but our specs model behavior and they happen to be captured in feature files. It’s classic Behavior Driven Development (BDD) with Gherkin. We have a feature that defines some aspect of value that the system is expected to provide to users. In the feature we have various scenarios that describe the expected behaviors of the feature. Scenarios have steps that define pre-conditions, actions, and expectations (PAE), or in Gherkin, Given-When-Then (GWT), and describe how a user would execute the scenario. We also have feature backgrounds, which are feature-wide pre-conditions shared by all scenarios in the feature.
I said we use Gherkin, but our new test runner transcends just GWT. We can define PAE in plain English without the GWT constraints; we can choose the terms that describe PAE instead of being forced into GWT, which sometimes causes us to jump through hoops to make the wording sound correct.

GTP Diagram

If we applied something like GTP, we would model the scenarios, but there would be more hierarchy before we define the executable scenarios. We currently use tagging to group similar scenarios that exercise a specific subset of a feature’s scenarios. This allows us to provide faster feedback by running checks for just a subset instead of the entire feature when we are only concerned with changes to that subset. In a GTP-ish model, the leftmost portion of the diagram would hold generalized behavior specs, similar to how we use tagging, and as we go to the right the behavior becomes more granular until we hit a demarcation point for executable scenarios that can then be expressed in a linked test case diagram (TCD). In the GTP there are ways to capture metadata, like a related requirement/ticket ID, for traceability back to requirements. There is also meta for the demarcation point (I can’t think of a better name) to link to the TCD or feature file that further defines it.

Test Case Diagram

The test case diagram would contain the various scenarios that define the behavior of the demarcation points in the GTP. The TCD would also include background preconditions and the steps to execute each scenario. At this point it feels like an extra step: we have to write the TCD content in a feature file anyway, so diagramming it creates a redundant document that has to be maintained.
In the TCD there are shapes for behavior, preconditions, steps, and expectations. I think there should be additional shapes or meta to express tags, because tags are important in how we categorize and control the running of scenarios. It may also help if there is meta to link back to the GTP that the TCD is derived from, so we can flow back and forth between the diagrams. Meta in the TCD is important because it gives us the ability to extract understanding beyond just the test plan and design. We could have shapes, meta descriptions, and links to things like the following (see the sketch after this list):
  • execute automated checks
  • open a manual exploratory test tool
  • view current test state (pass/fail)
  • view historical data (how many times has this step failed, when was the last failure of this scenario…)
  • view flake analysis or score
  • view delivery pipeline related to an execution
  • view team members responsible for plan, develop, test and release
  • view related requirement or ticket
  • much more…
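As a rough sketch, the meta on a TCD node could be as simple as a record like this (the field names are invented for illustration, not part of GTP itself):

type TcdNodeMeta = {
  tags: string[];               // drives how we categorize and selectively run scenarios
  requirementId?: string;       // traceability back to the related requirement or ticket
  gtpNodeId?: string;           // link back to the GTP demarcation point this TCD defines
  lastResult?: 'pass' | 'fail'; // current test state
  failureHistory?: string[];    // prior failures, feeding flake analysis or a flake score
  pipelineRunUrl?: string;      // delivery pipeline execution related to the last run
  owners?: string[];            // team members responsible for plan, develop, test, and release
};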

Since we also define manual tests by just tagging features or scenarios with a manual tag or creating exploratory test based feature files, we could do this for both automated checks and manual tests.

GTP-BDD Binding

To get rid of the TCD redundancy we could generate the feature file from the diagram or vice-versa. Being able to bind GTP to BDD would make GTP more valuable to me.
We would need an abstract object graph that could be used to generate both the diagram and the feature file (or an Excel spreadsheet, HTML page, or whatever else). We are almost there; we have a tool that can generate feature files from persisted objects and vice versa. We would just have to figure out how to generate the diagram and express it as an interactive UI and not just a static picture.
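A sketch of what that abstract object graph and a naive feature-file generator could look like (the shapes and the generator are illustrative, not our actual tool):

type Step = { keyword: 'Given' | 'When' | 'Then' | 'And'; text: string };
type Scenario = { name: string; tags: string[]; steps: Step[] };
type Feature = { name: string; background: Step[]; scenarios: Scenario[] };

// Render the object graph as a Gherkin feature file; the same graph could feed the diagram UI instead.
function toFeatureFile(f: Feature): string {
  const lines: string[] = [`Feature: ${f.name}`];
  if (f.background.length > 0) {
    lines.push('  Background:');
    lines.push(...f.background.map(s => `    ${s.keyword} ${s.text}`));
  }
  for (const sc of f.scenarios) {
    if (sc.tags.length > 0) lines.push(`  ${sc.tags.join(' ')}`);
    lines.push(`  Scenario: ${sc.name}`);
    lines.push(...sc.steps.map(s => `    ${s.keyword} ${s.text}`));
  }
  return lines.join('\n');
}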
What we have been struggling with is the ability to manually edit feature files and keep them in sync with the persisted objects. With a centralized UI this is easy because everyone uses the UI to update the objects. When people are updating feature files from a source code repository, we have to worry about merge conflicts (yuck) and about whether the feature file or the persisted object is the source of truth. So, we may have to reduce flexibility and force everyone to use the UI only. Everyone would have to have the discipline not to touch the feature files, even though we have nice tools built into our IDE to help write and manage them. The tool would have to detect when someone has violated the policy, and so on… I digress.

Conclusion

With a graphical UI modeled on GTP/TCD to manage BDD, we can provide an arguably simpler way to visualize tests and the ability to drill down into different aspects of test plans and designs and their related current and historical execution. With 2-way binding from diagram to feature file, we have a new way to manage our executable specifications. This model could provide a powerful tool to not only aid test planning, but test management as a whole. The end result would hopefully be a better understanding for the team, increased flow in the delivery pipeline, enhanced feedback, and more value to the customer and the business.
Now let’s ask Google if something like this already exists so I don’t have to add it to my ever increasing backlog of things I want to build. Thanks to Hardeep Sharma, David Bradley, and Citrix for sharing GTP.