On BDD From Acceptance Tests to Story Level Specifications

Behavior Driven Development is best described as taking an “Outside-In” approach, meaning you define your feature from the outset and describe business value, and work your way in a small piece at a time. That being the case, I thought it’d be best to cover one stage of evolution that I’ve had over the years… from Developer Facing Acceptance Tests to Customer Facing Story Level Specifications.

The Start: Developer Facing Acceptance Tests

These were the starting point… where I started 5 years ago. I would write rather long-winded integration-style tests in an xUnit framework that would exercise the end-to-end functionality of a feature. It worked because it provided an automated test that would verify that the feature actually worked, and provided a nice safety net when doing wide-ranging system restructuring.

These would also at times consist of “data-driven testing”: exercising the system with different input data to verify different outputs and behavior. There would often be quite a bit of setup and teardown to insert and remove data, and setting up test records would often create a lot of noise.

The problem here is that, at the time, the stakeholders were not involved in the process other than to hand off requirements and be around for the devs to “hand off” completed work to QA. The “acceptance tests” could barely even be called that… as far as anyone was concerned, they were for “developer eyes only” and usually consisted of very large test methods that, honestly, were painful to read and quite brittle. Truth be told, this didn’t work well, and even the smallest of these so-called acceptance tests barely conveyed any domain knowledge whatsoever. Fail.

A Little Better: Developers Writing Acceptance Tests In FIT

From this point, I arrived at an organization that had started using FitNesse for acceptance tests. This was quite a bit better than writing mammoth JUnit tests with lots of test data setup, as one could now just create a test with a great big table of input data, run it through the full system, and verify the outputs. The nice thing here is that one could write acceptance tests for each feature and re-use a few ColumnFixtures and RowFixtures to set up and verify the system.

The problem, however, is that when acceptance testing is approached from a developer point of view, much domain knowledge is again lost through clever re-use and treating the tests written in FitNesse as “test scripts.” It wasn’t uncommon to see pages of FitNesse tests filled to the brim with XPath expressions or the like. I felt a little disheartened… reading the FIT book, I got the impression it was a tool to aid stakeholder and developer communication, but even QA was mortified at the sight of the FitNesse tests the developers wrote.

To help ease things a bit, we tried to take a “QA matrix,” a huge spreadsheet of real records found in the database, and whip up a fixture to parse and execute it. We had an “executable QA Matrix,” but there was still a lot to be desired… looking back at one of those FitNesse tests a year later would definitely leave one scratching their head in confusion as to what domain logic was being verified.

I recall at one time attending a presentation by a team that bragged about full customer involvement, with their primary stakeholder actually writing FitNesse tests for them, and I was quite intrigued… this was EXACTLY the kind of customer/developer interaction I had been seeking, although there was a disclaimer that their stakeholder had some technical knowledge. When I looked at the FitNesse tests, they were filled with JDBCFixtures running raw SQL queries and using XPath queries in HtmlFixture to verify what should be displayed in the UI. Far from the nirvana I was seeking.

Now let me say that FIT isn’t a bad tool… in fact, I’d say it’s a rather good Story Level Specification BDD framework! We were just using it wrong, and I think 70% of FIT users misuse it too. At this stage I think we managed to accomplish all five ways to misuse FIT that James Shore once blogged about. He put it exactly the way it was:

If you aren’t using Fit for communication and collaboration, you’re just writing automated tests. In HTML! With tables!

We would desperately try to make our tests more readable… DoFixture worked quite well in this regard by allowing us to map actual sentences to fixtures to perform work, and it left us with tests that were very readable and easy to understand. At times we were even able to pull them up during a meeting and show them to our stakeholders, and they would correct us on some of the logic or agree that it accurately described how the feature should work.

We were close, but a lot was left to be desired. For the most part, these were still automated tests that, although readable, were still developer-centric and barely decipherable by the stakeholders. Attempts are sometimes made to copy in an actual requirements document (text, tables, and all) and make it executable, but I think this really tries to cover too much at once, misses a lot, and encourages stakeholders to continue churning out requirement documents to developers instead of engaging in hands-on collaboration in user story writing sessions.

What We Were Learning

Although what I have described thus far isn’t remotely close to an ideal working environment, these experiences had taught us several key points:

Collaboration is IMPORTANT! We need to keep the stakeholders in the loop as much as possible
Plain text works well in expressing features
We should treat our pages in FitNesse as an executable wiki, a kind of living documentation
Acceptance Tests in FIT typically involve setting up some kind of environment for the scenario, execution of the System Under Test, and verification of the behavior
They should do whatever they can to describe the domain

By this time, we had finally gotten our stakeholders to the point where some were quite comfortable with writing user stories and would often copy the user story into a new FitNesse page, take the conditions of satisfaction off of the card, and transform each of them into scenarios. These conditions were usually bullet points describing some simple rule (something like “If it’s older than 10 days, don’t display it”) and would usually be expressed pretty nicely in the FitNesse test.

With this environment in place, it was quite a bit easier to use our acceptance tests to communicate better and more clearly with our stakeholders… sometimes they would even dive in and write a few scenarios here and there for us. What struck us here is that the most effective approach is to have user stories that describe a feature well and provide some very concrete conditions of how that feature should behave.

Bridging the Gap: Behavior-Driven Development

Around this time is when I discovered Dan North’s post What’s In A Story, which outlined two notable concepts: Feature Injection and scenarios expressed as Given, When, Then. I’ll touch more on Feature Injection later, but what I really want to focus on here is the scenario format. Specifically, scenarios fall into the format:

Given (some precondition or context)
When (an event occurs)
Then (outcome)

Having customers start writing their conditions of satisfaction in this format made them much more concise… and often we’d find that we’d build on top of them after thinking about each one. In BDD, we call these Scenarios, but we also call them Application Examples. Which is what they are… the customer is describing an example of how the application should behave in different instances for this feature.
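As a rough illustration of how one condition of satisfaction might read as an executable application example, here is a minimal sketch in plain Python. It borrows the “older than 10 days” rule mentioned earlier; the `Item` class, the `visible_items` function, and the dates are all hypothetical names invented for this sketch, and a real project would more likely express the same thing in whichever BDD tool the team and customer prefer.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Item:
    # Hypothetical domain object; the post does not name one.
    title: str
    posted_on: date


def visible_items(items, today, max_age_days=10):
    """Return only the items that are at most max_age_days old."""
    cutoff = today - timedelta(days=max_age_days)
    return [item for item in items if item.posted_on >= cutoff]


def scenario_items_older_than_ten_days_are_hidden():
    # Given an item posted 11 days ago and an item posted 3 days ago
    today = date(2010, 1, 20)
    old = Item("Old notice", today - timedelta(days=11))
    recent = Item("Recent notice", today - timedelta(days=3))

    # When the list of items is displayed
    shown = visible_items([old, recent], today)

    # Then only the recent item appears
    return [item.title for item in shown]
```

The comments carry the Given, When, Then wording so the example still reads as a description of behavior rather than as a test script.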

So… how does this fit into acceptance tests? Well, the moral is you have to stop thinking of them as “tests” … imho such thinking is what helps teams wind up with some monster test that generates data for the 500 possible scenarios, executes for two hours, then verifies the results. We’re not testing… we’re describing the examples of how the system behaves for that particular feature in an executable way… reaching green isn’t so much passing a test as an indication of feature completeness.

How Do You Do It?

Any framework can meet the needs of working with your customer to specify application examples at the story level… it really just depends on what tool you and your customer feel comfortable with. Remember it’s not so much the framework, but rather the philosophy… you and your customer are not writing tests, but application examples. I just really want to make sure I drill that point home.

That being said, there are quite a few frameworks to choose from. FitNesse/Slim is pretty good… especially when coupled with the plain text FitLibrary styles and the GivWenZen fixture. I’ve also tried out Concordion, which is nice if you have your customer use some HTML editor (or if they are content with HTML). Cucumber and JBehave are equally nice, and I like them because they use just plain text… you can even use a tool like Pickler to sync stories from Pivotal Tracker to execute in Cucumber. There are quite a few more; the moral is to do your own exploring and stick to whatever tool you and your stakeholders like the most.

The next question I often hear is: what level is appropriate for writing story-level specifications? Should they run through the GUI? Although some BDD proponents do believe you should run them through the GUI since that is the topmost level, I disagree. From experience, I find that examples run through the GUI lose a lot of domain terms, which are quickly replaced with terms like “click, enter text, sees on page” … sure, it does describe the application’s visible behavior, but in my opinion, it causes examples to miss the point of how the domain works. Additionally, I find that the GUI is often quite volatile and slow, which may work as an obstacle to writing as many application examples as are needed.

I usually wire my application examples through the underlying mechanism… either through the controller type object that might be in place or even through the direct services that the controller might use… essentially, the fixture becomes a stand-in for the controller or point of entry the application might normally have. I might also substitute external dependencies as well to provide speed, but again this is entirely up to you.
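A minimal sketch of that wiring, in Python, might look like the following. Everything here is assumed for illustration: the money-transfer domain, the `TransferService`, the `InMemoryAccountStore` that substitutes for an external database, and the fixture method names are not taken from any real project or framework.

```python
class InMemoryAccountStore:
    """Substitute for an external dependency such as a database (assumed)."""

    def __init__(self):
        self._balances = {}

    def load(self, owner):
        return self._balances.get(owner, 0)

    def save(self, owner, balance):
        self._balances[owner] = balance


class TransferService:
    """Hypothetical domain service that a controller would normally call."""

    def __init__(self, store):
        self._store = store

    def transfer(self, source, target, amount):
        if amount > self._store.load(source):
            return "declined"
        self._store.save(source, self._store.load(source) - amount)
        self._store.save(target, self._store.load(target) + amount)
        return "completed"


class TransferFixture:
    """Stands in for the controller: maps domain phrases onto service calls."""

    def __init__(self):
        self._store = InMemoryAccountStore()
        self._service = TransferService(self._store)
        self.outcome = None

    def given_account_with_balance(self, owner, balance):
        self._store.save(owner, balance)

    def when_transferring(self, amount, source, target):
        self.outcome = self._service.transfer(source, target, amount)

    def balance_of(self, owner):
        return self._store.load(owner)
```

An example would then call `given_account_with_balance`, `when_transferring`, and check `outcome` and `balance_of`, never mentioning clicks or pages. The fixture speaks the domain's language and bypasses the GUI entirely.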

I like to start with an Application Example or two in place, then work at getting each one to pass, writing unit-level specifications for each object I need, and using test doubles for each component I haven’t built yet, iteratively replacing each stand-in for the real thing. I’d like to hear the approaches others take, so let me know.
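To sketch what replacing a stand-in can look like, here is a small Python example under invented assumptions: a `PriceCalculator` depends on a rate provider that does not exist yet, so a hand-written stub stands in until the real `TableRateProvider` is built. All class names, the tax domain, and the numbers are hypothetical.

```python
class StubRateProvider:
    """Test double for a component that has not been built yet (assumed)."""

    def __init__(self, rate_percent):
        self._rate_percent = rate_percent

    def rate_for(self, region):
        # Always answers with the canned rate, whatever the region.
        return self._rate_percent


class TableRateProvider:
    """The real component that later replaces the stub."""

    def __init__(self, rates):
        self._rates = dict(rates)

    def rate_for(self, region):
        return self._rates[region]


class PriceCalculator:
    """Object under development; it only knows the rate_for interface."""

    def __init__(self, rate_provider):
        self._rate_provider = rate_provider

    def total(self, net_cents, region):
        rate_percent = self._rate_provider.rate_for(region)
        return net_cents + net_cents * rate_percent // 100


def example_total_includes_regional_tax(rate_provider):
    # Given a net price of 10.00 (in cents), when it is totalled for the
    # "north" region, then the total includes that region's tax.
    return PriceCalculator(rate_provider).total(1000, "north")
```

The same application example can be run first with `StubRateProvider(20)` and later with `TableRateProvider({"north": 20})`. If it yields the same total both times, the stand-in has been swapped for the real thing without changing the described behavior.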

Scratching the Surface

I’ve only scratched the surface with this post… and a lot of the topics I briefly touched on can be delved into at a much deeper level, such as Feature Injection and how we can use it and Application Examples to help us in analysis and gain a deeper understanding of the domain process. I’ll try to dig deeper in future posts, but I think Chris Matts’ comic on the subject covers it quite well.