Testing Games in the Wild

Ins and outs of turn-based games

Recently I had coffee with an indie board game developer who was nearing the release of her very own monster-themed board game. The thought uppermost in my mind was unfortunately not how cool the game was (which it definitely was), but rather “how in the world did she get the game dynamics to balance?” I guess I was also hoping for a different answer than “play test and improve.”

 

At one of the companies I work at [1] I lead a small team that develops multi-tablet games for corporates and other medium to large institutions. These games create simulated versions of the organisation that help employees develop key skills and attitudes to improve it. This is a complex process: it moves through organisational learning theory and harvesting knowledge about the workings of the organisation, through to representing that knowledge in a game. On top of all of that, the game still needs to achieve the outcomes the organisation requires.

 

One of the key steps is to create an abstract (and simplified) model of the organisation. This abstract model is then realised in a tool we have developed for building turn-based games – basically a specialised interpreter with its own domain-specific language [7]. Our proprietary tool is great at building these turn-based games, but since the games revolve as much around the real-life discussions the employees have as around the actual game play, there had not been a lot of focus on the testing capabilities such a platform requires.
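The actual proprietary language is of course not public, but for flavour, a turn-based-game interpreter of this kind can be sketched as a small rule table driven by a script. Everything below – the rule names, the state fields – is purely illustrative, not the real DSL:

```python
# Purely illustrative sketch of a tiny turn-based-game interpreter.
# Each "rule" maps an action name to an effect on the abstract game state.
RULES = {
    "hire_staff": lambda s: dict(s, staff=s["staff"] + 1, cash=s["cash"] - 10),
    "run_project": lambda s: dict(s, cash=s["cash"] + 5 * s["staff"]),
}

def interpret(state, script):
    # Apply each scripted action to the state in turn.
    for action in script:
        state = RULES[action](state)
    return state

state = interpret({"staff": 1, "cash": 20}, ["hire_staff", "run_project"])
assert state == {"staff": 2, "cash": 20}
```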

 

Testing in the wild

I also work at Polymorph Systems [2] – a development house that specialises in helping clients realise their mobile dreams, everything from banking apps through 4×4 mapping apps to retro space shooters (and of course the tablet apps on which the corporate games run). In this world we have unit testing frameworks, continuous integration platforms and tools focused on the standard development practices of software engineers. We nearly always have strong control over the environments we use and their capabilities, and where an environment lacks a capability we can normally extend it ourselves.

 

It has therefore been frustrating at times not to be able to extend and modify the tool we build these corporate games with. Don’t get me wrong – the tool is very good at what it does, namely developing custom turn-based games – it just wasn’t designed with testability in mind.

 

In simple software parlance, the tool’s development capabilities are strong but its testing capabilities are not. For example, the tool does not (currently) allow you to check whether a given set of inputs produces a particular set of results – the kind of check typically available in unit testing frameworks such as the venerable JUnit.
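To make the gap concrete, here is the kind of known-inputs, expected-outputs check we would like the tool to support. The `resolve_turn` function is a made-up stand-in for the engine’s turn logic, not the tool’s real API:

```python
# Hypothetical sketch of the missing capability: assert that a set of
# inputs to a turn produces a particular set of results.

def resolve_turn(state, choices):
    # Toy turn resolution: spending budget on marketing raises demand.
    new_state = dict(state)
    new_state["budget"] -= choices.get("marketing_spend", 0)
    new_state["demand"] += choices.get("marketing_spend", 0) // 10
    return new_state

# A unit-test-style check, JUnit-fashion: known inputs, expected outputs.
state = {"budget": 100, "demand": 5}
result = resolve_turn(state, {"marketing_spend": 50})
assert result == {"budget": 50, "demand": 10}
```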

 

In a perfect world we would have the time and money to build a whole testing framework around the tool, allowing us to insert custom checks at any point we wish. Unfortunately this is real life and real business, with definite resource constraints.

 

So what strategy did we follow?

In a strong sense we have stuck with the paradigm my indie board game colleague uses – we “play test”. However, we have added features to the tool that let us play test in a more repeatable fashion.

 

Firstly, we have split the games into logical phases that we can jump back to and replay from. A typical game has a strategic, a tactical and an operational phase, so once you have played through the strategic phase we can jump back to its beginning and replay from that point onwards.
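The idea behind this phase mechanism can be sketched as checkpointing: snapshot the game state when a phase begins, then restore that snapshot to replay. This is a minimal illustration under assumed state and phase names, not the tool’s implementation:

```python
import copy

# Sketch of phase checkpointing: snapshot the game state at the start of
# each phase so a play test can jump back and replay from that point.
class Game:
    def __init__(self):
        self.state = {"turn": 0, "score": 0}
        self.checkpoints = {}

    def begin_phase(self, name):
        # Deep-copy so later play doesn't mutate the saved snapshot.
        self.checkpoints[name] = copy.deepcopy(self.state)

    def rewind_to(self, name):
        self.state = copy.deepcopy(self.checkpoints[name])

game = Game()
game.begin_phase("strategic")
game.state["score"] = 42          # play through the strategic phase...
game.begin_phase("tactical")
game.state["turn"] = 7            # ...and into the tactical phase
game.rewind_to("tactical")        # jump back to the tactical phase start
assert game.state == {"turn": 0, "score": 42}
```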

 

Secondly, we’ve added a feature that lets you replay a particular set of choices in the game. So if you start from the beginning of the operational phase you can replay all the operational choices you made during a previous play test.
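Choice replay boils down to a record-and-replay pattern: log every choice as it is made, then feed the same log back through the engine. A toy sketch (the engine step and choice format are invented for illustration):

```python
# Sketch of choice replay: record each choice made during a play test,
# then feed the same sequence back through the engine.

def apply_choice(state, choice):
    # Toy engine step: each choice adjusts the score.
    state = dict(state)
    state["score"] += choice["delta"]
    return state

recorded = []  # log of choices from a previous play test

def play(state, choice):
    recorded.append(choice)
    return apply_choice(state, choice)

def replay(state, log):
    for choice in log:
        state = apply_choice(state, choice)
    return state

s = {"score": 0}
s = play(s, {"delta": 3})
s = play(s, {"delta": -1})
# Replaying the recorded choices reproduces the same end state.
assert replay({"score": 0}, recorded) == s
```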

 

Thirdly, we’ve added a feature to export the data from the game. This has less to do with testing the game than with debugging issues in it. The great thing about the export is that we can analyse the data in tools better suited to visualising it, e.g. numpy (Python), Octave, spreadsheets, etc.
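For illustration, suppose the tool exported per-turn rows as CSV (the columns here are invented); even before reaching for numpy or Octave, a few lines of Python can run sanity checks on the exported data:

```python
import csv
import io
import statistics

# Assumed export format for illustration: one CSV row per team per turn.
exported = """turn,team,cash
1,red,100
2,red,90
1,blue,100
2,blue,120
"""

rows = list(csv.DictReader(io.StringIO(exported)))
cash_by_team = {}
for row in rows:
    cash_by_team.setdefault(row["team"], []).append(int(row["cash"]))

# Quick sanity checks a debugger might run before opening a plotting tool.
assert statistics.mean(cash_by_team["blue"]) == 110
assert min(cash_by_team["red"]) == 90
```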

 

Lastly, we’ve added local caching on the tablets. This needs some explaining: the games run on a bus network architecture, but logically it is a client-server environment. If you start a game from midway through, the server first needs to ensure that all the clients are in the right state. This is the safe and correct way to do things, but it takes time and delays testing. To cut out that wasted time, the clients (mostly iPads) have a configuration setting that allows them to cache data, saving a lot of set-up time during testing.
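The caching idea can be sketched as a client keeping setup data keyed by a version stamp, so that restarting a game mid-way skips the transfer when the server’s version matches. The version key and fetch call below are assumptions for the sketch:

```python
# Sketch of the client-side cache: the tablet keeps setup data keyed by a
# version stamp, so a restart skips the transfer when versions match.
class Client:
    def __init__(self):
        self.cache = {}        # version -> setup data
        self.fetches = 0

    def fetch_from_server(self, version):
        self.fetches += 1      # the expensive transfer over the bus network
        return {"version": version, "assets": "..."}

    def sync(self, server_version):
        if server_version not in self.cache:
            self.cache[server_version] = self.fetch_from_server(server_version)
        return self.cache[server_version]

client = Client()
client.sync(3)
client.sync(3)                 # cache hit: no second transfer
assert client.fetches == 1
```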

 

The end game

According to Wikipedia [3], two of the key metrics for testability are effort and effectiveness. In the real world these have to be balanced against the features of the environment you work in and the other resources at your disposal. In our case we have focused on the following:

  • Testing capabilities that follow the actual game flow, therefore minimising the effort of doing testing in parallel to development
  • Testing capabilities that allow us to granularly replay tests from known points in time, increasing the effectiveness of doing re-tests
  • Capabilities that speed up the process of getting to the initial test state, decreasing the effort required to do testing

 

In summary, all of these capabilities improve the repeatability of our play testing, and in this way decrease testing effort and increase testing effectiveness.

 

If you’re new to the world of testing and improving testability, I would recommend looking at Test-driven development [4][6] and Design for testability [5] as good introductions to the field.

 

Have fun out there in the wild!

 

[1] http://aim.org.za

[2] http://polymorph.co.za

[3] Wikipedia – Software Testability

[4] Wikipedia – Test-driven development

[5] Design for Testability

[6] testdriven.com

[7] Wikipedia – Domain-specific language