Configuration is data interpreted by a program to change its behavior so that it supports a particular use case.
Problems Configuration Solves
Configuration solves problems for both program developers and operators.
Most program developers do not want to change the program’s source code, rebuild, and repackage the program each time they need to vary the behavior.
There is a ‘temporal binding’ problem where we’d like to delay commitment to a configuration’s value until later. Examples include:
- the location of the database used by the program
- the maximum number of database connections to use and query timeout
- whether new feature X is enabled
And very few people want to produce, deploy, and manage lots of opaque program packages with ‘hard-coded’ configurations. You could easily end up with thousands of unique packages over time or even simultaneously depending on the number of people using the program. I’d say no-one wants to do this, but there’s weird and wild stuff out there.
This is a management problem that grows with the number of distinct deployment environments and changes.
So to solve these problems, we program the application to read configuration data on startup and adjust runtime behavior accordingly.
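Here is a minimal sketch of that startup step, using the example settings listed earlier. The `AppConfig` type, the `load_config` function, the `APP_*` environment variable names, and the default values are all hypothetical, chosen only for illustration. Environment variables are just one possible source; a file or a configuration service would follow the same pattern.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    """Settings whose values are bound at startup instead of build time."""
    database_url: str
    max_db_connections: int
    query_timeout_seconds: float
    feature_x_enabled: bool


def load_config(env=None):
    """Build an AppConfig from a mapping of environment variables.

    The variable names and defaults are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    return AppConfig(
        database_url=env.get("APP_DATABASE_URL", "postgresql://localhost:5432/app"),
        max_db_connections=int(env.get("APP_MAX_DB_CONNECTIONS", "10")),
        query_timeout_seconds=float(env.get("APP_QUERY_TIMEOUT_SECONDS", "5")),
        # Only the literal string "true" (any case) enables the feature.
        feature_x_enabled=env.get("APP_FEATURE_X_ENABLED", "false").strip().lower() == "true",
    )


if __name__ == "__main__":
    # Read once at startup, then pass the immutable config to the rest of the program.
    config = load_config()
    print(config)
```

Passing the environment in as a plain mapping keeps the loader easy to exercise in tests without touching the real process environment.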
But there’s tension here that I can illustrate with a few questions:
- What do we hard-code?
- What decisions do we make available as an external configuration?
- How do we model configuration data so that people can express their choices safely (even at 2am)?
A great introductory discussion to answering these questions is ‘The Configuration Complexity Clock’ by Mike Hadlow, which suggests there are four basic strategies for solving configuration problems:
- Hard-coded values
- Config values
- Rules engine
- Domain specific language (DSL)
Over time, I think you’ll agree with Mike (and me) that these form a continuum of strategies for representing and communicating desired changes in program behavior. Mike’s discussion of the ergonomics and efficiency of expressing configuration is important, especially when you are managing a bunch of ‘small’ changes needed to vary program behavior across a few deployment environments or tens of customer deployments.
But what about the condition I tacked on to modeling configuration:
so that people can express their choices safely (even at 2am)
Configuration gives you the capability to delay selection of program behavior until after the program has been built, but you are not relieved of verifying the correctness and safety of your program.
You can build confidence that your program is correct and safe under different configurations by:
Develop and review the code that processes configuration data just as carefully as any other program or API input, sometimes more so. Unlimited flexibility is unsafe for many configuration parameters.
Assert within the program that required configuration values are provided and fall within the expected set or range of values.
Exercise the most-common portions of the configurable state space with tests. These could be:
- unit tests that verify system safeguards in known, common configurations and unknown configurations using a generative testing framework
- integration tests of a partially deployed system again with known and generated configurations
- functional tests of a fully-deployed system with programs in different combinations and entropy introduced by Failure Testing aka Chaos Engineering
Formal verification methods are appropriate when the stakes are very high.
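The assertion and generative-testing steps above could look roughly like the following sketch. The `ConfigError` type, the `validate_max_connections` function, and the 1 to 100 bounds are hypothetical. The random loop is a simple stand-in for a real generative testing framework, not a replacement for one.

```python
import random


class ConfigError(ValueError):
    """Raised when a configuration value is missing or out of range."""


# Hypothetical limits; real bounds depend on the database and the workload.
MIN_CONNECTIONS = 1
MAX_CONNECTIONS = 100


def validate_max_connections(raw):
    """Return the connection limit as an int, or raise ConfigError."""
    if raw is None or str(raw).strip() == "":
        raise ConfigError("max_db_connections is required")
    try:
        value = int(str(raw).strip())
    except ValueError:
        raise ConfigError(f"max_db_connections must be an integer, got {raw!r}") from None
    if not MIN_CONNECTIONS <= value <= MAX_CONNECTIONS:
        raise ConfigError(
            f"max_db_connections must be between {MIN_CONNECTIONS} "
            f"and {MAX_CONNECTIONS}, got {value}"
        )
    return value


def check_random_inputs(trials=1000, seed=0):
    """Generative check: every input is either rejected or ends up in range."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    known_awkward = ["", "abc", "1e3", "-5", "0", "11", " 42 ", None]
    for _ in range(trials):
        raw = rng.choice(known_awkward + [str(rng.randint(-10_000, 10_000))])
        try:
            value = validate_max_connections(raw)
        except ConfigError:
            continue  # rejecting bad input is the safe outcome
        assert MIN_CONNECTIONS <= value <= MAX_CONNECTIONS
    return trials
```

The property being checked is deliberately narrow: no input, however odd, gets past validation with a value outside the allowed range. Error messages that name the setting and the allowed range also help the person reading them at 2am.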
Wait! We’ve missed something. We haven’t gathered feedback from the users of our configuration scheme on whether they understand the interface we’ve built for them. Will they be able to express their intent at 2am when they are trying to fix a problem?
I recommend observing what your users actually do to reconfigure your program to achieve certain outcomes, rather than asking them about a hypothetical (where they’ll be nice to you). Here are a couple of ways to generate that feedback:
- go through version control history and interview people who made configuration changes
- use a ‘Chaos experiment’ to introduce specific kinds of failures that are likely to be resolved with configuration changes, so you can observe and practice incident response (or see what happens when someone dials the DB connections up to 11!)
Now you have more data you can use to improve your configuration’s usability and safety.
I hope this introduction to the concept of Configuration has given you something to think about, especially from a safety and usability perspective. I’d love to learn your thoughts or favorite resources on this topic. Ping me anytime.