It’s the end of the week and a lot of you are probably getting ready to stamp ‘Done’ on some important work.
Will you remember why you and your team made important decisions next week? A month from now? A year?
Architecture Decision Records are a lightweight method for building long-lived, institutional memory.
Once you have reached a decision on an important or difficult-to-reverse matter, you record the context within which you made that decision. Important decisions include things like choice of language, database technology or storage strategy, or application framework. Once a decision record is ‘accepted’, it doesn’t change. If new information surfaces that causes you to reverse the decision, record the reversal in a new decision record. In my experience, writing a record usually takes 1-2 hours, and I treat it as a checklist item at the end of a sprint (Do any decisions need recording?).
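The "accepted records don't change" rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a standard ADR tool: the `DecisionRecord` class, its field names, and the `record_reversal` helper are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class DecisionRecord:
    number: int
    title: str
    status: str  # e.g. "proposed", "accepted", "rejected"
    context: str
    decision: str
    supersedes: Optional[int] = None  # number of the record this one reverses


def record_reversal(log: List[DecisionRecord], old_number: int,
                    title: str, context: str, decision: str) -> List[DecisionRecord]:
    """Append a new accepted record that reverses an earlier one.

    The earlier record is left exactly as it was; the new record
    simply points back at it through `supersedes`.
    """
    if not any(r.number == old_number for r in log):
        raise ValueError(f"no decision record numbered {old_number}")
    new_record = DecisionRecord(
        number=max(r.number for r in log) + 1,  # keep numbers increasing
        title=title,
        status="accepted",
        context=context,
        decision=decision,
        supersedes=old_number,
    )
    return log + [new_record]
```

The log only ever grows, so the original record and its context stay available to anyone retracing the history later.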
Each decision record contains:
A descriptive title that starts with a monotonically increasing number identifying each decision
003 – Select a Container Orchestrator
A status indicating whether this decision is proposed, accepted, rejected, etc.; you can align this to your organization’s review terminology or adopt terms from the ADR or RFC communities
The context describes the issue you are trying to address and what is going on in your organization at the time. Importantly, the context includes the forces and constraints that were at play while investigating the issue and solutions.
This is where you’re going to look in 6 months (or years) when you are trying to understand why the heck you did or didn’t do something.
If you were selecting a container orchestrator, you might capture information like:
- We expect to have minimal operations staff to devote to running container clusters
- We think we’ll want to have many container clusters, probably one for each department so that it’s easier to delegate authority and allocate spending
- We’re using AWS and there is a strategic preference for managed service offerings, so Elastic Container Service and Elastic Kubernetes Service get automatic preference over running your own.
- Some of the clusters may run workloads with a HIPAA compliance requirement, so the solution must support that.
If this context looks like a subset of what you’d normally discover and document in a design process, you’re right. Copy-paste and distill what you’ve captured there. The ADR gives you the opportunity to highlight what was most important in making the decision.
The decision describes what you chose and why.
We chose to use AWS ECS running on Fargate for our containerized applications.
Then you can describe how ECS was the best solution within the context that was previously described. Continuing from above…
ECS on Fargate means we won’t have to manage our own compute cluster and we still have a HIPAA compliant platform.
Also take time to describe why the other solutions weren’t the best option. For example:
We were interested in EKS. However, we were constrained by the lack of HIPAA compliance for EKS on Fargate. If this had been available, we probably would have selected it.
As you can imagine, this kind of narrative is super-useful for (re-)learning why a team did what they did.
The expected consequences, both positive and negative, for all the options you evaluated. Every decision has tradeoffs and here’s where you enumerate the most important ones.
ECS on Fargate:
- we expect clusters to be operable by each department with a permanent allocation of less than 1.5 people; container cluster security, updates, scaling, and more are managed by AWS
- container cluster is HIPAA compliant
- we give control over several important aspects of security including isolation and container host updates to AWS
- costs are higher than a fancy autoscaling cluster using spots and on par with on-demand EC2
EKS on EC2:
- we expect clusters to be operable by each department with a permanent allocation of 3 people
- we have to manage container cluster security, updates, scaling, etc
- we retain control over many aspects of container cluster security, particularly for the worker nodes
- may be able to take advantage of spots for worker nodes
Illustrate the tradeoffs through example and tie them back to the context that drives the expected consequences. These consequences can highlight the speculative nature of the decision with words like ‘expect’ and ‘may.’
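If it helps to see the five parts side by side, here is a minimal Python sketch that holds them in one structure and renders a plain-text record. The `Adr` class, the `next_number` helper, and the rendered layout (the "Status:", "Context", "Decision", and "Consequences" labels) are assumptions made for illustration; real ADR templates vary, and the sample values are abbreviated from the example above.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Adr:
    number: int
    title: str
    status: str
    context: List[str]
    decision: str
    # Maps each evaluated option to its expected consequences.
    consequences: Dict[str, List[str]] = field(default_factory=dict)

    def heading(self) -> str:
        # Zero-padded number first, so records sort in decision order.
        return f"{self.number:03d} – {self.title}"

    def render(self) -> str:
        lines = [self.heading(), f"Status: {self.status}", "", "Context"]
        lines += [f"- {item}" for item in self.context]
        lines += ["", "Decision", self.decision, "", "Consequences"]
        for option, effects in self.consequences.items():
            lines.append(f"{option}:")
            lines += [f"- {effect}" for effect in effects]
        return "\n".join(lines)


def next_number(existing_headings: Iterable[str]) -> int:
    """Return one more than the highest leading number found, or 1 if none."""
    numbers = []
    for heading in existing_headings:
        match = re.match(r"\s*(\d+)", heading)
        if match:
            numbers.append(int(match.group(1)))
    return max(numbers, default=0) + 1


adr = Adr(
    number=3,
    title="Select a Container Orchestrator",
    status="accepted",
    context=["We expect to have minimal operations staff to devote to running container clusters"],
    decision="We chose to use AWS ECS running on Fargate for our containerized applications.",
    consequences={"ECS on Fargate": ["container cluster is HIPAA compliant"]},
)
print(adr.render())
```

Keeping consequences keyed by option makes it natural to record the tradeoffs of the alternatives you rejected, not just the one you chose.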
That’s a wrap
We make every decision with imperfect information. ADRs give you a way to record what you knew at the time and why you did what you did. Revisiting this history can help you avoid hindsight bias when reexamining decisions down the road and when you consider changing course.
Cruise through the ADR variations and examples in Joel Parker Henderson’s ADR repository.