Operating applications with more than a few components is difficult for humans, and may be impossible for tools, without explicitly modeled contextual clues. This is especially true when people are a couple of steps removed from the deployment, as is often the case when analyzing costs or assessing risk.
Both people and tools rely on:
- identifying resources at both fine and coarse granularities, so that you can identify each one uniquely and also as a member of a group of similar resources, e.g. compute instance X is a member of cluster Y
- scoping resources to a particular management, fault, or security domain, e.g. an Environment
- describing important attributes of the resource’s responsibilities, activities, capabilities, and lifecycle
The entire team or even organization can analyze the resources that are deployed and answer their own questions when compute resources are identified, scoped, and described well. This is a much more efficient and scalable alternative to pinging the platform or security team to answer those questions.
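To make the self-service idea concrete, here is a minimal sketch of answering a question from tags alone. The inventory, the resource ids, and the `Cluster` tag key are hypothetical; a real inventory would come from your Cloud provider's API or a billing export.

```python
# Hypothetical inventory: each resource is an id plus a dict of tags.
RESOURCES = [
    {"id": "instance-x", "tags": {"Cluster": "cluster-y", "Environment": "prod", "Role": "web server"}},
    {"id": "instance-z", "tags": {"Cluster": "cluster-y", "Environment": "prod", "Role": "database"}},
    {"id": "instance-q", "tags": {"Cluster": "cluster-r", "Environment": "dev", "Role": "web server"}},
]


def find_resources(resources, **wanted):
    """Return the ids of resources whose tags match every key/value in `wanted`."""
    return [
        resource["id"]
        for resource in resources
        if all(resource["tags"].get(key) == value for key, value in wanted.items())
    ]


# Coarse granularity: every member of cluster Y.
print(find_resources(RESOURCES, Cluster="cluster-y"))
# Scoped and described: production databases only.
print(find_resources(RESOURCES, Environment="prod", Role="database"))
```

Anyone who can read the tags can ask these questions without waiting on the platform or security team.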
Tagging schemes are the primary way Cloud deployments model this context.
Cloud providers and many tools that work with them have proposed tagging standards that will help you add context to manage your Cloud resources more cost effectively, with better operational performance, and a bit more securely.
Tagging Standard Analysis
I collected and analyzed the tagging standards suggested by Amazon Web Services, Azure, GCP, Apptio (Cloudability), and Datadog in a spreadsheet to identify the superset of tagging best practices. The spreadsheet links to the standards and suggestions.
The following tags are found in almost all the standards; here they are in order of popularity:
- Owner: Identify who is responsible for the resource
- Business Unit: Identify the top-level division of the organization that owns the account/subscription/project or that the resource belongs to.
- Cost Center: Identify the cost center or business unit associated with a resource; typically for cost allocation and tracking
- Application: Identify resources that are related to a specific application
- Environment: Describe which stage of delivery the resource belongs to and distinguish between development, test, and production resources
- Role: Describe the function of a particular resource within an application, e.g. web server, message broker, database
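One way to put the common tags above to work is a simple check that reports which ones a resource is missing. This is only a sketch: the exact key spellings (`BusinessUnit`, `CostCenter`) and the function name are my assumptions, since each standard spells and cases its keys differently.

```python
# Assumed key spellings for the six common tags; adjust to your own standard.
REQUIRED_TAGS = ("Owner", "BusinessUnit", "CostCenter", "Application", "Environment", "Role")


def missing_tags(tags, required=REQUIRED_TAGS):
    """Return the required tag keys that are absent or blank, in `required` order."""
    missing = []
    for key in required:
        value = tags.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


# Hypothetical resource tags: two keys absent, one present but blank.
tags = {"Owner": "team-a", "Application": "billing", "Environment": "prod", "Role": " "}
print(missing_tags(tags))
```

A check like this could run in a deployment pipeline or a periodic audit, whichever fits how your team already works.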
The current tagging schemes are best at:
- Identifying Application resources and their Owners, primarily for cost accounting and analysis
- Scoping the resources involved in operating an Application or Environment, primarily so that people can solve performance and availability problems
These tags provide the minimum solid contextual foundation for Cloud deployments.
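As an illustration of the cost accounting use case, the sketch below rolls up cost by any combination of tags. The records, the `monthly_cost` field, and the dollar amounts are invented for the example; real figures would come from your provider's billing data.

```python
from collections import defaultdict


def cost_by(records, *keys):
    """Sum monthly cost per combination of the given tag keys."""
    totals = defaultdict(float)
    for record in records:
        # Resources missing a tag are grouped under "untagged" so they stay visible.
        group = tuple(record["tags"].get(key, "untagged") for key in keys)
        totals[group] += record["monthly_cost"]
    return dict(totals)


# Hypothetical billing records.
RECORDS = [
    {"tags": {"Application": "billing", "Environment": "prod"}, "monthly_cost": 120.0},
    {"tags": {"Application": "billing", "Environment": "prod"}, "monthly_cost": 80.0},
    {"tags": {"Application": "billing", "Environment": "dev"}, "monthly_cost": 30.0},
    {"tags": {"Environment": "dev"}, "monthly_cost": 15.0},
]

print(cost_by(RECORDS, "Application", "Environment"))
```

Keeping untagged resources in their own group, rather than dropping them, shows how much spend the tagging scheme does not yet explain.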
Where the minimum breaks down
If you want to achieve operational excellence and high security, you will benefit from describing those requirements as well. Describing application availability, confidentiality, and other ‘-ilities’ in your resource tags makes them available for easy analysis.
Interestingly, I did not find much guidance for tagging to support operational excellence and security, though the AWS and Azure standards come closest with:
- Confidentiality: (AWS) An identifier for the specific data-confidentiality level a resource supports
- Compliance: (AWS) An identifier for workloads designed to adhere to specific compliance requirements, e.g. HIPAA
- Service class: (Azure) Service level agreement level of the application, workload, or service, e.g. Dev, Bronze, Silver, Gold
I have helped people adopt a DataClassification tag to classify resources according to the organization’s several-tier data classification standard. Those data classification standards were SANS-style and had options like: Public, Internal, Confidential, Restricted. I think that is a reasonable approach from a security perspective, but you may get more value out of the effort by decomposing a Security expert’s DataClassification judgement into Confidentiality, Availability, and Integrity requirements that are easier for a non-Security expert to describe. Those components should be more reusable, too.
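Here is a rough sketch of what the decomposed form might look like as tags. The low/medium/high scale, the class name, and the `Integrity` and `Availability` tag keys are my assumptions for illustration; they do not come from any of the provider standards.

```python
from dataclasses import dataclass

# Assumed three-step scale; substitute whatever levels your organization defines.
LEVELS = ("low", "medium", "high")


@dataclass(frozen=True)
class SecurityRequirements:
    """Three separately stated requirements instead of one combined judgement."""

    confidentiality: str
    integrity: str
    availability: str

    def __post_init__(self):
        # Reject values outside the agreed scale so tags stay easy to analyze.
        for name in ("confidentiality", "integrity", "availability"):
            value = getattr(self, name)
            if value not in LEVELS:
                raise ValueError(f"{name} must be one of {LEVELS}, got {value!r}")

    def to_tags(self):
        """Render the requirements as resource tags."""
        return {
            "Confidentiality": self.confidentiality,
            "Integrity": self.integrity,
            "Availability": self.availability,
        }


# A hypothetical internal reporting database: sensitive data, modest uptime needs.
print(SecurityRequirements("high", "medium", "low").to_tags())
```

Each field answers one question an application owner can reason about, which is the point of the decomposition.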
(Updated) Next, we’ll explore a model for specifying those Cloud deployment Security requirements along with likely benefits and challenges. This leads to an approach on modeling risk for Cloud deployments. I’d love to hear about your tagging scheme and what has or hasn’t worked for you.