Last week I described what application containers are and the problems they solve. Today, I’d like to describe some practices for running Docker application containers more securely.
We’ll use the cute-and-fun, but maybe not so secure rando-doggos application as an example. rando-doggos is a simple Python Flask webapp that displays images from some of my favorite dogs on the WeRateDogs twitter feed. If you have Docker installed, you can start an instance of the webapp listening on port 5000 with:
docker container run --rm -it \
  -p 5000:5000 \
  qualimente/rando-doggos:2018-03-20-1030
If you point a web browser to http://localhost:5000, you should see a cute dog. (Replace localhost with the name of your Docker host, if it’s elsewhere)
You can issue a couple of ^C (control-C) keystrokes in the terminal running docker to stop the webapp.
Docker’s default container implementation adds a number of security features to every application over what you would get by running that application ‘normally’ on the host without a container. The primary containment mechanisms active by default are:
- dedicated filesystem, process, and network namespaces that are separate from the host’s, as described in the previous post
- fewer Linux Capabilities, even when running as ‘root’ (uid=0) in the container
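To make the reduced capability set concrete, here is a minimal sketch that decodes the CapEff line from /proc/self/status into capability names. The function names are mine, and the example mask 00000000a80425fb is an assumption: it is the value root in a default Docker container commonly reports, but your Docker version or configuration may produce something different.

```python
# Capability bit numbers as defined in <linux/capability.h>.
CAPABILITY_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_capabilities(mask_hex):
    """Return the names of the capabilities set in a hex mask (e.g. a CapEff value)."""
    mask = int(mask_hex, 16)
    return [name for bit, name in enumerate(CAPABILITY_NAMES) if mask & (1 << bit)]


def effective_capabilities(status_path="/proc/self/status"):
    """Read and decode the CapEff line of a /proc/<pid>/status file (Linux only)."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return decode_capabilities(line.split()[1])
    return []


# Assumed example mask; compare it with what your own containers report.
ASSUMED_DEFAULT_MASK = "00000000a80425fb"
```

Calling decode_capabilities(ASSUMED_DEFAULT_MASK) lists capabilities such as CAP_CHOWN and CAP_NET_BIND_SERVICE but not CAP_SYS_ADMIN. Running effective_capabilities() inside and outside a container as root is a quick way to see the difference for yourself.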
Secure the Application Image
Start by reducing the application image’s attack surface. These activities are similar to what you would do when securing an application on any host:
- Update the application’s libraries
- Update the application image’s base system
- Remove unneeded tools from the image
One big difference between containers and virtual or physical machines is that containers should generally not be ‘patched’ or updated at runtime. If software inside an application container image needs to be updated, the image should be rebuilt and containers started from that new image. Rebuilding images and redeploying containers through the normal delivery pipeline will ensure the application still passes all of its usual quality checks.
Once your application image is ready to run, you can run it using the ‘basic’ docker command we started with.
docker run more securely
You can also configure the application’s Docker container to be much more restrictive. Let’s build up a quite-secure docker container run command step by step.
Step 1 – Run as an unprivileged user
Running a containerized application as an unprivileged user is the first step towards more secure operations and is fundamental to implementing the Principle of Least Privilege. An unprivileged application user gives attackers fewer privileges to start with if they compromise the application.
By default, Docker containers start with the user specified by the image’s USER instruction. When the user is unspecified it defaults to root (uid=0).
Now this isn’t as bad as it might sound, but it’s still not great. By default, Docker reduces the Linux Capabilities the container user has to a limited default set. This means that root in a container is less powerful than uncontained root ‘on the host’. In particular, root in a container will not have
Docker makes it easy to run applications as a particular user or user id with the --user command line option.
Let’s start the rando-doggos application container as an unprivileged user, uid=1500 and gid=1500:
docker container run --name rando-doggos --rm -it \
  -p 5000:5000 \
  --user 1500:1500 \
  qualimente/rando-doggos:2018-03-20-1030
If you run this container and then exec into it and run ps, you’ll see python is running as uid 1500:
$ docker container exec -it rando-doggos ps -ef
PID   USER   TIME   COMMAND
1     1500   0:00   python /usr/src/app/app.py
7     1500   0:00   ps -ef
The userid and groupid of 1500 were picked (mostly) arbitrarily. This userid and groupid did not exist on my machine prior to running the container. That is fine: Docker does not need them to exist on the host or in the image, and simply starts the container’s process with the numeric userid and groupid you specify. This is pretty convenient as it means automation doesn’t need to know about or prepare container hosts to run particular services.
Note that normal Linux filesystem permissions are still enforced. The application must be runnable with the specified uid+gid. Application libraries and configuration files must also be readable. Many people solve this by adding application files to the image as readable by everyone.
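One way to check this before shipping an image is to scan the application directory for anything an arbitrary uid could not read. The following is a rough sketch, not part of rando-doggos; the function name is my own, and the /usr/src/app path in the usage note is taken from the ps output above.

```python
import os
import stat


def not_world_readable(root):
    """Return paths under root that an arbitrary uid+gid could not read.

    Files need the 'other' read bit; directories also need the 'other'
    execute bit so they can be traversed. Symlinks are skipped.
    """
    blocked = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue
            needed = stat.S_IROTH
            if stat.S_ISDIR(mode):
                needed |= stat.S_IXOTH
            if (mode & needed) != needed:
                blocked.append(path)
    return sorted(blocked)
```

For example, not_world_readable("/usr/src/app") run during an image build would list the files that a --user 1500:1500 container could not open. An empty list means everything there is readable by everyone.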
The neat thing about this approach is that:
- you aren’t relying on the image to specify a user
- you can pick an unprivileged uid+gid that mean something to you, i.e. you assign canonical numbers to each service similar to
- if you pick a uid+gid combination that doesn’t exist in the base image, the runtime user will only have access to files that everyone has access to
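The application can also defend itself in case someone forgets the --user option. This is a small sketch of a startup guard; it is my own illustration rather than something rando-doggos does, and the function names are hypothetical.

```python
import os


def is_unprivileged(uid, gid):
    """True when neither the user id nor the group id is root's (0)."""
    return uid != 0 and gid != 0


def require_unprivileged():
    """Stop at startup if the process is running as root (Unix only)."""
    uid, gid = os.geteuid(), os.getegid()
    if not is_unprivileged(uid, gid):
        raise SystemExit(
            f"refusing to run as uid={uid} gid={gid}; "
            "start the container with --user"
        )
```

Calling require_unprivileged() as the first line of the app’s entry point turns a missing --user into an immediate, obvious failure instead of a silent root process.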
Now let’s look at an easy method to help prevent an application or attacker from escalating their privileges.
Step 2 – No new privileges
A relatively recent and useful addition to the Docker security toolset is the no-new-privileges option. The no-new-privileges security option prevents the application processes inside the container from gaining new privileges during execution.
So even if:
- there is a program with the setuid/setgid bit set in the image, such as
- a process in the container has (file) permissions to execute the program
any operation that tries to gain privileges through facilities such as the setuid/setgid bits will be denied.
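You can confirm the flag is active from inside the container, because Linux kernels from 4.10 onward report it as a NoNewPrivs line in /proc/<pid>/status. Here is a minimal sketch of reading it; the function name is mine and the sample text in the usage note is illustrative, not captured output.

```python
def no_new_privs_enabled(status_text):
    """Parse /proc/<pid>/status text and report the NoNewPrivs flag.

    Returns True or False when the field is present, and None when the
    kernel does not report it (older than Linux 4.10).
    """
    for line in status_text.splitlines():
        if line.startswith("NoNewPrivs:"):
            return line.split()[1] == "1"
    return None


def current_process_no_new_privs(status_path="/proc/self/status"):
    """Convenience wrapper for the calling process (Linux only)."""
    with open(status_path) as status:
        return no_new_privs_enabled(status.read())
```

For example, no_new_privs_enabled("Name:\tpython\nNoNewPrivs:\t1\n") returns True. Inside a container started with --security-opt no-new-privileges, current_process_no_new_privs() should return True as well.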
So if the webapp was run in a container like:
docker container run --name rando-doggos --rm -it \
  -p 5000:5000 \
  --user 1500:1500 \
  --security-opt no-new-privileges \
  qualimente/rando-doggos:2018-03-20-1030
and our application was tricked into running sudo su root (further suppose the root user’s password is unset in the container so the attacker doesn’t need to know the password), then sudo would be denied at the point where it requests additional privileges.
Now let’s examine how containers can prevent unexpected resource consumption.
Step 3 – Limit Compute Resources: Memory and CPU
Applications may ‘run away’ and consume large amounts of compute resources when they are attacked or suffer from a bug. Application containers can help you keep these problems under control by limiting the resources an application can consume, preserving resources for other applications and host system operations.
Every Docker container gets its own Linux Control Group (cgroup) by default. Cgroups are a Linux kernel feature that can:
- account for cpu, memory, I/O and other resources used within a container
- enforce limits for the use of those resources, e.g. denying further memory allocation or throttling cpu usage
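A process can inspect the limits its cgroup enforces. The sketch below assumes a host using cgroup v2, where a container usually sees its own limits in /sys/fs/cgroup/memory.max and /sys/fs/cgroup/cpu.max; cgroup v1 hosts use different file names, so treat the paths as an assumption. The function names are mine.

```python
from pathlib import Path


def parse_memory_max(text):
    """Parse cgroup v2 memory.max: a byte count, or 'max' for no limit."""
    value = text.strip()
    return None if value == "max" else int(value)


def parse_cpu_max(text):
    """Parse cgroup v2 cpu.max ('<quota> <period>') into a number of cpus.

    Returns None when the quota is 'max', meaning cpu use is not throttled.
    """
    quota, period = text.split()
    return None if quota == "max" else int(quota) / int(period)


def read_limits(cgroup_root="/sys/fs/cgroup"):
    """Read both limits for the current cgroup (assumes cgroup v2 paths)."""
    root = Path(cgroup_root)
    return {
        "memory_bytes": parse_memory_max((root / "memory.max").read_text()),
        "cpus": parse_cpu_max((root / "cpu.max").read_text()),
    }
```

The parsing functions take plain text, so they can be exercised with sample values such as "75000 100000" without a container.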
With Docker it is straightforward to set these limits with docker container run options such as --memory and --cpus. Here’s how we could limit the rando-doggos app to 128MiB of memory and three quarters of a cpu:
docker container run --name rando-doggos --rm -it \
  -p 5000:5000 \
  --user 1500:1500 \
  --security-opt no-new-privileges \
  --memory 128m \
  --cpus 0.75 \
  qualimente/rando-doggos:2018-03-20-1030
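To see what those two values mean to the kernel, here is a small sketch of the arithmetic. It assumes Docker’s documented behaviour that memory suffixes are binary units and that --cpus is shorthand for a CPU quota over a 100000 microsecond period; the helper names are mine.

```python
# Docker memory suffixes are binary units: 128m means 128 MiB.
MEMORY_UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

# Scheduler period, in microseconds, that --cpus is expressed against.
CFS_PERIOD_US = 100_000


def memory_bytes(value):
    """Convert a --memory style value such as '128m' to bytes."""
    value = value.strip().lower()
    if value[-1] in MEMORY_UNITS:
        return int(value[:-1]) * MEMORY_UNITS[value[-1]]
    return int(value)


def cpu_quota_us(cpus, period_us=CFS_PERIOD_US):
    """Convert a --cpus style value to microseconds of cpu time per period."""
    return round(float(cpus) * period_us)
```

So memory_bytes("128m") is 134217728 bytes, and cpu_quota_us("0.75") is 75000: the container may use 75000 microseconds of cpu time in every 100000 microsecond window.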
If the web application tries to allocate more than 128MiB of memory, that request will be denied. If that denied memory allocation or some other problem triggers an application crash, Docker can restart the application according to a user-configurable restart policy. The default restart policy is no, but other options, including on-failure with a configurable number of retries, are available.
Similarly, if the web application becomes really popular (cute dogs!), the Linux kernel will throttle the cpu it uses so that it only uses its fair share.
Update: See these follow-up posts for details on limiting memory and cpu resources:
Step 4 – Read only filesystem plus small tmpfs
The final control we’ll add today is one to prevent unexpected writes to the container’s filesystem. Each container starts with a fresh copy of the application image’s filesystem. By default, that filesystem is writable. However, if the application does not need to write to the filesystem, or we know the specific places it needs to write, then we can permit just that.
The following container run command starts a container with a read-only root filesystem and also provides a small (64KiB) in-memory tmpfs for temporary files at /tmp:
docker container run --name rando-doggos --rm -it \
  -p 5000:5000 \
  --user 1500:1500 \
  --security-opt no-new-privileges \
  --memory 128m \
  --cpus 0.75 \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64k \
  qualimente/rando-doggos:2018-03-20-1030
If you exec into the container and try to write a file someplace other than /tmp (e.g. the working directory), that write will fail:
$ docker container exec -it rando-doggos sh -c "echo 'hello' > afile"
sh: can't create afile: Read-only file system
whereas if you write to /tmp it will succeed:
$ docker container exec -it rando-doggos sh -c "echo 'hello' > /tmp/afile; cat /tmp/afile"
hello
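An application that runs on a read-only root filesystem has to know where it may write. The following is a minimal sketch of a probe the app could use at startup; the function name is hypothetical and nothing here is specific to rando-doggos.

```python
import errno
import tempfile


def can_write(directory):
    """Return True if a temporary file can be created in directory.

    A read-only filesystem (EROFS), missing permission (EACCES) or a
    missing directory (ENOENT) all count as 'not writable'.
    """
    try:
        # The file is created and removed again when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError as error:
        if error.errno in (errno.EROFS, errno.EACCES, errno.ENOENT):
            return False
        raise
```

In the container above I would expect can_write("/tmp") to be True and can_write("/usr/src/app") to be False, which mirrors the two exec commands.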
You may have noticed that the tmpfs specifies the noexec and nosuid options. This means that programs cannot be executed from that filesystem and that setuid/setgid bits on files there are ignored. So an attacker would have limited space and options within this container to bring their own tools or exfiltrate data.
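The mount options in effect are visible in /proc/mounts, so they are easy to verify. Below is a small sketch that extracts the options for one mount point; the function name is mine and the sample text in the usage note is an illustrative line, not output captured from this container.

```python
def mount_options(mounts_text, mount_point):
    """Return the set of options for mount_point from /proc/mounts text.

    Each line has the form: device mountpoint fstype options dump pass.
    The last matching line wins, since later mounts shadow earlier ones.
    Returns None when the mount point is not listed.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            options = set(fields[3].split(","))
    return options


def current_mount_options(mount_point, mounts_path="/proc/mounts"):
    """Convenience wrapper that reads the live mount table (Linux only)."""
    with open(mounts_path) as mounts:
        return mount_options(mounts.read(), mount_point)
```

Given a line such as "tmpfs /tmp tmpfs rw,nosuid,nodev,noexec,relatime,size=64k 0 0", mount_options(text, "/tmp") includes both "noexec" and "nosuid".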
This post has introduced some of the additional security controls you can adopt easily when running applications in Docker containers. The four steps here can help you:
- Implement Principle of Least Privilege
- Prevent privilege escalation
- Mitigate denial of service attacks to co-located applications or the host
- Limit what an attacker can do within a compromised container
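If you launch containers from automation, the four steps can be captured in one place. This is a sketch of a helper that assembles the same command line used in this post; the function name and its defaults are my own choices, and every flag comes from the commands shown earlier.

```python
import shlex


def hardened_run_args(image, name, uid=1500, gid=1500, memory="128m",
                      cpus="0.75", tmpfs_size="64k", port=5000):
    """Build the argument list for the hardened docker container run command."""
    return [
        "docker", "container", "run", "--name", name, "--rm", "-it",
        "-p", f"{port}:{port}",
        "--user", f"{uid}:{gid}",                   # step 1: unprivileged user
        "--security-opt", "no-new-privileges",      # step 2: no escalation
        "--memory", memory,                         # step 3: memory limit
        "--cpus", cpus,                             # step 3: cpu limit
        "--read-only",                              # step 4: read-only root
        "--tmpfs", f"/tmp:rw,noexec,nosuid,size={tmpfs_size}",
        image,
    ]


def hardened_run_command(image, name, **options):
    """Render the argument list as a single shell-safe string."""
    return shlex.join(hardened_run_args(image, name, **options))
```

Printing hardened_run_command("qualimente/rando-doggos:2018-03-20-1030", "rando-doggos") gives a one-line version of the final command. Keeping the arguments as a list also makes it simple to pass them to a process launcher without shell quoting problems; for non-interactive use you would likely drop -it.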
Docker has many more security options you can employ to limit what system calls a container may execute, what networks it may connect to, and more. Please reach out to me if you have any questions or comments, or need assistance implementing container security controls.
p.s. I’m at the AWS re:Inforce Security conference Tues and Wed. I’d love to meet you in person if you’re there.