Docker has been around for a while now. Companies use it in production, people write plenty of articles, record video tutorials, yet most developers don't "trust" it to handle their development environment. Let alone use it in production.
And that's usually because understanding Docker is hard. This article tries to be a primer that will save you a few hours of screaming and paranoia without leaving out important details.
Docker allows you to:
- Wipe differences across environments. That is, your production, staging and development environments will be (should be) identical(ish). The same applies to the development environments of different developers. In other words: no more "works on my machine".
- Ensure that the exact same environment can be deployed wherever Docker is installed, no matter the hardware or operating system. Good old "build once, run everywhere".
- Build a reproducible environment with a very small footprint (usually just your source code and a special file called Dockerfile) that you can share with others. No more copying 20+ GB virtual machines to onboard new developers.
- Declare your project's dependencies as a handy YAML file that can be used to set up everything you need at once (e.g. your app server, database, queues, workers etc.).
- Create a scalable, reliable cluster that can run your entire project across a fleet of heterogeneous physical machines with a few lines of bash (Docker Swarm).
If you are not impressed yet, keep reading.
Why not Docker?
Vagrant is better than Docker
– someone who has no idea what Docker is
If you have used virtual machines to set up your development environment you probably used Vagrant to handle them. Vagrant is a front-end to some famous virtualization back-ends such as VMware or VirtualBox.
Comparing Vagrant with Docker is like comparing apples and beer cans. Irrelevant.
Docker is definitely not the ideal option if your intent is to use containers the same way you would use virtual machines. In that case, Docker will just make your life harder.
This is not to say that Docker cannot do that. It can, it's just not its primary purpose (nor its secondary for that matter).
For example: when using a virtual machine the flow is quite familiar. You turn on the machine, install some packages, configure something at the OS level, maybe write a few files here and there and, once you are done, suspend or shut down the VM just to restart from the exact same point when you have to work again. All your modifications are saved within the VM's filesystem/memory.
With Docker the user is supposed to stop and start different containers every time, therefore losing the changes local to an individual container. The reason to use different containers every time is to ensure that the provisioning process is easily reproducible and that containers can be deployed and work anywhere, at any time, without making assumptions about what the underlying state is. Docker is built to deploy applications, not machines.
kill doesn't remember the PID of the last process you sent a signal to, does it?
Another huge difference between using a VM and Docker for development is that with Docker you seldom install all services (e.g. database, web server, key-value store) in the same container but rather split those into separate containers that communicate using a virtual network created by Docker itself. This is exactly what happens in the real world: usually all the services needed to run a project (unless it's a very small one) run on separate nodes.
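As a rough sketch of that idea, the commands below start a Redis server in one container and query it from a second, throwaway container over a network created by Docker. The network name `demo-net` and the container name `cache` are made up for illustration, and the official `redis` image is only a convenient stand-in for "some service"; none of them come from this article's project.

```shell
#!/bin/sh
# Hypothetical names, chosen only for this sketch.
network=demo-net
server=cache

if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  # A user-defined network lets containers reach each other by name.
  docker network create "$network"

  # Service container: a Redis server running in the background.
  docker run -d --name "$server" --network "$network" redis

  # Give the server a moment to start accepting connections.
  sleep 2

  # Client container: a separate, short-lived container on the same network.
  # The name "cache" is resolved by Docker; the server should answer PONG.
  docker run --rm --network "$network" redis redis-cli -h "$server" ping

  # Clean up the server container and the network.
  docker rm -f "$server"
  docker network rm "$network"
else
  echo "Docker is not available; nothing was run."
fi
```

Neither container knows anything about the other's filesystem or processes; the only thing they share is the network.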
The Docker Engine
Before diving into more details it's important to understand that with "Docker" we could be talking about: the company, the project, the tool. Throughout this article we will refer to either the project or the tool, letting the context drive your intuition.
Docker uses a client/server architecture. The server uses the Docker engine to cache images, run and manage containers, handle logs and much more.
Images, containers...What are you talking about?
Right right. The perfect analogy (for programmers at least) to understand how Docker works is the following:
- A Dockerfile is the source code.
- An image is the binary resulting from compiling and building your Dockerfile.
- A container is the running instance of an image, a process by all means.
- Images are cached and containers are run by a Docker host (a machine running the Docker engine).
- Images are stored in a registry, think of it as a repository for package managers such as npm. Docker offers a public registry that anyone can use to store images, as long as these are kept public (one private image is allowed free of charge).
To install Docker on your operating system simply head to https://docker.com and pick the right version.
Let's get started!
Creating a Dockerfile
Dockerfiles use a simple syntax to express the steps that should be taken to build a specific image. A dead simple Dockerfile is the following:
FROM ubuntu
RUN echo "My first Docker image"
Breaking it down
- FROM ubuntu tells Docker to use the latest ubuntu image as a base. The image will be retrieved from the public registry.
- RUN echo "My first Docker image" tells Docker to run the command echo inside the container.
Building an image
You can now build an image from this Dockerfile with:
docker build .
Breaking it down:
- build tells Docker to build an image.
- . tells Docker to look in the current directory for the Dockerfile and to use the current directory as a "context" so that we can reference files and directories from there.
You will see an output similar to this:
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM ubuntu
latest: Pulling from library/ubuntu
b3e1c725a85f: Pull complete
4daad8bdde31: Pull complete
63fe8c0068a8: Pull complete
4a70713c436f: Pull complete
bd842a2105a8: Pull complete
Digest: sha256:7a64bc9c8843b0a8c8b8a7e4715b7615e4e1b0d8ca3c7e7a76ec8250899c397a
Status: Downloaded newer image for ubuntu:latest
 ---> 104bec311bcd
Step 2 : RUN echo "My first Docker image"
 ---> Running in f85bd2e0f554
My first Docker image
 ---> 1d4302baa251
Removing intermediate container f85bd2e0f554
Successfully built 1d4302baa251
Breaking it down
- Docker will copy the directory's content into a temporary directory and use that as context.
- Step 1: Docker will retrieve the latest ubuntu image from the public registry and all of its intermediate images (hold that thought!).
- Step 2: Docker will run the echo command inside the container and route standard output and standard error to our machine so that we can see the result on our terminal.
Now look carefully. Have you noticed something?
My first Docker image
 ---> 1d4302baa251 <--------- THIS
Removing intermediate container f85bd2e0f554
Successfully built 1d4302baa251 <--------- THIS
Every instruction in the Dockerfile that could potentially alter the state of the image (such as RUN, since Docker cannot know if our command has side effects) produces an "intermediate image". That is, every time such a step is encountered an image will be created that holds the state produced by all the previous commands.
For example (assuming the file test.txt exists in the current directory):
FROM ubuntu
WORKDIR /tmp
COPY test.txt .
RUN cat test.txt
Would produce 4 intermediate images.
Breaking it down
- FROM copies the existing ubuntu image into an intermediate image.
- WORKDIR changes the working directory to the given one.
- COPY copies the given file into the working directory.
- RUN runs an arbitrary command.
Therefore the output of a build will be:
Sending build context to Docker daemon 3.072 kB
Step 1 : FROM ubuntu
 ---> 104bec311bcd
Step 2 : WORKDIR /tmp
 ---> Using cache
 ---> 8b7569f87645
Step 3 : COPY test.txt .
 ---> c515890976fb
Removing intermediate container 7d07b7f6f0fb
Step 4 : RUN cat test.txt
 ---> Running in 9ec4a66f5a05
I'm the content of test.txt
 ---> 27922b2708f1
Removing intermediate container 9ec4a66f5a05
Successfully built 27922b2708f1
Where every line starting with an arrow and ending with a hash (that nice little hex string) represents an intermediate image (e.g. ---> 104bec311bcd).
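One way to look at these intermediate images after the fact is `docker history`, which lists the layers an image is made of. The sketch below recreates the example project in a temporary directory and builds it twice; the image name `layers-demo` is an arbitrary choice for illustration. The second build should reuse the cached layers, although newer Docker versions report this in a different format than the output shown in this article.

```shell
#!/bin/sh
# Recreate the example project in a scratch directory.
workdir=$(mktemp -d)
echo "I'm the content of test.txt" > "$workdir/test.txt"
cat > "$workdir/Dockerfile" <<'EOF'
FROM ubuntu
WORKDIR /tmp
COPY test.txt .
RUN cat test.txt
EOF

if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  # The first build creates the layers; the second should find them in the cache.
  docker build -t layers-demo "$workdir"
  docker build -t layers-demo "$workdir"

  # List the layers of the final image, newest first.
  docker history layers-demo
else
  echo "Docker is not available; files were written to $workdir"
fi
```

Changing test.txt and building again should invalidate the cache from the COPY step onwards, while the steps before it are still reused.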
Running a container
You will notice something else while we are building our image: Removing intermediate container 9ec4a66f5a05. That's right, to execute RUN commands Docker needs to actually instantiate a container with the intermediate image up to that line of the Dockerfile and run the actual command. It will then "commit" the state of the container as a new intermediate image and continue the building process.
As you can see we've already run a container. Now we'll learn how to do that arbitrarily (and no, you don't need to build an image every time you need to run a container) and how to create an image from a container's state instead of a Dockerfile.
Take a note of the hash of the image we have just built (27922b2708f1 in my case) and run a container based on that image with:
docker run 27922b2708f1
You'll notice that the output is blank. The container is, in fact, doing absolutely nothing. Why is that? Why isn't our RUN command being... well... run?
The correct Dockerfile instruction to run a command when a container runs is CMD, not RUN which is, instead, executed only at build time. Let's change the Dockerfile as follows:
FROM ubuntu
WORKDIR /tmp
COPY test.txt .
CMD cat test.txt
And build the image once more. You'll now notice that the output does not contain the content of test.txt anymore. That's because CMD does not run the command, but simply sets the container up to run that command every time it starts. The hash of the final image will also change, since the last step has changed.
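To see the difference side by side, here is a small sketch that puts both instructions in the same Dockerfile. The image name `run-vs-cmd` and the two messages are invented for this example. The RUN message should show up once, in the build log (newer Docker versions may collapse build output), while the CMD message should show up every time a container starts.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cat > "$workdir/Dockerfile" <<'EOF'
FROM ubuntu
# Executed once, while the image is being built.
RUN echo "printed at build time"
# Only recorded in the image; executed each time a container starts.
CMD echo "printed at run time"
EOF

if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  docker build -t run-vs-cmd "$workdir"

  # Each run starts a new container and executes the CMD again.
  docker run --rm run-vs-cmd
  docker run --rm run-vs-cmd
else
  echo "Docker is not available; Dockerfile written to $workdir"
fi
```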
To make things easier we can tag our image so that we don't have to remember an ugly hash but assign a nice mnemonic we choose:
docker build -t test .
This will create an image named test with tag latest. The actual "tag" is the pair <NAME>:<VERSION> which, in this case, will be test:latest. A quick look at the cached images (docker images) will validate this:
REPOSITORY   TAG      IMAGE ID       CREATED         SIZE
test         latest   f2aba921b459   2 minutes ago   129 MB
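The version part does not have to be latest. As a small sketch, assuming the test image from above is already in your local cache, `docker tag` can give the same image a second, explicit version; the version number 1.0 is just an example.

```shell
#!/bin/sh
# Image name from the article; the version is a made-up example.
image=test
version=1.0

if command -v docker >/dev/null 2>&1 && docker image inspect "$image:latest" >/dev/null 2>&1; then
  # Add a second tag pointing at the very same image ID.
  docker tag "$image:latest" "$image:$version"

  # Both tags should be listed with an identical IMAGE ID.
  docker images "$image"
else
  echo "Docker or the $image:latest image is not available; nothing was tagged."
fi
```

Tagging does not copy anything: both names refer to the same image, so the extra tag costs no additional disk space.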
If we now run a container from the aforementioned image with:
docker run test
The output will be:
I'm the content of test.txt
(or whatever you put in your test.txt)
Exactly what we were expecting.
How do I interact with a container though?
Fortunately Docker allows us to do that easily:
docker run -it test bash
Breaking it down
- run tells Docker to run a container off the given image.
- -it tells Docker to run the container in interactive mode (as opposed to -d, detached mode) and to give us a virtual terminal to interact with.
- -i also takes care of wiring the standard input to the container (without which a terminal would be useless).
- test is the image we've just built.
- bash is the command to run once the container starts.
The command we pass is what keeps the container alive. Within the container it will be the process with PID 1 which will act as parent of all subsequent processes. This means that if such process is terminated the container will stop.
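A quick way to convince yourself of this is to ask a shell for its own PID, first on the host and then inside a container. This is only a sketch: the variable name `probe` and the message are invented for the example.

```shell
#!/bin/sh
# "$$" expands to the PID of the shell that evaluates it.
probe='echo "my PID is $$"'

# On the host the shell gets whatever PID the operating system hands out.
sh -c "$probe"

if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  # Inside a fresh container the same shell is the first process,
  # so this should report PID 1.
  docker run --rm ubuntu sh -c "$probe"
else
  echo "Docker is not available; skipped the container run."
fi
```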
You will also notice that the content of test.txt is no longer being printed. That happens because if a different command is passed, while running a container, the one from the CMD instruction will be ignored. We are basically overriding the container's default command. To have a default command run no matter what you will have to use the ENTRYPOINT instruction in your Dockerfile.
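Here is a minimal sketch of how the two instructions combine, reusing the test.txt example. The image name `entrypoint-demo` is arbitrary. ENTRYPOINT fixes the executable, while CMD only provides default arguments that are replaced by whatever you pass after the image name.

```shell
#!/bin/sh
workdir=$(mktemp -d)
echo "I'm the content of test.txt" > "$workdir/test.txt"
cat > "$workdir/Dockerfile" <<'EOF'
FROM ubuntu
WORKDIR /tmp
COPY test.txt .
# Always executed when the container starts.
ENTRYPOINT ["cat"]
# Default arguments, replaced by anything passed after the image name.
CMD ["test.txt"]
EOF

if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  docker build -t entrypoint-demo "$workdir"

  # No arguments: runs "cat test.txt".
  docker run --rm entrypoint-demo

  # With an argument: runs "cat /etc/hostname" instead.
  docker run --rm entrypoint-demo /etc/hostname
else
  echo "Docker is not available; files were written to $workdir"
fi
```

Even an ENTRYPOINT can be replaced explicitly with the `--entrypoint` flag of `docker run`, so "no matter what" really means "unless you go out of your way".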
Once inside the container our prompt will look like the following:
root@f1e1064e0958:/tmp#
Two things are important:
- The working directory is /tmp as set with the WORKDIR instruction.
- The default (and only) user is root. That's because containers are supposed to be treated as stateless processes, not fully-functional virtual machines, therefore there is no need to have users with different privileges. If a container is compromised it's enough to shut it down and spawn another one. Some images might use different users but that is definitely optional.
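For images that do want a different user, a sketch along these lines should work. The user name `app` and the image name `user-demo` are made up; `useradd` and `whoami` are the standard tools shipped with the ubuntu image.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cat > "$workdir/Dockerfile" <<'EOF'
FROM ubuntu
# Create an unprivileged account; the name "app" is arbitrary.
RUN useradd --create-home app
# Later instructions, and the container itself, run as this user.
USER app
CMD ["whoami"]
EOF

if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  docker build -t user-demo "$workdir"

  # Should print "app" instead of "root".
  docker run --rm user-demo
else
  echo "Docker is not available; Dockerfile written to $workdir"
fi
```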
We will immediately see that the ubuntu image is actually a stripped-down version of the Ubuntu server distribution. Many binaries will be missing:
root@f1e1064e0958:/tmp# ping
bash: ping: command not found
We can use the preinstalled package manager to install it:
apt update
apt install iputils-ping
To have a feeling of Docker's statelessness let's exit the container (exit or Ctrl/Cmd+D) and start a new one:
root@f1e1064e0958:/tmp# exit
➜ docker-article docker run -it test bash
root@ef95ac8b41ff:/tmp#
You will see the hostname has changed from f1e1064e0958 (the ID of the old container) to ef95ac8b41ff (your IDs will be different). And, to our dismay, ping is also gone.
Creating an image from a container
What if we want to "save" our changes to the container in a new image? Let's install the iputils-ping package again and exit the container once more.
The docker ps command will show currently running containers. Since all of our containers are stopped the list will be empty. To see all cached containers (including those that were stopped/terminated) simply use:
docker ps -a
My output looks like this:
CONTAINER ID   IMAGE   COMMAND   CREATED         STATUS                       PORTS   NAMES
ef95ac8b41ff   test    "bash"    4 minutes ago   Exited (127) 1 seconds ago           hopeful_keller
Docker kindly assigns a random mnemonic to each new container (hopeful_keller in my case) so that we don't have to remember the ID. We now want to use the state of this cached (stopped) container as a base for a new image we'll call test2:
docker commit hopeful_keller test2
The output will be the SHA256 hash of the newly created image.
Our new image is now in the Docker engine's cache and we can see it by listing all images:
REPOSITORY   TAG      IMAGE ID       CREATED              SIZE
test2        latest   8b976507170c   About a minute ago   129 MB
test         latest   f2aba921b459   About an hour ago    129 MB
That's neat! We have just created a completely new image that we can now reuse to run a container with ping preinstalled. All in a matter of seconds. Let's run a container off test2 and try pinging something to verify:
docker run -it test2 bash
root@25d285662a5a:/tmp# ping hipolabs.com
PING hipolabs.com (220.127.116.11) 56(84) bytes of data.
64 bytes from ec2-54-83-6-199.compute-1.amazonaws.com (18.104.22.168): icmp_seq=1 ttl=37 time=0.223 ms
As you can see Docker can "kind of" simulate the experience you would have with a virtual machine as long as you remember to commit your changes to an image every time. Alternatively (not suggested) you can keep starting the same container over and over again. From docker ps -a you can get the ID/name of the stopped container and use:
docker start 25d285662a5a
docker exec -it 25d285662a5a bash
To restart the same container with the same state. Docker also provides handy pause and unpause commands to achieve the same result you would have suspending a virtual machine. This goes a bit beyond the scope of this tutorial (and we still have a lot to cover).
The clear disadvantages of creating images this way instead of using a proper Dockerfile are:
- The image is multiple orders of magnitude bigger than a simple text file. Harder to share.
- There is no way to reliably document how you built your image. Equivalent to not sharing the source code.
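For comparison, here is a sketch of what the same result could look like as a Dockerfile, with the manual installation step written down as a RUN instruction. The image name `test2-from-dockerfile` is invented so that it does not clash with the committed image.

```shell
#!/bin/sh
workdir=$(mktemp -d)
echo "I'm the content of test.txt" > "$workdir/test.txt"
cat > "$workdir/Dockerfile" <<'EOF'
FROM ubuntu
# The package we installed by hand, now recorded as a build step.
RUN apt-get update && apt-get install -y iputils-ping
WORKDIR /tmp
COPY test.txt .
CMD cat test.txt
EOF

if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  docker build -t test2-from-dockerfile "$workdir"

  # ping should now be available in every container started from this image.
  docker run --rm test2-from-dockerfile ping -c 1 127.0.0.1
else
  echo "Docker is not available; files were written to $workdir"
fi
```

The whole recipe is a handful of lines of text that can live next to your source code, which addresses both disadvantages listed above.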
In this article we saw:
- How to create a Dockerfile.
- How to build an image from a Dockerfile.
- How to start and interact with a container based on that image.
- How to create a new image from a container's state.
In the next chapter we will explore two awesome tools: Docker Compose and Docker Swarm. For questions and comments do not hesitate to get in touch!
Thanks to Fergal Walsh and Semih Basmacı for reading drafts of this article.