Introduction to Docker
Why Docker?
Docker is widely adopted by companies and developers across the world. Containerization, particularly via Docker, has become a commonplace method for packaging and deploying applications.
As of this writing, there are 172 official Docker images available on Docker Hub. The list includes images for Python, Node, Postgres, Nginx, Ubuntu, WordPress, and many others, in addition to the 8+ million public Docker images from the community. These images often form the base for how applications are containerized.
The official documentation from Docker is an excellent place to start your journey or to reference technical details. For a concise overview, this introduction provides a distilled breakdown of Docker fundamentals.
Docker Overview
Straight from the Docker overview: Docker is an open platform for developing, shipping, and running applications. Its purpose is to separate applications from infrastructure so that software can be delivered quickly. Ultimately, Docker allows you to build, ship, and run any application, anywhere.
Docker is an open-source project that automates the development, deployment, and running of applications inside isolated containers. Docker provides the tools to build, run, test, and deploy distributed applications based on Linux containers. The technology consists of both a daemon (a process running in the background) and a client command. A container resembles a virtual machine but differs in important ways.
A container provides an isolated environment to set up the dependencies needed to perform a particular task, such as running an API web service, executing a cron job process, serving machine learning models, performing an ETL for data processing, or reporting for analysis.
Nearly any type of program that can run on your computer can be run from a container. The container isolates the application, and the inner details are self-contained, which enables greater flexibility, reusability, and portability.
Docker
Docker is a platform-as-a-service tool based on a virtualization concept called containers. Containers provide a standardized environment for executing code and run on top of a host operating system. In effect, any system running Docker can run a container, so the environment used inside the container can be selected without regard to the host system. For example, an Alpine Linux container can run on a Mac or Windows machine running Docker.
This makes containers a lightweight alternative to virtual machines. Containers are significantly faster to create and start than virtual machines while still isolating applications from the host system. Containers are also easier to reuse, share, and move between hosts.
In the software world, Docker allows several different types of applications to run in separate containers on the same host machine, each with its own underlying environment, application version, libraries, and other utilities.
Dockerfile
The Dockerfile is the starting point for defining the dependencies in a container. It is a text file that Docker builds into a Docker image (container image). A Dockerfile is written line by line, with each instruction describing a layer, and can be based on existing images such as the ones available on Docker Hub. The build process reads the file from top to bottom, so the environment and core dependencies should precede the application layers. The end result is an immutable package containing the environment, runtimes, libraries, and code needed to run a specific application.
Docker Images
The first step is to create a Dockerfile that defines a repeatable application environment. Then, build a Docker image from that Dockerfile. Images are the build component of Docker: read-only templates that define an application environment to execute within a container.
There is a distinct difference between an image and a container, and the two terms should not be used interchangeably. A container image is the file-system-level object output from a Docker build. The image encapsulates the application. When an image is run, a container is created and the application process starts.
Note: The image created by Docker isn’t actually a Docker-specific image. The image format is tied to the Open Container Initiative (OCI), the open industry standard and formal specification for container image formats and runtimes.
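The image and container distinction described above can be sketched as a small Python analogy. This is only a mental model, not how Docker is implemented: the `Image`, `Container`, and `run` names are hypothetical, and the image name and command are borrowed from the Node example later in this article. The point it illustrates is that one read-only image can back many independent containers, each with its own lifecycle.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from itertools import count

_next_id = count(1)  # hypothetical ID source for this sketch


@dataclass(frozen=True)
class Image:
    """Read-only template: frozen, so it cannot change after it is built."""
    name: str
    command: tuple


@dataclass
class Container:
    """A running (or stopped) instance created from an image."""
    image: Image
    container_id: int = field(default_factory=lambda: next(_next_id))
    status: str = "created"

    def start(self):
        self.status = "running"

    def stop(self):
        self.status = "exited"


def run(image):
    """Loosely mirrors running an image: create a container, then start it."""
    container = Container(image)
    container.start()
    return container


image = Image(name="getting-started", command=("node", "src/index.js"))
first = run(image)
second = run(image)
first.stop()  # stopping one container does not affect the other

print(first.status, second.status)
print(first.image is second.image)  # both containers share the same image

try:
    image.name = "renamed"
except FrozenInstanceError:
    print("images are read-only")
```

Stopping or deleting a container leaves the image untouched, which is why the same image can be run again at any time.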
Docker Containers
Containers are the run component of images. A container is an instantiated image that runs an application process. The idea is to bundle up an application with everything it needs, such as a Linux environment, libraries, and other dependencies, and ship it as one package. Containers can be run, started, stopped, deleted, and moved. In essence, a Docker container is akin to a cargo container holding the contents required to run an application.
The portability of Docker containers allows them to run from essentially any server or host, as long as a Docker daemon or supported container runtime is running in the background. All of the major operating systems (Windows, macOS, Linux) today support running Docker and other alternative runtimes. Containers afford more isolation and resource control than regular Linux processes although less than virtual machines. The Docker engine makes use of Linux kernel features such as namespaces and control groups (cgroups) to achieve isolation and resource control.
The isolation and security allow for many containers to run simultaneously on a given host. Since containers are lightweight and self-contained with all components needed to run an application, there are no dependencies on the host other than Docker itself or an alternative runtime compatible with running containers from the container image.
In short, a container is a distributable unit capable of running applications on nearly any computer or server host with very little fuss.
Docker Examples
Without going too far into the weeds, let's walk through two Docker examples demonstrated in the Docker docs. The contents of the official documentation may change over time, but the current examples are illustrative nonetheless.
The first is a Node sample application. Below is the corresponding Dockerfile used to build and run the Node application. Comments are added for clarification.
# syntax=docker/dockerfile:1
# 1) Set the base image to Node v12 distro on Alpine Linux
FROM node:12-alpine
# 2) Install the system libraries needed to run the application
RUN apk add --no-cache python2 g++ make
# 3) Set working directory to /app for isolation
WORKDIR /app
# 4) Copy local application contents into the /app directory
COPY . .
# 5) Install the Node application dependencies used for production
RUN yarn install --production
# 6) Lastly set the default execution when a container is run
CMD ["node", "src/index.js"]
With only six instructions (layers) in the Dockerfile, we can build and run the Node application within an isolated container that contains all the components the app needs. At the time of this writing, building and running the example per the getting started docs takes two simple commands.
docker build -t getting-started .
docker run -dp 3000:3000 getting-started
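As a quick check of the instruction count mentioned above, here is a minimal Python sketch that lists the instructions in the Node Dockerfile. The `list_instructions` helper is hypothetical and deliberately simplified: it skips blank lines and comments but does not handle line continuations or other Dockerfile parsing rules.

```python
# The Node example Dockerfile from above, with the explanatory comments removed.
NODE_DOCKERFILE = """\
# syntax=docker/dockerfile:1
FROM node:12-alpine
RUN apk add --no-cache python2 g++ make
WORKDIR /app
COPY . .
RUN yarn install --production
CMD ["node", "src/index.js"]
"""


def list_instructions(dockerfile_text):
    """Return the instruction keyword of each non-comment, non-blank line."""
    instructions = []
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines, comments, and the syntax directive
        keyword = stripped.split(maxsplit=1)[0].upper()
        instructions.append(keyword)
    return instructions


instructions = list_instructions(NODE_DOCKERFILE)
print(len(instructions), instructions)
```

The instructions come back in the same top-to-bottom order in which the build processes them.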
The next example to highlight (again at the time of this writing from the Docker docs) demonstrates a Python Flask web application similar to the Node example. The Dockerfile used to build and run the Flask application follows a slightly different pattern by installing the application libraries before copying over all of the application contents.
# syntax=docker/dockerfile:1
# 1) Set the base image to Python 3.8 distro on minimal Debian Linux
FROM python:3.8-slim-buster
# 3) Set working directory to /app for isolation
WORKDIR /app
# 4) Copy requirements.txt file
COPY requirements.txt requirements.txt
# 5) Install the Python dependencies defined in requirements.txt
RUN pip3 install -r requirements.txt
# 6) Copy source code and contents into the working directory
COPY . .
# 7) Lastly set the default execution when a container is run
CMD ["python3", "-m", "flask", "run", "--host=0.0.0.0"]
The important distinction to note in the Python Dockerfile is why the requirements.txt file and the Python library installation are listed before copying the source code and contents. As mentioned earlier, a Dockerfile is described line by line as layers (instructions or commands), and the build process reads the file from top to bottom. Docker caches each layer from a build, so subsequent builds reuse cached layers to reduce build time.
However, if there is a change within a given layer, Docker only uses the cache from preceding layers and all layers below are re-built.
For instance, from a clean slate, the very first build will freshly build all of the layers. On the next build, the cache generated from the previous build will be used rather than rebuilding each layer. If there is a change to the source code, only the layers preceding the COPY . . instruction will use the cache, and steps 6 and 7 (as numbered in the comments) will be rebuilt. Builds are significantly faster using cached layers than building an image from scratch.
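The cache behavior described above can be modeled with a short Python sketch. This is a simplified simulation and not Docker's actual cache implementation: the `layer_keys`, `rebuilt_steps`, and `flask_build` helpers are hypothetical, and the strings standing in for file contents are made up. The one property it keeps is that each layer's cache key depends on its own instruction, its input files, and every layer above it.

```python
import hashlib


def layer_keys(steps):
    """Return one cache key per step; each key chains in the key of the step above."""
    keys = []
    parent = ""
    for instruction, inputs in steps:
        material = f"{parent}|{instruction}|{inputs}"
        parent = hashlib.sha256(material.encode()).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(previous, current):
    """List the instructions whose cache key differs from the previous build."""
    old = layer_keys(previous)
    new = layer_keys(current)
    return [
        instruction
        for index, (instruction, _) in enumerate(current)
        if index >= len(old) or old[index] != new[index]
    ]


def flask_build(requirements, source):
    """Model the Flask Dockerfile; the arguments stand in for file contents."""
    return [
        ("FROM python:3.8-slim-buster", ""),
        ("WORKDIR /app", ""),
        ("COPY requirements.txt requirements.txt", requirements),
        ("RUN pip3 install -r requirements.txt", ""),
        ("COPY . .", requirements + source),  # copies everything, source included
        ('CMD ["python3", "-m", "flask", "run", "--host=0.0.0.0"]', ""),
    ]


first = flask_build("flask", "v1")
code_change = flask_build("flask", "v2")          # only the source changed
deps_change = flask_build("flask requests", "v1")  # requirements.txt changed

print(rebuilt_steps(first, first))        # identical build: everything is cached
print(rebuilt_steps(first, code_change))  # only COPY . . and CMD are rebuilt
print(rebuilt_steps(first, deps_change))  # everything from COPY requirements.txt down
```

In this model a source-only change leaves the dependency installation cached, while a change to requirements.txt invalidates it along with every step below.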
The general rule to follow is to place the base image at the top of a Dockerfile, followed by layers that remain mostly constant. Since application source code will change more frequently than, say, the base image, system libraries, or even application dependencies, the command to copy the source code and contents is listed as far down towards the bottom of the Dockerfile as possible in order to preserve the preceding cached layers.
Summary of Key Concepts
Docker
Docker is a software engine that enables the development, packaging, and deployment of applications and dependencies as containers. With Docker, you specify the dependencies your code needs, and Docker handles the build and execution capabilities to run the code in different environments. Moreover, Docker is a platform with substantial tooling to improve the developer experience in addition to the operations benefits. Docker provides the tooling and a single platform to manage the lifecycle of configuring, building, and running images and containers.
Dockerfile
A Dockerfile is a descriptor text file composed of multiple layers defining the artifacts of a Docker image. The Dockerfile is defined top to bottom with the steps for how an image should be built and its contents, including the container environment, library installations, dependencies, application code, and other elements that make up an image. Essentially, a Dockerfile is the blueprint or recipe used to produce a container image when executing a build command.
Docker Images
A Docker image is a file or file-like artifact. After constructing a Dockerfile, a build command is executed to create an image based on the Dockerfile. The result is a template or set of instructions that can be used to run a container. Images are portable, reusable, and can be used as the starting points for other images, i.e., as base images.
Docker Containers
A Docker container is a lightweight, standalone, executable package of software including everything needed to run an application: source code, dependencies, environment variables, system tools, system libraries, and so on. A container is created when a run command is executed against an image. Containers are intended to be easily disposable and recreatable. They are both robust and ephemeral.
In summation, Dockerfiles are blueprints for building container images which can be run to create new instances of containers. Applications (as defined by the contents of the Dockerfile) live and run inside the isolated container abstraction. That is the beauty of Docker, images, and containers.