Reducing Docker Image Size and Enhancing Build Efficiency

Optimizing Docker images is an essential practice for anyone looking to improve deployment speed, enhance scalability, and ultimately reduce infrastructure costs.

While the techniques for achieving these efficiencies aren't particularly intricate, they are often overlooked in the hustle of software development cycles.

Given the tangible benefits of image optimization, it's worth having a checklist-style guide that one can refer to. This ensures that important steps are not glossed over in the process of containerization.

So, let's cut to the chase and delve into the various techniques that can help us streamline our Docker containers for optimal performance and resource utilization.

0. Get to know the status quo

First things first. Before optimizing, it helps to check the current image size so we know what state we're starting from.

This can easily be done via docker image ls my-image-name.

docker image ls blog-test
> REPOSITORY   TAG       IMAGE ID       CREATED         SIZE
> blog-test    latest    41c15fb7fff5   13 months ago   302MB

This prints out basic information about the image, as well as the size information we are looking for.
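
If we want a more detailed breakdown, docker history shows how much each individual layer contributes, which helps pinpoint where the bulk of an image comes from:

docker history blog-test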

Now that we know our status quo, let's get optimizing!

1. Use a Smaller Base Image

Switching to a lightweight base image is one of the most effective ways to reduce the size of our Docker images. A base image serves as the foundation on which our application runs, and it comes prepackaged with various libraries, utilities, and settings.

However, not all of these pre-included items are always necessary for every application. A full-fledged Ubuntu image, for instance, contains a wide variety of utilities and libraries aimed at general-purpose computing, but many of these components are likely irrelevant for a specific application, such as a Java service.

This excess baggage increases the size of our Docker image, which in turn leads to longer download times for deployments, more disk space usage, and potentially even increased security risks due to the larger attack surface.

# Before ...
FROM eclipse-temurin:17

# After
FROM eclipse-temurin:17-jre-alpine

Many lightweight image variants, like the one above, are based on the minimal Alpine Linux distribution, which is designed to be small, fast, and efficient, resulting in a much smaller final image.
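
After switching, we can rebuild and check the size again with the same command from step 0 (the image name and tag here are just placeholders):

docker build -t blog-test:alpine .
docker image ls blog-test:alpine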

2. Use Multi-Stage Builds

Separating build and runtime environments in Docker is an effective way to reduce image size and at the same time standardize the build process. Multi-stage builds let us accomplish this by allowing multiple intermediary build stages within a single Dockerfile.

In the first stage, we can use a heavier base image with all the build tools we need to compile our application. Then, we only copy the necessary files, like compiled binaries or assets, to a lighter, more focused base image in the second stage.

# Stage 1: Build the application
FROM node:18 AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm install
COPY . .
RUN npm run build

# Stage 2: Serve the application
FROM nginx:alpine
COPY --from=build /app/dist /usr/share/nginx/html

As in the example, we could build our frontend assets using a Node.js image and then copy only those assets into a lightweight Nginx image for serving. This way, the final image contains only what's needed to run our application, making it smaller, faster, and more secure.
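
As a side note, multi-stage builds also let us build only up to a particular stage with the --target flag, which can be handy for inspecting or debugging the build environment (the stage and tag names below are just illustrative):

docker build --target build -t my-app:build-stage .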

3. Minimize Layers

With Docker, each command in a Dockerfile contributes to the creation of a new layer in the image. These layers are stacked on top of each other, and each layer represents a filesystem delta—essentially a set of changes or additions to the previous layer.

While layering is powerful for caching and incremental image updates, unnecessary layers can bloat the image and slow down the build, push, and pull processes. Therefore, reducing the number of layers in our image can be a key optimization technique.

# Before ...
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y xz-utils

# After ...
RUN apt-get update && \
    apt-get install -y curl xz-utils

One common approach to minimizing layers is combining multiple RUN statements into a single one. Instead of having several RUN commands each doing one thing, we can aggregate them into one command by chaining shell commands together with &&.

4. Remove Unnecessary Files

The size of a Docker image can also be greatly reduced by cleaning up temporary and unnecessary files, which are often created during package installations or other setup tasks.

This is particularly important for operations like installing packages, as package managers often keep a cache of downloaded packages which, while useful for speeding up local development, adds unnecessary bloat to a Docker image.

RUN apt-get update && \
    apt-get install -y curl xz-utils && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

In this modified RUN command, apt-get clean is used to remove downloaded archive files, and rm -rf /var/lib/apt/lists/* deletes the package list files, ensuring that they don't stick around in a layer in the built image.
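
On Alpine-based images, the equivalent is apk's --no-cache flag, which avoids keeping a local package index cache in the first place (a small sketch, assuming curl and xz are the packages we need):

RUN apk add --no-cache curl xz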

5. Use Static Binaries

Compiled languages like Go have the capability to generate static binaries that bundle all the libraries and dependencies needed to run the application.

This feature allows us to create extremely lightweight Docker images, as we don't need an operating system or runtime libraries to run the application — we only need the compiled binary itself.

FROM golang:alpine AS build
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o /my-app

FROM scratch
COPY --from=build /my-app /my-app
CMD ["/my-app"]

The scratch image provided by Docker is a minimal, empty image that serves as a base for such statically compiled applications. It literally starts "from scratch," containing only an empty filesystem.

6. Optimize Caching

Docker's layer caching mechanism can be a powerful ally in speeding up build times, but it requires some strategic planning in our Dockerfile to make the most of it.

Each instruction in a Dockerfile creates a new layer, and Docker caches these layers to avoid redundant work in future builds. If a layer and all its preceding layers are unchanged, Docker simply reuses the cached layer, thereby speeding up the build process.

However, if a layer changes, all subsequent layers have to be recreated, invalidating the cache for them. To optimize for this behavior, we should place instructions that are less likely to change before those that are more likely to change. This allows Docker to reuse the maximum number of cached layers.

FROM node:18

WORKDIR /app

# Copy package files and install dependencies (less likely to change often)
COPY package*.json ./
RUN npm install

# Copy the rest of the codebase (more likely to change often)
COPY . .

# ... (rest of the Dockerfile)

In this example, the npm install step is executed only if package.json or package-lock.json changes. If they remain the same, Docker will reuse the cached layer, and the expensive operation of installing dependencies is skipped, speeding up the build process.

The application code, which changes more frequently, is copied after the dependencies, ensuring that changes to the code don't invalidate the cache for the package installation layer.

This optimized ordering allows for more efficient use of Docker's caching mechanism.

7. Compress Artifacts

Compressing application artifacts is another strategy to reduce the size of our Docker images. Especially in cases where the application generates large files or includes sizable assets, compression can make a notable difference.

Here's a simple example to demonstrate this technique:

Let's assume we have a directory called large-assets that we want to include in our Docker image. We could compress this directory before adding it to our Docker image.

On a Unix-based system, we could run:

tar -zcvf large-assets.tar.gz large-assets/

This will create a compressed file large-assets.tar.gz using gzip compression.

Then, in our Dockerfile, we can copy this compressed file and then decompress it:

FROM ubuntu:latest

# Copy the compressed file into the image
COPY large-assets.tar.gz /app/large-assets.tar.gz

# Navigate to /app and decompress the file
WORKDIR /app

# Unpack the compressed file and remove the archive
RUN tar -zxvf large-assets.tar.gz && rm large-assets.tar.gz

# ... (rest of the Dockerfile)

Compressing the directory before copying it minimizes the amount of data that has to be sent to the Docker daemon as build context and copied during the build. Note that the COPY layer still contains the archive itself, so removing it in the RUN step alone doesn't shrink the final image; the main wins are a smaller build context and faster copies.

Keep in mind that this technique is most effective when the size of the uncompressed files is significantly larger than the compressed archive and when those files are not needed during the build process.
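
If the smallest possible final image is the priority, the unpacking can also be moved into a throwaway builder stage so the archive never appears in a final layer. A rough sketch of that variant (paths are illustrative):

# Stage 1: unpack the archive in a throwaway stage
FROM ubuntu:latest AS assets
WORKDIR /app
COPY large-assets.tar.gz .
RUN tar -zxvf large-assets.tar.gz

# Stage 2: copy only the extracted files into the final image
FROM ubuntu:latest
WORKDIR /app
COPY --from=assets /app/large-assets ./large-assets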

8. Use .dockerignore

Creating a .dockerignore file in the context directory can significantly help in optimizing the Docker image by ensuring that unnecessary files are not sent to the Docker daemon during the build process.

By excluding files that aren't required for building the image, we not only reduce the size of the image but also speed up the build process.

.git/
node_modules/
Dockerfile
.dockerignore
*.md
tests/
*.log
tmp/

This is particularly useful for larger codebases or when we have files and directories that are large but not needed in the Docker image, such as test folders, .git directories, or temporary build artifacts.

9. Use Docker BuildKit

Docker's BuildKit is a modern build subsystem designed to improve speed, accuracy, and security in building Docker images.

It offers several advanced features, such as more efficient layer caching, parallelized build steps, and mount points for caching dependencies between builds.

While some of these features mainly impact build speed rather than the final image size, efficient caching can still contribute to more streamlined development and deployment workflows.

To enable Docker BuildKit, we can set the environment variable DOCKER_BUILDKIT=1 before running our docker build commands:

export DOCKER_BUILDKIT=1
docker build .
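
BuildKit also unlocks the cache mounts mentioned above: a RUN step can reuse a package manager's download cache across builds without that cache ever ending up in an image layer. A minimal sketch for the Node.js example from earlier (the syntax directive selects the newer Dockerfile frontend; the cache path assumes npm's default location):

# syntax=docker/dockerfile:1
FROM node:18
WORKDIR /app
COPY package*.json ./
# npm's download cache lives in a build-time cache mount, not in an image layer
RUN --mount=type=cache,target=/root/.npm npm install
COPY . .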

One of the powerful features in BuildKit is cache import/export. This is particularly useful for CI/CD pipelines, where we might not have the luxury of layer caching because we could be building on a fresh machine every time.

BuildKit allows us to specify external cache sources that it can use to speed up builds.

For example:

export DOCKER_BUILDKIT=1

# Exporting cache: embed cache metadata in the image and push it to a registry
docker build --build-arg BUILDKIT_INLINE_CACHE=1 -t myrepo/myimage:buildcache .
docker push myrepo/myimage:buildcache

# Importing cache on a fresh build machine
docker build --cache-from=myrepo/myimage:buildcache .

Here, the BUILDKIT_INLINE_CACHE build argument embeds cache metadata in the pushed image, and the --cache-from option lets a later build pull that image and reuse its layers. This can greatly speed up our build times by reusing layers from previous builds, even on new or ephemeral build machines.

While the primary benefit of these features is to reduce build times, they can indirectly lead to smaller image sizes as well. By making the build process more efficient, developers are encouraged to make frequent, smaller changes, which can lead to leaner, more optimized images over time.

10. Leverage Service Configuration Instead of Image Configuration

Leveraging service configurations at runtime instead of baking them into the Docker image provides greater flexibility and promotes the reusability of our images.

By keeping the image agnostic of specific environments (like development, staging, production, etc.), we can reuse the same image across different contexts and configurations.

While this doesn't dramatically reduce the image size, it does streamline application deployment and scaling, which are also important aspects of container optimization.

For example, instead of adding a configuration file with database credentials directly into the image, we can pass these as environment variables at the time the container is run:

docker run -e DB_HOST='database_host' -e DB_USER='username' -e DB_PASS='password' my-docker-image

Alternatively, we can use Docker Compose or Kubernetes ConfigMaps to mount configuration files at runtime. This keeps our configurations separate from our image, allowing us to update settings without having to rebuild the entire image.

services:
  my-service:
    image: my-docker-image
    volumes:
      - ./config:/app/config

In this example, a config directory on the host machine is mounted into the /app/config directory in the container. Any configuration files in the host’s config directory would be accessible to the application running in the container, enabling on-the-fly adjustments without necessitating a rebuild of the Docker image.
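
On the Kubernetes side, a ConfigMap can play the same role; a minimal sketch (all names here are illustrative) might look like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-service-config
data:
  app.properties: |
    db.host=database_host
---
apiVersion: v1
kind: Pod
metadata:
  name: my-service
spec:
  containers:
    - name: my-service
      image: my-docker-image
      volumeMounts:
        - name: config
          mountPath: /app/config
  volumes:
    - name: config
      configMap:
        name: my-service-config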

This makes our Docker images more versatile and simplifies the process of managing configurations across various environments.

11. Use the --squash Flag During Build

The --squash flag is a Docker build option that allows us to merge all the layers created during the build process into a single new layer.

This can be particularly helpful for reducing the image size because it removes all the intermediate layers and data that aren't needed in the final image.

To use this feature (which requires the Docker daemon's experimental features to be enabled), we'll run our docker build command like this:

docker build --squash -t my-squashed-image .

However, there's a trade-off involved when using the --squash flag. Squashing layers can negate some of the caching benefits that Docker provides.

Normally, Docker caches each layer separately, so if we make a small change to our application code, only the layers that have changed need to be rebuilt, making subsequent builds much faster.

When we squash layers, we effectively create a new layer that replaces all the existing ones, and this can invalidate the cache for those steps in the build process. As a result, the next time we build the image, it might take longer because Docker can't use the cached layers.

So, while the --squash flag can help reduce the image size, we should consider whether the trade-off in caching benefits is worth it for our particular use case.

It might be more beneficial for production images, where size matters most and the image won't be rebuilt frequently, as opposed to development images, where we might prefer to retain the caching benefits for quicker iteration.

Fin

Optimizing Docker images is a multi-faceted endeavor that balances image size, build speed, and flexibility. From choosing a lightweight base image and employing multi-stage builds to utilizing advanced features like Docker BuildKit, there are a multitude of strategies to streamline our containers.

It's not just about reducing disk usage: optimized images are quicker to deploy, more cost-effective to run, and easier to manage, providing tangible benefits across our development and operational workflows.