Good practices: Minimum Docker Image Size

Docker containers are becoming more and more popular over time. They are a really simple way to encapsulate your application code with its dependencies and leave it ready to be deployed into different environments.

One of their key features is that they allow you to test your application together with the operating system it will be deployed on. The implication is that your test suite not only validates that application changes don't break anything, but also makes it quite simple to upgrade operating system dependencies and validate the upgrade with your regular app tests.

Let's say we have a Python project which uses PostgreSQL with this set of dependencies in a requirements.txt file:

psycopg2==2.8.5
requests==2.23.0

A simple Dockerfile for that app would be:

FROM python:3.8-slim

WORKDIR /app

RUN apt-get update && apt-get install -y --allow-unauthenticated \
    build-essential \
    libpq-dev

COPY run_app.sh run_app.sh
COPY my_app.py my_app.py
COPY requirements.txt requirements.txt

# Install app dependencies.
RUN pip install -r requirements.txt

CMD ["/app/run_app.sh"]

This produces an image of 357MB, starting from a base Python 3.8 image of 113MB. That means each deployment needs to download 357MB from your Docker registry to your production environment. Imagine how big it would become as your dependencies grow, especially once you start using external libraries or modules that need to be compiled, requiring dev libraries inside the container.
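To put that in perspective, a quick back-of-the-envelope calculation; only the image size comes from the measurements above, the deployment frequency is a made-up figure:

```python
# Image size from the build above, in MB; the deployment count is hypothetical.
image_mb = 357
deployments_per_week = 20

# Registry traffic if every deployment pulls the full image.
weekly_gb = image_mb * deployments_per_week / 1024
print(f"{weekly_gb:.1f} GB pulled from the registry per week")
```

Layer caching softens this in practice, but any layer that changes (such as the one produced by pip install) has to be pulled again at full size.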

Is there anything we could do to reduce that size?

Yes, and since the addition of multi-stage builds to Docker, it's quite simple!

The way to approach this optimisation is to use one container only for the build and dependency installation and then, in a fresh new container, copy just the files you are interested in.

The way you implement this split depends on the technologies used to run the service. In this article we use a Python based application, so we need a way to easily copy the installed dependencies. There are two different ways to achieve it:

* Use pip to install all dependencies into a specific directory.
* Use a virtualenv so the whole installation is isolated, including bin scripts.
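The isolation a virtualenv gives you can be seen with the standard library's venv module alone; the throwaway directory here is just for illustration:

```python
import pathlib
import tempfile
import venv

# Create an isolated environment in a throwaway directory (illustrative only).
target = pathlib.Path(tempfile.mkdtemp()) / "venv"
venv.create(target, with_pip=False)

# The environment is self-contained: it ships its own interpreter, bin scripts
# and site-packages, which is what makes it easy to copy between build stages.
print(sorted(p.name for p in target.iterdir()))
```

Everything the app needs at runtime ends up under that one directory, so a single COPY of it moves the whole installation to another stage.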

The approach used in this article is the virtualenv one, but that's just a personal preference for simplicity.

A possible implementation would start with a base stage containing the common rules reused by the other stages:

FROM python:3.8-slim as base

# PYTHONUNBUFFERED: https://docs.python.org/3.8/using/cmdline.html#envvar-PYTHONUNBUFFERED
ENV PYTHONUNBUFFERED 1
ENV VIRTUAL_ENV /srv/venv
ENV PYTHONDONTWRITEBYTECODE 1

WORKDIR /app

RUN pip install virtualenv

RUN apt-get update && apt-get install -y --allow-unauthenticated \
    libpq5

It basically initialises the environment, sets the final workdir, installs the virtualenv package that will be used in the following stages and, finally, installs the library dependencies required by our app; in this case, libpq5 for PostgreSQL.

The next stage is used only for the build and includes all development dependencies. It starts from the previous base stage:

FROM base as compile-image

RUN apt-get update && apt-get install -y --allow-unauthenticated \
    build-essential \
    libpq-dev && \
    python -m virtualenv ${VIRTUAL_ENV}

ENV PATH "${VIRTUAL_ENV}/bin:${PATH}"

# Install app dependencies.
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

To install the runtime dependencies for our app, it creates a virtualenv, which is then used to install the requirements.txt dependencies with pip. To simplify running python commands, the PATH environment variable is updated so the virtualenv's commands are found by default.
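The effect of prepending the virtualenv's bin directory to PATH can be sketched with a small experiment; the fake interpreter and the temporary directory below are hypothetical stand-ins for the venv's bin directory:

```python
import os
import pathlib
import shutil
import stat
import tempfile

# Stand-in for the virtualenv's bin directory, holding a fake "python".
venv_bin = pathlib.Path(tempfile.mkdtemp())
fake = venv_bin / "python"
fake.write_text("#!/bin/sh\necho venv python\n")
fake.chmod(fake.stat().st_mode | stat.S_IEXEC)

# Prepend it to PATH, as the ENV PATH line in the Dockerfile does.
os.environ["PATH"] = f"{venv_bin}{os.pathsep}{os.environ['PATH']}"

# Lookups now resolve to the venv's copy before any system interpreter.
print(shutil.which("python"))
```

This is why a plain pip or python inside later stages operates on the virtualenv without needing to activate it.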

Finally, it's time to create the final image with just our application code and the required runtime libraries:

FROM base as build-image
ENV PATH "${VIRTUAL_ENV}/bin:${PATH}"

COPY --from=compile-image ${VIRTUAL_ENV} ${VIRTUAL_ENV}
COPY run_app.sh run_app.sh
COPY my_app.py my_app.py

CMD ["/app/run_app.sh"]

As you can see, this stage copies the virtual environment created in the previous stage, which contains the installed modules required to run our app, plus the application code itself. The PostgreSQL dependency (psycopg2) requires the libpq library to work, but it is already installed in the base image, which lets us share that installation between the dependency installation stage and this final stage.

Once this last stage is built, we end up with an image of 157MB, which is roughly 2.3 times smaller than the original one.
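As a sanity check on that figure, using the two sizes reported above:

```python
# Image sizes reported above, in MB.
single_stage_mb = 357
multi_stage_mb = 157

ratio = single_stage_mb / multi_stage_mb
print(f"{ratio:.2f}x smaller")  # just under 2.3x
```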
