Docker containers have become more and more popular over time. They're a really simple way to
encapsulate your application code with its dependencies and leave it ready to be deployed to
different environments.
One of their key features is that they let you test your application together with the operating
system it will be deployed on. The implication is that your tests not only validate that app
updates don't break anything, but also make it quite simple to upgrade the operating system
dependencies and validate them with your regular app tests.
Let's say we have a Python project that uses PostgreSQL, with this set of dependencies in a requirements.txt file:
psycopg2==2.8.5
requests==2.23.0
A simple Dockerfile for that app would be:
FROM python:3.8-slim
WORKDIR /app
RUN apt-get update && apt-get install -y --allow-unauthenticated \
build-essential \
libpq-dev
COPY run_app.sh run_app.sh
COPY my_app.py my_app.py
COPY requirements.txt requirements.txt
# Install app dependencies.
RUN pip install -r requirements.txt
CMD ["/app/run_app.sh"]
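To check the resulting size yourself, you can build the image and inspect it (the tag name here is just illustrative):

```shell
docker build -t my_app:single-stage .
docker images my_app:single-stage
```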
This produces an image of 357MB, starting from a base Python 3.8 image of 113MB. That
means each deployment needs to download 357MB from your Docker registry to your production
environment. Imagine how big it would become as your dependencies grow, especially when you start
using external libraries or modules that need to be compiled, requiring dev libraries inside
the container.
Is there anything we could do to reduce that size?
Yes, and since Docker added multi-stage builds, it's quite simple!
The way to approach this optimisation is to use one container only for building and installing
dependencies, and then copy just the files you're interested in into a fresh, new container.
How you implement this split depends on the technologies used to run the service; in this
article we use a Python-based application, so we need a way to easily copy the installed
dependencies. There are two ways to achieve it:
* Use pip to install all dependencies into a specific directory.
* Use a virtualenv so the whole installation is isolated, including bin scripts.
The approach used in this article is the virtualenv one, but that's just a personal preference for
simplicity.
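For comparison, the pip-only approach could be sketched like this; the /deps directory and the PYTHONPATH setup are illustrative choices, not part of the implementation that follows:

```dockerfile
# In the build stage: install everything into a standalone directory.
RUN pip install --target=/deps -r requirements.txt

# In the final stage: copy that directory and add it to the module path.
COPY --from=compile-image /deps /deps
ENV PYTHONPATH /deps
```

Note that pip's --target option does not put the packages' bin scripts on your PATH, which is one reason to prefer the virtualenv approach.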
A possible implementation would start with a base stage containing the common rules reused by the
other stages:
FROM python:3.8-slim as base
# PYTHONUNBUFFERED: https://docs.python.org/3.8/using/cmdline.html#envvar-PYTHONUNBUFFERED
ENV PYTHONUNBUFFERED 1
ENV VIRTUAL_ENV /srv/venv
ENV PYTHONDONTWRITEBYTECODE 1
WORKDIR /app
RUN pip install virtualenv
RUN apt-get update && apt-get install -y --allow-unauthenticated \
libpq5
It basically initialises the environment, sets the final workdir, installs the virtualenv
package that will be used in the following stages, and finally installs the system libraries
required by our app, in this case libpq5 for PostgreSQL.
The next stage is used only for the build, so it includes all the development dependencies. It
starts from the previous base stage:
FROM base as compile-image
RUN apt-get update && apt-get install -y --allow-unauthenticated \
build-essential \
libpq-dev && \
python -m virtualenv ${VIRTUAL_ENV}
ENV PATH "${VIRTUAL_ENV}/bin:${PATH}"
# Install app dependencies.
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
To install the runtime dependencies for our app, it creates a virtualenv, which is then used
to install the requirements.txt dependencies with pip. To simplify command execution, the PATH
environment variable is updated so the virtualenv's commands are used by default.
Finally, it's time to create the final image with just our application code and the required
runtime libraries:
FROM base as build-image
ENV PATH "${VIRTUAL_ENV}/bin:${PATH}"
COPY --from=compile-image ${VIRTUAL_ENV} ${VIRTUAL_ENV}
COPY run_app.sh run_app.sh
COPY my_app.py my_app.py
CMD ["/app/run_app.sh"]
As you can see, this stage copies the virtual environment created in the previous stage, which
contains the installed modules required to run our app, along with the application code itself.
The psycopg2 dependency requires the libpq library to work, but we already have it installed in
the base image, which lets us share that installation between the dependency-installation stage
and this final container stage.
Once this last stage is built, we end up with a 157MB image, which is about 2.3 times smaller
than the original container.
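If you want to verify the sizes, or debug an intermediate stage, you can build each stage explicitly (the tag names here are illustrative):

```shell
# Build the final image (the last stage is built by default).
docker build -t my_app:multi-stage .

# Build and tag only the compile stage, e.g. to inspect the virtualenv.
docker build --target compile-image -t my_app:compile .

# Compare the resulting image sizes.
docker images my_app
```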