Optimizing Python + Docker deploys using Pants
Pants can build a PEX file, an executable zip file containing your Python code and all transitive dependencies. Deploying your application is as simple as copying the file. This post elaborates on how to get best performance out of the powerful combination of Pants+PEX+Docker.
The Pantsbuild system ships with support for building an all-in-one distributable Python file called a PEX. A PEX file is an executable zip file containing your Python code and all its transitive dependencies. Deploying your Python application is as simple as copying a PEX file to a system or image with a suitable Python interpreter.
Pants also supports building Docker images and embedding code it packages into those images. With the combination of PEX+Docker, Pants allows you to easily containerize your Python application with minimal boilerplate.
This post builds off of our previous post about Pants + PEX + Docker, and elaborates on how to squeeze the best build-time performance out of this powerful combination.
A very simple example
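As a minimal sketch, a BUILD file along these lines packages the application as a PEX and copies it into an image. The directory `src/app`, the entry point `main.py`, the target names, and the base image are all assumptions made for illustration.

```python
# src/app/BUILD (hypothetical location)

# Package first-party code plus all transitive dependencies into one PEX.
pex_binary(
    name="binary",
    entry_point="main.py",
)

# Build an image that runs the PEX directly.
docker_image(
    name="img",
    dependencies=[":binary"],
    instructions=[
        "FROM python:3.10-slim",
        # Assumed output path: the BUILD directory with "/" replaced by ".".
        "COPY src.app/binary.pex /bin/app",
        'ENTRYPOINT ["/usr/local/bin/python3.10", "/bin/app"]',
    ],
)
```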
The example BUILD file above demonstrates how simple Pants makes building a Docker image containing a Python application.
In production environments, however, simplicity at build-time comes with trade-offs. Since a PEX is meant to be an all-in-one distributable, it has a few thorns when used as a container's entrypoint.
- It must extract itself before it can run, increasing application startup times
- After extraction, your running container has the PEX and the extracted contents on disk, increasing space required to run
- Changing a first-party source file requires a full rebuild of the PEX and the container, which doesn’t leverage Pants’ or Docker’s caches
Here are some metrics from a real-world use case (a PEX with 56 third-party requirements, a few large assets, and about a hundred source files), with DOCKER_BUILDKIT set to 1:
Touching a first-party source file results in similar metrics; that is, we get no incrementality.
Simple Multi-Stage Build
We can leverage Docker multi-stage builds and interesting PEX features to solve some of these challenges. Using this recipe in PEX’s documentation, we can:
- Create a Python virtual environment containing only our third-party dependencies in one stage
- Create an identical virtual environment containing only our first-party sources in another stage
- COPY them both onto our final “production image” stage
Our BUILD file becomes:
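A sketch of such a BUILD file, following the shape of the PEX recipe, might look like the following. The paths, target names, and base image are assumptions; `include_tools=True` is what makes the `PEX_TOOLS=1 ... venv` invocation available inside the PEX.

```python
# src/app/BUILD (hypothetical location)

pex_binary(
    name="binary",
    entry_point="main.py",
    # Bundle the PEX tools so the image build can run `venv`.
    include_tools=True,
)

docker_image(
    name="img",
    dependencies=[":binary"],
    instructions=[
        # Stage 1: a venv holding only third-party dependencies.
        "FROM python:3.10-slim as deps",
        "COPY src.app/binary.pex /binary.pex",
        "RUN PEX_TOOLS=1 /usr/local/bin/python3.10 /binary.pex venv --scope=deps --compile /bin/app",
        # Stage 2: a venv at the same path holding only first-party sources.
        "FROM python:3.10-slim as srcs",
        "COPY src.app/binary.pex /binary.pex",
        "RUN PEX_TOOLS=1 /usr/local/bin/python3.10 /binary.pex venv --scope=srcs --compile /bin/app",
        # Final stage: overlay both venvs; the PEX itself is not copied.
        "FROM python:3.10-slim",
        'ENTRYPOINT ["/bin/app/pex"]',
        "COPY --from=deps /bin/app /bin/app",
        "COPY --from=srcs /bin/app /bin/app",
    ],
)
```

Placing the `deps` COPY before the `srcs` COPY in the final stage keeps the more stable layer earlier, so it can be reused when only sources change.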
This approach has several benefits:
- It moves the extraction of the PEX to “build time” and also compiles each Python file, so that the application has the lowest-possible startup latency
- Running the final image doesn’t require any additional space
- The COPY --from=deps instruction in the final image can be cached between runs when only touching first-party code
It also has some drawbacks:
- Even though the final layer of the deps stage is re-used if deps don’t change, the input PEX has changed, and so Docker must still re-RUN the extraction
- Having Docker pre-extract the PEX incurs extra build time
If we touch a first-party source (leveraging Docker’s layer caches), here’s what changes:
Multi-stage build leveraging 2 PEXs
In order to fully leverage Pants and Docker caches, we can split our all-in-one PEX into two: one for transitive third-party dependencies and one for first-party code.
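One way this split might be declared is sketched below, assuming the `include_sources` and `include_requirements` fields on `pex_binary`. As before, paths, names, and the base image are illustrative assumptions.

```python
# src/app/BUILD (hypothetical location)

# Third-party dependencies only: unchanged when first-party code is edited.
pex_binary(
    name="binary-deps",
    entry_point="main.py",
    include_sources=False,
    include_tools=True,
)

# First-party sources only: small and quick to rebuild.
pex_binary(
    name="binary-srcs",
    entry_point="main.py",
    include_requirements=False,
    include_tools=True,
)

docker_image(
    name="img",
    dependencies=[":binary-deps", ":binary-srcs"],
    instructions=[
        # The deps stage now sees an unchanged input file when only sources
        # change, so its COPY and RUN layers can be served from cache.
        "FROM python:3.10-slim as deps",
        "COPY src.app/binary-deps.pex /binary-deps.pex",
        "RUN PEX_TOOLS=1 /usr/local/bin/python3.10 /binary-deps.pex venv --scope=deps --compile /bin/app",
        "FROM python:3.10-slim as srcs",
        "COPY src.app/binary-srcs.pex /binary-srcs.pex",
        "RUN PEX_TOOLS=1 /usr/local/bin/python3.10 /binary-srcs.pex venv --scope=srcs --compile /bin/app",
        "FROM python:3.10-slim",
        'ENTRYPOINT ["/bin/app/pex"]',
        "COPY --from=deps /bin/app /bin/app",
        "COPY --from=srcs /bin/app /bin/app",
    ],
)
```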
If we touch a first-party source (leveraging Pants’ and Docker’s layer caches), here’s what changes:
Multiple Images and tagging
This approach leads to significant speedup in both the cold and warm build times, but it can be improved even further:
- The important layers that allow for a faster incremental build aren’t in an image, so they will get cleaned up when Docker is pruned
- If we tell Pants to tag our image, Docker will tag the intermediate images before tagging the final one, leading to two <none>-tagged images in addition to the final, tagged image
We can fix this by using several docker_image targets in tandem:
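A sketch of that arrangement follows. The repository name `app`, the tags `deps` and `srcs`, and the paths are assumptions for illustration; `skip_push=True` is used here on the assumption that the helper images only need to exist locally.

```python
# src/app/BUILD (hypothetical location)

pex_binary(name="binary-deps", entry_point="main.py", include_sources=False, include_tools=True)
pex_binary(name="binary-srcs", entry_point="main.py", include_requirements=False, include_tools=True)

# Helper image holding the third-party venv, tagged so it survives pruning.
docker_image(
    name="img-deps",
    dependencies=[":binary-deps"],
    repository="app",
    registries=["companyname"],
    image_tags=["deps"],
    skip_push=True,
    instructions=[
        "FROM python:3.10-slim",
        "COPY src.app/binary-deps.pex /binary-deps.pex",
        "RUN PEX_TOOLS=1 /usr/local/bin/python3.10 /binary-deps.pex venv --scope=deps --compile /bin/app",
    ],
)

# Helper image holding the first-party venv.
docker_image(
    name="img-srcs",
    dependencies=[":binary-srcs"],
    repository="app",
    registries=["companyname"],
    image_tags=["srcs"],
    skip_push=True,
    instructions=[
        "FROM python:3.10-slim",
        "COPY src.app/binary-srcs.pex /binary-srcs.pex",
        "RUN PEX_TOOLS=1 /usr/local/bin/python3.10 /binary-srcs.pex venv --scope=srcs --compile /bin/app",
    ],
)

# Final image: copies from the tagged helper images instead of named stages.
docker_image(
    name="img",
    dependencies=[":img-deps", ":img-srcs"],
    repository="app",
    instructions=[
        "FROM python:3.10-slim",
        'ENTRYPOINT ["/bin/app/pex"]',
        "COPY --from=companyname/app:deps /bin/app /bin/app",
        "COPY --from=companyname/app:srcs /bin/app /bin/app",
    ],
)
```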
Pants will build the dependent images before building the final one. Note that the registry value “companyname” can be any string (or the registry can be set to an empty list); we just need to hardcode something that we can reference in the final image's COPY instructions.
Now after building the image, we are free to prune untagged images and layers.
There are avenues for squeezing even more performance out of this approach, such as declaring our large assets as files targets and using a dedicated stage to COPY them into the virtual environment.
Additionally, the above targets can all be wrapped into a handy Pants macro for simplicity and re-usability.
Pants’ support for Python and Docker is well-equipped to easily cater to varying business needs and interesting use-cases. And although support for everything listed here is primed and ready as of the upcoming Pants 2.13 release, the Pants community hopes to make even the most complex use-case (such as the final example) as simple to declare as the straightforward use-case (such as the first example) in future versions.
If you want to learn more about Pants, PEX and how they can help you deploy Python applications efficiently, come and say hi on Slack!