The Pantsbuild community's top-voted priority in the January 2022 Community Roadmap Survey was our redesign of Python lockfiles. So we're excited to announce that Pants 2.10 introduces support for multiple lockfiles!
What's a lockfile?
A lockfile pins every dependency your project installs, including both your direct dependencies and their own dependencies, i.e. "transitive" dependencies.
For example, if your project only uses freezegun, a lockfile might look like:

    freezegun==1.2.0 \
        --hash=sha256:93e90676da3... \
        --hash=sha256:e19563d0b05...
    python-dateutil==2.8.2 \
        --hash=sha256:0123cacc162... \
        --hash=sha256:961d03dc345...
    six==1.16.0 \
        --hash=sha256:8abb2f1d868... \
        --hash=sha256:1e61c37477a...

Lockfiles matter for several reasons:
- Stability: by locking down every dependency, you don't have to worry about waking up to a broken build because a new version of a dependency was released the night before.
- Reproducibility: you can go back to older versions of your project and use the same dependencies as before.
- Security: checksum validation ensures that the artifacts you download are what you expect, reducing the risk of supply chain attacks.
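
To make the checksum point concrete, here is a minimal Python sketch of the kind of check a lockfile enables. It is an illustration only, not how pip, Pex, or Pants implement validation: the `verify_artifact` helper, the `pinned_hashes` mapping, and the stand-in artifact bytes are all hypothetical.

```python
import hashlib
from typing import Dict, Set


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(name: str, data: bytes, pinned: Dict[str, Set[str]]) -> bool:
    """Return True only if the artifact's digest is one of its pinned hashes.

    Hypothetical helper: an unknown requirement has no allowed hashes,
    so it is rejected rather than trusted.
    """
    return sha256_hex(data) in pinned.get(name, set())


# Stand-in bytes for a downloaded file; a real tool would read it from disk.
good_bytes = b"pretend this is the freezegun 1.2.0 wheel"
tampered_bytes = good_bytes + b" plus something unexpected"

# Hypothetical pins: requirement -> set of allowed SHA-256 digests.
pinned_hashes = {"freezegun==1.2.0": {sha256_hex(good_bytes)}}

print(verify_artifact("freezegun==1.2.0", good_bytes, pinned_hashes))
print(verify_artifact("freezegun==1.2.0", tampered_bytes, pinned_hashes))
```

A lockfile lists several hashes per requirement because a single release can ship multiple artifacts (for example, a wheel and a source distribution), which is why the sketch maps each requirement to a set.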
Often, the optimal approach is to use only one lockfile for your whole repo. Benefits include:
- Simplicity. For example, you can keep dependencies up to date in all projects at once.
- Compatibility. For example, you can share code across multiple projects without worrying about dependency conflicts.
- Performance. You only need to resolve one time to work in the whole repository.
However, as projects grow, and especially as you mix distinct projects in a single monorepo, you may need the flexibility of multiple lockfiles. For example, one project may want to migrate to Django 4 while the rest of your projects are not yet able to upgrade.
Typical approach: project-based lockfiles
Several Python tools like Poetry allow you to have one lockfile per project, e.g.:

    ├── data_science
    │   ├── poetry.lock
    │   └── pyproject.toml
    ├── web_app1
    │   ├── poetry.lock
    │   └── pyproject.toml
    └── web_app2
        ├── poetry.lock
        └── pyproject.toml

While this gives you local flexibility, it reduces both performance and compatibility by forcing you into one unique resolve per project. For example, if you have five Django projects and four of them should still be using Django 3, you have to duplicate that version. In addition to it being more work to maintain and update multiple resolves, there can be a performance hit to install each distinct resolve.
Per-project resolves can especially complicate inter-project code sharing, as you have to make sure each project uses compatible dependencies, a problem often called "dependency hell".
Pants's hybrid approach: granular "resolves"
Rather than forcing global or per-project lockfiles, Pants uses a hybrid approach. Lockfiles are named, and can be used on a per-directory or even per-file basis. This allows a repo to operate with the minimum number of resolves required to support its conflicting library versions, without necessarily going to the extreme of per-project resolves.
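
As a rough illustration of why fewer resolves can be enough, the Python sketch below groups projects by the one conflicting library version they need. It is only a toy model: in Pants you define resolves by hand, and the project names, the `group_into_resolves` helper, and the `django3`/`django4` resolve names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical projects and the Django major version each one needs.
project_django_major = {
    "web_app1": 3,
    "web_app2": 3,
    "web_app3": 3,
    "web_app4": 3,
    "web_app5": 4,
}


def group_into_resolves(requirements: Dict[str, int]) -> Dict[str, List[str]]:
    """Group projects that need the same version so they can share one resolve."""
    resolves = defaultdict(list)
    for project, major in sorted(requirements.items()):
        resolves[f"django{major}"].append(project)
    return dict(resolves)


resolves = group_into_resolves(project_django_major)
print(len(project_django_major), "per-project lockfiles vs", len(resolves), "shared resolves")
```

Under these assumptions, the four Django 3 projects share one lockfile and only the migrating project needs its own.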
First, you define several "resolves", which are logical names for lockfiles.

    [python.resolves]
    data-science = "3rdparty/data_science.lock.txt"
    web-app = "3rdparty/web_app.lock.txt"

Then, you declare which resolves particular requirements should belong to, if they don't use your default. Pants is able to parse Poetry's pyproject.toml and pip's requirements.txt; often you will tell Pants that an entire file belongs to a particular resolve. But you can also get more granular, such as declaring a particular requirement to be used in multiple resolves.
Finally, you declare which resolve particular code, tests, binaries, etc. should use, if they don't use your default.

    pex_binary(
        name="app",
        entry_point="gunicorn.py",
        resolve="web-app",
    )

Pants will only infer dependencies on code using the same resolve, making sure you use a consistent lockfile. For example, if helpers.py doesn't work with the data-science resolve, Pants won't let your data_science_app.py file incorrectly use it.
It's possible for requirements and source code to work with multiple resolves. For example, you can have a file like utils.py that works with all your lockfiles.

    # src/py/utils/BUILD
    python_sources(
        # The files in this folder can be used with both resolves.
        #
        # This `parametrize` mechanism is added in Pants 2.11. In
        # Pants 2.10, you can explicitly create multiple targets.
        resolve=parametrize("data-science", "web-app"),
    )

This granularity allows you to have the minimum number of resolves necessary, while ergonomically sharing common code across multiple projects.
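
The resolve-scoping rule described above can be pictured with a small Python model. This is a conceptual sketch, not Pants's implementation: the `file_resolves` mapping and the `allowed_dependency` function are hypothetical, and the file names are borrowed from the examples in this post.

```python
from typing import Dict, Set

# Hypothetical mapping of source files to the resolves they are declared to work with.
file_resolves: Dict[str, Set[str]] = {
    "data_science_app.py": {"data-science"},
    "helpers.py": {"web-app"},
    # Works with both resolves, similar in spirit to resolve=parametrize(...).
    "utils.py": {"data-science", "web-app"},
}


def allowed_dependency(importer: str, imported: str) -> bool:
    """True if `imported` is available in every resolve that `importer` uses."""
    return file_resolves[importer] <= file_resolves[imported]


# helpers.py is not available in the data-science resolve, so this is rejected.
print(allowed_dependency("data_science_app.py", "helpers.py"))
# utils.py works with both resolves, so either project can use it.
print(allowed_dependency("data_science_app.py", "utils.py"))
print(allowed_dependency("helpers.py", "utils.py"))
```

The subset check also captures the reverse direction: in this model, shared code such as utils.py cannot depend on helpers.py, because helpers.py is missing from one of the resolves that utils.py must support.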
Lockfile support in Pants 2.10
We're excited that Pants 2.10 releases full support for multiple lockfiles.
However, one major caveat: Pants's automated lockfile generation currently has several limitations. For now, some users may need to manually generate their locks.
We've addressed these issues with lockfile generation by teaching the Pex tool how to generate lockfiles using pip, which will be launched in the subsequent Pants 2.11 release. Some benefits we're excited about:
- Can generate a lockfile for multiple interpreter constraints and platforms (e.g. macOS vs Linux), unlike pip-compile (but similar to Poetry).
- Can lock VCS/git requirements, which you cannot normally do with pip because it doesn't know how to
- Handles pip-specific mechanisms like
- Promising preliminary benchmarks for both lock generation & lock installation speed.