Multiple lockfiles in Python repos

The Pantsbuild community's top-voted priority in the January 2022 Community Roadmap Survey was our redesign of Python lockfiles. So we're excited to announce that Pants 2.10 introduces support for multiple lockfiles!

What's a lockfile?

A lockfile pins every dependency your project installs, including both your direct dependencies and their own dependencies, i.e. "transitive" dependencies.

For example, if your project only uses freezegun, a lockfile might look like:

freezegun==1.2.0 \
  --hash=sha256:93e90676da3... \
  --hash=sha256:e19563d0b05...
python-dateutil==2.8.2 \
  --hash=sha256:0123cacc162... \
  --hash=sha256:961d03dc345...
six==1.16.0 \
  --hash=sha256:8abb2f1d868... \
  --hash=sha256:1e61c37477a...

Why lockfiles?

  • Stability: because every dependency is locked, a new release published overnight cannot break your build the next morning.
  • Reproducibility: you can go back to an older version of your project and install exactly the same dependencies it used at the time.
  • Security: checksum validation ensures that the artifacts you download are what you expect, reducing the risk of supply chain attacks.
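
To make the security point concrete, here is a minimal sketch of the kind of check an installer performs against the `--hash` entries in a lockfile. The function name `artifact_matches_lock` and the sample bytes are hypothetical and chosen only for illustration. Real installers such as pip perform this check internally.

```python
import hashlib
from typing import Set


def artifact_matches_lock(artifact: bytes, allowed_sha256: Set[str]) -> bool:
    """Return True if the artifact's SHA-256 digest is one of the pinned hashes."""
    digest = hashlib.sha256(artifact).hexdigest()
    return digest in allowed_sha256


# Hypothetical bytes standing in for a downloaded wheel or sdist.
wheel_bytes = b"pretend this is a downloaded freezegun artifact"

# In a real lockfile these digests come from the `--hash=sha256:...` entries.
pinned = {hashlib.sha256(wheel_bytes).hexdigest()}

print(artifact_matches_lock(wheel_bytes, pinned))          # True
print(artifact_matches_lock(wheel_bytes + b"!", pinned))   # False: contents changed
```

A lockfile usually lists several hashes per requirement because a single release can ship multiple artifacts, such as an sdist and one or more wheels. Any one of them matching is enough.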

Multiple Lockfiles

Often, the best approach is to use a single lockfile for your whole repo. Its benefits include:

  • Simplicity: you can keep dependencies up to date in all projects at once.
  • Compatibility: projects can share code without worrying about dependency conflicts.
  • Performance: you only need to resolve once to work anywhere in the repository.

However, as projects grow, and especially as you mix distinct projects in a single monorepo, you may need the flexibility of multiple lockfiles. For example, one project may want to migrate to Django 4 while the rest of your projects are not yet able to upgrade.

Typical approach: project-based lockfiles

Several Python tools like Poetry allow you to have one lockfile per project, e.g.:

├── data_science
│   ├── poetry.lock
│   └── pyproject.toml
├── web_app1
│   ├── poetry.lock
│   └── pyproject.toml
└── web_app2
    ├── poetry.lock
    └── pyproject.toml

While this gives you local flexibility, it reduces both performance and compatibility by forcing you into one unique resolve per project. For example, if you have five Django projects and four of them should still be using Django 3, you have to pin that version separately in four lockfiles. Beyond the extra work of maintaining and updating multiple resolves, installing each distinct resolve can also cost performance.

Per-project resolves especially complicate code sharing between projects, because you have to make sure each project uses compatible dependencies. This problem is often called "dependency hell".
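
As a rough illustration of why sharing code across per-project lockfiles gets tricky, the sketch below compares the pins of two projects and reports the packages they disagree on. The helper `conflicting_pins` and the version numbers are hypothetical. A real check would read the pins from each project's lockfile.

```python
from typing import Dict, Tuple


def conflicting_pins(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Map each package pinned by both projects at different versions to the two versions."""
    return {
        name: (a[name], b[name])
        for name in sorted(a.keys() & b.keys())
        if a[name] != b[name]
    }


# Hypothetical pins, as if read from each project's own lockfile.
web_app1 = {"django": "3.2.12", "requests": "2.27.1"}
web_app2 = {"django": "4.0.2", "requests": "2.27.1"}

print(conflicting_pins(web_app1, web_app2))
# {'django': ('3.2.12', '4.0.2')}
```

Any module imported by both projects has to work with every version listed in a conflict, and the number of pairs to check grows quickly as projects are added.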

Pants's hybrid approach: granular "resolves"

Rather than forcing global or per-project lockfiles, Pants uses a hybrid approach. Lockfiles are named, and can be used on a per-directory or even per-file basis. This allows a repo to operate with the minimum number of resolves required to support its conflicting library versions, without necessarily going to the extreme of per-project resolves.

First, you define several "resolves", which are logical names for lockfiles.

[python.resolves]
data-science = "3rdparty/data_science.lock.txt"
web-app = "3rdparty/web_app.lock.txt"

Then, you declare which resolves particular requirements should belong to, if they don't use your default. Pants can parse Poetry's pyproject.toml and pip's requirements.txt, and often you will tell Pants that an entire file belongs to a particular resolve. You can also get more granular, such as declaring that a particular requirement is used in multiple resolves.
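
As a sketch of what this can look like in a BUILD file, the fragment below assigns a whole requirements file to one resolve and declares a single extra requirement for another. It assumes the requirement targets accept the same `resolve` field shown for `pex_binary` in this post. The BUILD file location, target names, file name, and the `ansicolors` pin are hypothetical, so check the Pants documentation for your version before copying it.

```python
# 3rdparty/BUILD (hypothetical location)

# Assumption: every requirement parsed from this file joins the
# "data-science" resolve defined in [python.resolves].
python_requirements(
    name="data-science-reqs",
    source="data-science-requirements.txt",  # hypothetical file name
    resolve="data-science",
)

# A single requirement declared explicitly for the "web-app" resolve.
python_requirement(
    name="ansicolors-web-app",
    requirements=["ansicolors==1.18"],  # hypothetical pin
    resolve="web-app",
)
```

To use the same requirement in more than one resolve without `parametrize`, you would declare one such target per resolve, each with its own name.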

Finally, you declare which resolve particular code, tests, binaries, and so on should use, if they don't use your default.

pex_binary(
    name="app",
    entry_point="gunicorn.py",
    resolve="web-app",
)

Pants will only infer dependencies on code using the same resolve, making sure you use a consistent lockfile. For example, if helpers.py doesn't work with the data-science resolve, Pants won't let your data_science_app.py file incorrectly use it.
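
The rule can be pictured with a toy model. This is only an illustration of the idea, not how Pants implements dependency inference. The function `may_depend_on` and the mapping of files to resolves are hypothetical.

```python
from typing import Dict, FrozenSet


def may_depend_on(importer_resolve: str, dependency_resolves: FrozenSet[str]) -> bool:
    """A dependency is usable only if it is declared for the importer's resolve."""
    return importer_resolve in dependency_resolves


# Hypothetical assignment of files to the resolves they work with.
RESOLVES_BY_FILE: Dict[str, FrozenSet[str]] = {
    "helpers.py": frozenset({"web-app"}),
    "utils.py": frozenset({"data-science", "web-app"}),
}

# data_science_app.py uses the "data-science" resolve.
print(may_depend_on("data-science", RESOLVES_BY_FILE["helpers.py"]))  # False
print(may_depend_on("data-science", RESOLVES_BY_FILE["utils.py"]))    # True
```

Modeling each file as a set of resolves also covers shared code: a file that works with several lockfiles simply lists all of them.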

It's possible for requirements and source code to work with multiple resolves. For example, you can have a file like utils.py that works with all your lockfiles.

# src/py/utils/BUILD

python_sources(
    # The files in this folder can be used with both resolves.
    #
    # This `parametrize` mechanism is added in Pants 2.11. In
    # Pants 2.10, you can explicitly create multiple targets.
    resolve=parametrize("data-science", "web-app"),
)

This granularity allows you to have the minimum number of resolves necessary, while ergonomically sharing common code across multiple projects.

Lockfile support in Pants 2.10

We're excited that Pants 2.10 ships full support for multiple lockfiles.

However, one major caveat: Pants's automated lockfile generation currently has several limitations. For now, some users may need to manually generate their locks.

We've addressed these limitations by teaching the Pex tool how to generate lockfiles using pip. This new lockfile generation will launch in the subsequent Pants 2.11 release. Some benefits we're excited about:

  • Can generate a lockfile for multiple interpreter constraints and platforms (e.g. macOS vs Linux), unlike pip-compile (but similar to Poetry).
  • Can lock VCS/git requirements, which you cannot normally do with pip because it doesn't know how to --hash the repository.
  • Handles pip-specific mechanisms like --find-links (unlike Poetry).
  • Promising preliminary benchmarks for both lock generation & lock installation speed.

Trying out Pants

Check out our example-python repository to try out Pants's new lockfile support. Let us know what you think in Slack!

Come say hi!

The Pants community welcomes newcomers, and Pants development is guided by your feedback.

Click here to join the community chat on Slack


