5 Pytest Best Practices for Writing Great Python Tests

July 16, 2020

Pytest has a lot of features, but not many best-practice guides. Here’s a list of the 5 most impactful best-practices we’ve discovered at NerdWallet.



At NerdWallet, we have a lot of production Python code – and a lot of it is tested using the great pytest open source framework. The framework has some excellent documentation explaining its features; one thing that I found lacking was a quickstart guide on best practices – which of these features to use, and when.

I observed engineers new to Python or pytest struggling to use the various pieces of pytest together, and we would cover the same themes in Pull Request reviews during the first quarter. It wasn’t just new engineers either: I found that experienced engineers were also sticking with unittest and were anxious about switching over to pytest because there were so many features and little guidance.

Documentation scales better than people, so I wrote up a small opinionated guide internally with a list of pytest patterns and antipatterns; in this post, I’ll share the 5 that were most impactful.

  1. Prefer mocker over mock
  2. Parametrize the same behavior, have different tests for different behaviors
  3. Don’t modify fixture values in other fixtures
  4. Prefer responses over mocking outbound HTTP requests
  5. Prefer tmpdir over global test artifacts

Getting Started with Pytest

If you’re new to pytest, it’s worth doing a quick introduction. pytest is two things: on the surface, it is a test runner that can run existing unittest tests; underneath, it is a different testing paradigm. While something like rspec in Ruby is so obviously a different language, pytest tends to be more subtle. It is more similar to JUnit in that a test looks mostly like Python code, with some special pytest directives thrown around.

The internet already does a good job of introducing new folks to pytest:

  • Here’s a great, low-fi way to go from 0 to 1
  • Here’s a more thoughtful article with a little context around best-practices for maintainability

Read through those to get introduced to the key concepts and play around with the basics before moving along.

Quick Primer of Key Concepts

The two most important concepts in pytest are fixtures and the ability to parametrize; an auxiliary concept is how these are processed together and interact as part of running a test.

Fixtures

Fixtures are how test setups (and any other helpers) are shared between tests. While we can use plain functions and variables as helpers, fixtures are super-powered with functionality, including:

  • The ability to depend on and build on top of each other to model complex functionality
  • The ability to customize this functionality by overriding fixtures at various levels
  • The ability to parametrize (that is, take on multiple values) and magically run every dependent test once for each parameterized value

tl;dr: Fixtures are the basic building blocks that unlock the full power of pytest.

yield Fixtures

Some of the most useful fixtures tend to be context fixtures, or yield fixtures. These are very similar in syntax to a context manager created with contextlib.contextmanager.

The code before the yield is executed as setup for the fixture, while the code after the yield is executed as clean-up. The value yielded is the fixture value received by the user.

Like all contexts, when yield fixtures depend on each other they are entered and exited in stack, or Last In First Out (LIFO) order. That is, the last fixture to be entered is the first to be exited.

Fixture Resolution

When a test is found, all the fixtures involved in this test are resolved by traversing the dependency chain upwards to the parent(s) of a fixture. When this Directed Acyclic Graph (DAG) has been resolved, each fixture that requires execution is run once; its value is stored and used to compute the dependent fixture and so on. If the fixture dependency has a loop, an error occurs.
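Here is a sketch of a diamond-shaped dependency, assuming three hypothetical fixtures where database and cache both depend on settings:

```python
import pytest


@pytest.fixture
def settings():
    return {"debug": True}


@pytest.fixture
def database(settings):
    return {"settings": settings}


@pytest.fixture
def cache(settings):
    return {"settings": settings}


def test_shared_upstream_runs_once(database, cache):
    # `settings` is executed once for this test, so both downstream
    # fixtures hold the very same object, not two equal copies.
    assert database["settings"] is cache["settings"]
```

The identity check passes because the stored value of settings is reused for every fixture that asks for it within the test.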

Fixture Overriding

One of the most useful (and most frequently used) features of fixtures is the ability to override them at various levels.

It’s not just the end fixtures that can be overridden! Something that’s not obvious and frequently more useful is to override fixtures that other fixtures depend on. This is very useful to create high-leverage fixtures that can be customized for different end-tests.
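As a sketch of overriding an upstream fixture, assume a hypothetical greeting fixture that depends on username. A test class can redefine username alone and still reuse greeting unchanged:

```python
import pytest


def build_greeting(name):
    """Hypothetical helper used by the `greeting` fixture."""
    return f"Hello, {name}!"


@pytest.fixture
def username():
    return "guest"


@pytest.fixture
def greeting(username):
    return build_greeting(username)


def test_default_greeting(greeting):
    assert greeting == "Hello, guest!"


class TestAdmin:
    @pytest.fixture
    def username(self):
        # Overrides the module-level `username` for tests in this class.
        return "admin"

    def test_admin_greeting(self, greeting):
        # `greeting` is not redefined, but it now sees the override.
        assert greeting == "Hello, admin!"
```

The same idea applies at other levels: a fixture defined in a conftest.py can be overridden in a test module, and a module-level one in a class.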

Parametrize

Parametrizing tests and fixtures allows us to generate multiple copies of them easily. A single test written with three sets of arguments, for example, is reported by pytest as three separate tests.
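A minimal sketch, assuming a hypothetical is_palindrome function under test:

```python
import pytest


def is_palindrome(text):
    """Hypothetical function under test; ignores letter case."""
    normalized = text.lower()
    return normalized == normalized[::-1]


@pytest.mark.parametrize(
    "text, expected",
    [
        ("level", True),
        ("Noon", True),
        ("pytest", False),
    ],
)
def test_is_palindrome(text, expected):
    assert is_palindrome(text) is expected
```

There is one test function here, but three argument sets, so pytest collects and reports three tests.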

Parametrizing tests has an obvious use: to test multiple inputs to a function and verify that they return the expected output. It’s really useful to thoroughly test edge cases.

Parametrizing fixtures is subtly different, incredibly powerful, and a more advanced pattern. It models that wherever the fixture is used, all parametrized values can be used interchangeably. Parametrizing a fixture indirectly parametrizes every dependent fixture and function.
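A sketch of a parametrized fixture, assuming hypothetical file_format and filename fixtures. Nothing in the test itself is parametrized, yet it runs once per value:

```python
import pytest


@pytest.fixture(params=["json", "yaml"])
def file_format(request):
    # `request.param` holds the current value from `params`.
    return request.param


@pytest.fixture
def filename(file_format):
    # Depends on a parametrized fixture, so it is indirectly parametrized.
    return f"report.{file_format}"


def test_filename_has_extension(filename, file_format):
    # Collected twice: once for "json" and once for "yaml".
    assert filename.endswith("." + file_format)
```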

Lifecycle of a Test Run

There are two major stages to every test run – collection and execution.

Collection

During test collection, every test module, test class, and test function that matches certain conditions is picked up and added to a list of candidate tests. In parallel, every fixture is also parsed by inspecting conftest.py files as well as test modules. Finally, parametrization rules are applied to generate the final list of functions, and their argument (and fixture) values.

In this phase, the test files are imported and parsed; however, only the meta-programming code – i.e., the code that operates on fixtures and functions – is actually executed. The idea here is that pytest needs to understand all the tests that actually have to execute, and it can’t do that without executing things like parametrize.

For pytest to resolve and collect all the combinations of fixtures in tests, it needs to resolve the fixture DAG. Therefore, the inter-fixture dependencies are resolved at collection time but none of the fixtures themselves are executed.

By default, errors during collection will cause the test run to abort without actually executing any tests.

Execution

After test collection has concluded successfully, all collected tests are run. But before the actual test code is run, the fixture code is first executed, in order, from the root of the DAG to the end fixtures:

  • session scoped fixtures are executed if they have not already been executed in this test run. Otherwise, the results of previous execution are used.
  • module scoped fixtures are executed if they have not already been executed as part of this test module in this test run. Otherwise, the results of previous execution are used.
  • class scoped fixtures are executed if they have not already been executed as part of this class in this test run. Otherwise, the results of previous execution are used.
  • function scoped fixtures are executed.

Finally, the test function is called with the values for the fixtures filled in. Note that the parametrized arguments have already been “filled in” as part of collection.
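The effect of scope can be sketched with a session scoped fixture feeding a function scoped one. The fixture names and the setup_log list are hypothetical:

```python
import pytest

# Records every time a fixture body actually executes.
setup_log = []


@pytest.fixture(scope="session")
def api_url():
    setup_log.append("api_url")
    return "https://api.example.test"


@pytest.fixture
def client(api_url):
    # Function scoped (the default): executed again for every test.
    setup_log.append("client")
    return {"base_url": api_url}


def test_first(client):
    assert client["base_url"].startswith("https://")


def test_second(client):
    assert client["base_url"].endswith(".test")
    # After both tests have run under pytest, setup_log contains
    # "api_url" once and "client" twice.
```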

Patterns and Anti-Patterns

Now that we have the basic concepts squared away, let’s get down to the 5 best practices as promised! As a quick reminder, these are:

  1. Prefer mocker over mock
  2. Parametrize the same behavior, have different tests for different behaviors
  3. Don’t modify fixture values in other fixtures
  4. Prefer responses over mocking outbound HTTP requests
  5. Prefer tmpdir over global test artifacts

Prefer mocker over mock

tl;dr: Use the mocker fixture instead of using mock directly.

Why:

  • Eliminates the chance of flaky tests due to “mock leak”, when a test does not reset a patch.
  • Less boilerplate, and works better with parametrized functions and fixtures.

Parametrize the same behavior, have different tests for different behaviors

tl;dr: Parametrize when asserting the same behavior with various inputs and expected outputs. Make separate tests for distinct behaviors. Use ids to describe individual test cases.

Why:

  • Copy-pasting code in multiple tests increases boilerplate – use parametrize.
  • Never loop over test cases inside a test – it stops on first failure and gives less information than running all test cases.
  • Parametrizing all invocations of a function leads to complex arguments and branches in test code. This is difficult to maintain, and can lead to bugs.
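As a sketch of this split, assume a hypothetical parse_age function. Valid inputs share one parametrized test, while the error behavior gets its own test. Both use ids to name the cases:

```python
import pytest


def parse_age(value):
    """Hypothetical function under test."""
    age = int(value)
    if age < 0:
        raise ValueError("age must be non-negative")
    return age


# Same behavior (string in, int out) with different inputs: parametrize.
@pytest.mark.parametrize(
    "raw, expected",
    [
        pytest.param("0", 0, id="zero"),
        pytest.param("42", 42, id="typical"),
        pytest.param(" 7 ", 7, id="surrounding-whitespace"),
    ],
)
def test_parse_age_returns_int(raw, expected):
    assert parse_age(raw) == expected


# Different behavior (raising): a separate test, not an extra branch above.
@pytest.mark.parametrize("raw", ["-1", "abc"], ids=["negative", "not-a-number"])
def test_parse_age_rejects_invalid_input(raw):
    with pytest.raises(ValueError):
        parse_age(raw)
```

Neither test body needs an if statement, and a failure report names the exact case, for example test_parse_age_returns_int[surrounding-whitespace].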

Don’t modify fixture values in other fixtures

tl;dr: Modify and build on top of fixture values in tests; never modify a fixture value in another fixture – use deepcopy instead.

Why: For a given test, fixtures are executed only once. However, multiple fixtures may depend on the same upstream fixture. If any one of these modifies the upstream fixture’s value, all others will also see the modified value; this will lead to unexpected behavior.
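A sketch of the safe pattern, assuming hypothetical default_user and admin_user fixtures. The downstream fixture copies the upstream value before changing it:

```python
import copy

import pytest


def make_admin(user):
    """Hypothetical helper: return an admin version without touching `user`."""
    admin = copy.deepcopy(user)
    admin["roles"].append("admin")
    return admin


@pytest.fixture
def default_user():
    return {"name": "Ada", "roles": ["reader"]}


@pytest.fixture
def admin_user(default_user):
    # Appending to default_user["roles"] directly would also change what
    # every other fixture and the test itself see as `default_user`.
    return make_admin(default_user)


def test_default_user_is_untouched(default_user, admin_user):
    assert default_user["roles"] == ["reader"]
    assert admin_user["roles"] == ["reader", "admin"]
```

A shallow copy would not be enough here, because the nested roles list would still be shared between the two dicts.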

Prefer responses over mocking outbound HTTP requests

tl;dr: Never manually create Response objects for tests; instead use the responses library to define what the expected raw API response is.

Why: When integrating against an API, developers are already thinking of sample raw responses. Expecting a developer to make the cognitive switch from this to how a Response is created is unnecessary. Using the responses library, tests can define their expected API behavior without the chore of creating the response. It also has the advantage of mocking fewer things, which leads to more actual code being tested.

Examples: The responses library has a solid README with usage examples, please check it out.

Note: This only works for calls made via the (incredibly popular) requests library. You could use httpretty instead – it patches at the socket layer and therefore works with any HTTP client, not just requests.

Prefer tmpdir over global test artifacts

tl;dr: Don’t create files in a global tests/artifacts directory for every test that needs a file-system interface. Instead, use the tmpdir fixture to create files on-the-fly and pass those in.

Why: Global artifacts are removed from the tests that use them, which makes them difficult to maintain. They’re also static and can’t leverage fixtures and other great techniques. Creating files from fixture data just before a test is run provides a cleaner dev experience.
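A minimal sketch, assuming a hypothetical load_settings function under test. Note that it uses tmp_path, the pathlib-based sibling of tmpdir that pytest also ships, and not tmpdir itself. The idea is the same with either fixture:

```python
import json


def load_settings(path):
    """Hypothetical function under test: read settings from a JSON file."""
    with open(path) as handle:
        return json.load(handle)


def test_load_settings(tmp_path):
    # tmp_path is a fresh temporary directory (a pathlib.Path) per test.
    settings_file = tmp_path / "settings.json"
    settings_file.write_text(json.dumps({"debug": True}))

    assert load_settings(settings_file) == {"debug": True}
```

The file contents live right next to the assertion that depends on them, and the input can come from fixtures or parametrize like any other test data.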

Bonus: A Word of Caution

These best-practices tell you how to write tests, but they don’t tell you why or when. There’s one more best-practice that’s a general guiding principle for testing:

Tests are guardrails to help developers add value over time, not straitjackets to contain them.

Time invested in writing tests is also time not invested in something else; like feature code, every line of test code you write needs to be maintained by another engineer. I always ask myself these questions before writing a test:

  1. Am I testing the code as frozen in time, or testing the functionality that lets underlying code evolve?
  2. Am I testing my functionality, or the language constructs themselves?
  3. Is the cost of writing and maintaining this test more than the cost of the functionality breaking?

Most times, you’ll still go ahead and write that test because testing is the right decision in most cases. In some cases though, you might just decide that writing a test – even with all the best-practices you’ve learned – is not the right decision.

Conclusion

To recap, we saw 5 pytest best-practices we’ve discovered at NerdWallet that have helped us up our testing game:

  1. Prefer mocker over mock
  2. Parametrize the same behavior, have different tests for different behaviors
  3. Don’t modify fixture values in other fixtures
  4. Prefer responses over mocking outbound HTTP requests
  5. Prefer tmpdir over global test artifacts

Hopefully, these best-practices help you navigate pytest’s many wonderful features better, and help you write better tests with less effort. If you’d like to join us to build (and test!) some cool new platforms that help provide clarity for all of life’s financial decisions, check out our careers page!

 
