Handling flaky pytest tests

There’s often a deadline sitting between pragmatism and perfection in a code base, and while exploring Python’s pytest extensions and plugins I found a couple of fine examples of straddling that line. The two modules, flaky and the more subtly named pytest-rerunfailures, each blur the line a little by letting you rerun failing tests, often taking a “two out of three” approach to handling troublesome tests.

Marking your tests as non-deterministic is simple in both implementations:

# Flaky
@flaky  # by default, retry a failing test once

# rerun the test up to 3 times and pass if at least `min_passes` runs succeed
@flaky(max_runs=3, min_passes=2)

# pytest-rerunfailures
# rerun the test up to 5 times if it fails
@pytest.mark.flaky(reruns=5)
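
Dropped into an actual test file, the markers read like the sketch below. The test names and the random-number stand-ins for unreliable behaviour are invented, and in practice you’d install one plugin or the other rather than mixing both in a single module:

# With the flaky plugin installed:
import random

from flaky import flaky


@flaky  # retried once on failure, per flaky's defaults
def test_usually_works():
    assert random.random() < 0.9


@flaky(max_runs=3, min_passes=2)  # best two out of three
def test_two_out_of_three():
    assert random.random() < 0.7


# With pytest-rerunfailures installed instead:
import pytest


@pytest.mark.flaky(reruns=5)  # rerun up to 5 times if the test fails
def test_rerun_on_failure():
    assert random.random() < 0.5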

Despite the similar problem domain there are a number of differences between the two plugins that will help you decide which one better suits your needs; reading both READMEs before picking is heartily recommended. Flaky, for example, lets you “best X out of Y” your tests and has a class-wide marker for the more problematic areas (sketched further down). pytest-rerunfailures exposes much of its functionality via command line options to pytest, which can help you explore a suite and get the tests passing at least once, so you know where to focus your time and effort rather than just assuming it’s all broken. Given the right kind of code base I could imagine myself running an overnight:

$ pytest --reruns 10 --reruns-delay 300

To check whether the tests ever actually pass, or whether there is some non-deterministic behaviour at play.
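
The class-wide marker mentioned above might look something like this; a sketch only, with a hypothetical class and hypothetical test names:

# With the flaky plugin, decorating the class applies the retry policy
# to every test method inside it.
from flaky import flaky


@flaky(max_runs=3, min_passes=1)
class TestWobblyIntegration:
    def test_remote_lookup(self):
        ...

    def test_remote_update(self):
        ...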

I’m not sure if I’d ever be willing to use these in earnest in a module, but I can see the potential in certain network-related domains, especially when you don’t have the time to pivot to something more stable like a stub or mock. I can also envision using them when adopting a new (to me) code base with few tests, or just flaky ones, and then never having the time to return and clean them back up.
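
For contrast, that “more stable” option of stubbing the network call out might look roughly like this with the standard library’s unittest.mock; the service_is_up helper and the URL are invented purely for the sketch:

# A rough, self-contained sketch of replacing a flaky network call with a stub.
import urllib.request
from unittest import mock


def service_is_up(url="https://example.invalid/status"):
    # Imagine this hitting a slow, occasionally unavailable endpoint.
    with urllib.request.urlopen(url) as response:
        return response.status == 200


def test_service_is_up_without_the_network():
    with mock.patch("urllib.request.urlopen") as fake_urlopen:
        # urlopen is used as a context manager, so configure what __enter__ yields.
        fake_urlopen.return_value.__enter__.return_value.status = 200
        assert service_is_up() is True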