Thoughts on Testing Code

I was watching a recent episode of The Rest Is Science about how humans find abstract reasoning inherently difficult, and it got me wondering about how we think about code quality and the link between coding and the contracts we make when defining expected behaviour of code.

For reference, the episode (which you should go watch!) introduces a reasoning test, the Wason selection task which reads as the following

You are shown a set of four cards placed on a table, each of which has a letter on one side and a number on the other. The visible faces of the cards show 7, 8, G and A. Which card(s) must you turn over in order to test that if a card shows A on the letter face, then its opposite number face is a 7?

┌───┐  ┌───┐  ┌───┐  ┌───┐
│ 7 │  │ 8 │  │ G │  │ A │
└───┘  └───┘  └───┘  └───┘

If you’ve not encountered this test before, have a think. Which card(s) would you turn?

Most people will immediately know that they should turn over the A, as it’s written there in the rule! If we turn it over and it’s a seven then the test has passed, excellent. Some people will also turn over the 7, but what will this tell us? The rule doesn’t specify that 7s need to have As! Assuming that the other side of a 7 must be an A is the “Affirming the consequent” logical fallacy.

The original author suggests that confirmation bias plays a part in people’s predisposition for this incorrect pair of answers. We’re programmed to search for cases that affirm the rule, rather than challenge it. The correct answer is that we need to look for ✨counterfactuals✨, in this case by turning the A (as anything but 7 invalidates the rule) and the 8 (as an A would invalidate the rule).

Let’s take this same predisposition and apply it to a simply unit-test-writing scenario in Python.

Given the very simple (until you think about it) function

def is_vowel(char: str) -> bool:
    """Return `True` if char is a vowel.
    """
    ...

What unit tests should we be thinking about? Let’s ignore type-safety here (assume we’ve configured a type checker to yell at us separately to tests).

The obvious Link to heading

Perhaps if we don’t think too hard, our initial logical conditional is

If char is a vowel, is_vowel(char) will return True

I don’t know about you, but the first case that pops into my head is something along the lines of:

@pytest.mark.parametrize('expected_vowel', 'aeiou')
def test_is_a_vowel(expected_vowel: str):
    assert is_vowel(expected_vowel)

This is a fair start - we’ve locked in the bare minimum of functionality, but let’s see if we can expand this case a bit further.

Rounding out our vowels Link to heading

Capital letters are an immediate expansion, which we would expect to be easy. What about accented letters? We need to check accents like ä and less-obvious vowels like æ alongside all the extra Unicode magic like U+FF41 - ａ, U+1D68A - 𝚊, and so on. This all forms part of the contract we are defining in is_vowel.

Tests along these lines are similar to turning over the A above - we’ve identified a case which should return True, so turning the card (running the test) must give us the expected response.

As an aside - we’re also starting to highlight the messiness around what a “vowel” actually is! In fact, according to dictionary.com, sometimes “w” and “y” count as vowels. Sometimes. Let’s just quietly ignore that in this example (sorry, linguists).

A Consonant Concern Link to heading

Note that in our original conditional we haven’t specified anything about what is_vowel returns for a consonant! Currently, the function could well return True given a B character as input, without breaking the rules (equivalent to turning over the 7 in the original game). Is this the behaviour we anticipated when we wrote the function?

Let’s add a second conditional to expand the first.

If char is a vowel, is_vowel(char) will return True.
ELSE is_vowel(char) will return False.

We can similarly add a test for not vowels

@pytest.mark.parametrize('not_vowel', 'bcdfghjklmnpqrstvwxyz!?.,')
def test_is_not_a_vowel(not_vowel: str):
    assert not is_vowel(not_vowel)

Is this correct now? Thinking about what “correct” means for a given use case is an important part of API design, and we should bear in mind things like the “Principle of least astonishment” as we create these clauses.

Perhaps an example of where the above falls over is where we provide an empty string to the function, calling is_vowel("") and receiving False is in-line with our contract as specified, but would a calling user expect this, or would it be more intuitive for an error to be raised? If so, we need to amend our contract

IF char is a single vowel, is_vowel(char) will return True.
IF char is a single consonant, is_vowel(char) will return False.
ELSE is_vowel(char) will raise a ValueError.

We can go further, but hopefully you get my point.

I’m not saying you need to explicitly write this kind of statement down for every function you create, but I do believe that it’s important to think hard about the contract we are creating with consumers of our code. Plus, once we have a concrete understanding of this contract, we can try to break it more intelligently.

Coming to Counterfactuals Link to heading

Coming back to our test-writing, how might we “turn over the 8” in a unit test sense? This means checking that, for every input item which does not produce a given output (the consequent of our claim is not the case - e.g. any chars that does not return True), the corresponding antecedent is also not the case (chars is not a vowel).

We need this kind of test to investigate the counterfactual cases in our code. Think about

Given we know what the space of all possible inputs is (any valid Python string), we can do a brute-force-search with some generating function, and then perform a test of the returned value against the appropriate arm of our contract

def gen_valid_str() -> str:
    ...

def any_true_is_a_vowel():
    """Test that for any True response, the input was indeed a vowel.
    """
    for _ in range(N_SAMPLES):
        test_case = gen_valid_str()
        try:
            result = is_vowel(test_case)
            # If the consequent is not the case...
            if not result:
                # ...the antecedent must also not be the case
                # Some checking function goes here that is more
                # specific than `is_vowel...`
                assert test_case_is_not_a_vowel(test_case)
        except ValueError:
            pass

A familiar face appears Link to heading

Huh, this generation then validation looks a bit like Property-based testing!

To me, working through the above example highlights why property based testing is such a valuable tool for improving code correctness. Our human brains find it really difficult to identify the counterfactuals we need to test the “denying the consequent” rule of our code contracts. Maybe if you’re in the tiny minority who immediately solved the card game you’re blessed with an innate ability to dig in to these issues, but I suspect a lot of codebases would benefit from a sprinkling of a property-based testing library like Hegel (which I’ve been enjoying playing with) to guard against our monkey-brains.

It’s only logical.