The importance of testing behaviors

Michał Moroz

Software tests have been with us for a long time. The earliest document I'm aware of is a 1968 NATO report on software engineering – and you can look it up yourself here. So it might seem that everyone knows how to write tests. And yet, time and time again, we see tests that either duplicate the code they're testing or test intricacies of the code that don't really matter to the end user. So what I'm going to share with you today is not new, but it is important. Today we'll be talking about testing the behaviors expected from the code being tested.

A couple of years ago I gave a talk at Makimo about this topic. If you're a fan of video content, you can watch it below. Please note that the rest of the article is based on a slightly newer version of the project than the one presented in the video.

What is whose-name?

The whose-name project is a simple identity mapping microservice exposing a REST API with a single domain endpoint. Because it's such a simple service, it can be a great teaching example on a couple of topics from the Domain Driven Design (DDD) approach to software design.

To understand the purpose of the service, let's look at the relevant part of whose-name's README.

Given a file of the following structure:

-
  slack: U123456
  jira: test@example.org
-
  slack: U234567
  jira: other@example.org

This service answers questions of the following form:

For one that calls themselves test@example.org on jira, what is their username on Slack? (Answer: U123456).

The domain code is just a couple of classes implementing a single interface. If you're thinking in CQRS terms, there are no commands, just the single query described above.
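To make that concrete, here's a rough sketch of what such a single-query interface could look like. The names below are my own illustration, not the project's actual code:

// Hypothetical sketch of a single-query domain interface (illustrative names only).
interface IdentityQueryService
{
    /**
     * For someone known as $username on $service, return their
     * username on $targetService, or null if it's unknown.
     */
    public function usernameOn(string $targetService, string $service, string $username): ?string;
}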

It might seem that there's not much to test and even less to learn from such a simple service. But I beg to differ. It's exactly these kinds of services where we can go the extra mile and properly work on the design before jumping into larger projects.

Here, I settled on one convention that I would love to see in more projects and would like to share with you today. The convention is also written down in the project's README, namely:

Tests communicate the purpose of the code.

What does it mean for the tests to communicate purpose? Let's look at them together, shall we?

Good tests speak to the reader

If you run the tests on Linux by typing ./vendor/bin/sail test, you'll see something like this:

 PASS  Tests\Domain\AskQueryServiceForIdentities
 [usage] Querying an existing user identity for a known service (e.g. GMail) returns username on that service
 [edge] Querying an Identity for a not known service returns a null value
 [edge] Querying a non-existent Identity for any service returns a null value
 [edge] Querying an existing Identity for the same known service returns the same username as provided

 PASS  Tests\Infrastructure\YamlFileRepositoryHandlesYamlFiles
 [usage] Given a service name (e.g. GMail) and a username, a matching Identity can be found
 [edge] If there's no matching service/username, an empty Identity is returned
 [behavior] Loaded Yaml file persists in the cache
 [behavior] Modifying Yaml file updates the cache
 [edge] If the file does not exist, a query throws an exception

 PASS  Tests\Application\A1WhoseNameAPI\QueryWhoseNameEndpoint
 [usage] Querying the WhoseName API returns requested usernames with ('U123456', 'slack', 'jira', 'test@example.org')
 [usage] Querying the WhoseName API returns requested usernames with ('U234567', 'slack', 'jira', 'other@example.org')
 [usage] Querying the WhoseName API returns requested usernames with ('test@example.org', 'jira', 'slack', 'U123456')
 [usage] Querying the WhoseName API returns requested usernames with ('other@example.org', 'jira', 'slack', 'U234567')
 [edge] Querying the WhoseName API with not known data returns a null value

 PASS  Tests\Application\A2AuthenticationAPI\RequestAccessToken
 [usage] A valid email, password and title yields an access token (as `token` in response JSON)
 [usage] A whose-name ability on a token allows access to whose-name service
 [edge] To request a token, email, password and title must be present
 [edge] If e-mail and password does not match database, a 401 error is returned

Tests:  18 passed
Time:   0.64s

There's a lot of information in this output, but there's a pattern to it. The test cases are grouped in two ways: by the part of the application they exercise (Domain, Infrastructure, Application) and by a category tag – [usage], [edge] or [behavior].
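The category tags come from Pest groups: each test declares a human-readable name and the group it belongs to, roughly like this (a simplified sketch – the real test bodies appear later in the article):

test('Querying an existing user identity for a known service (e.g. GMail) returns username on that service', function () {
    // Arrange, act, assert...
})->group('usage');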

The first grouping is easy to understand. The second one is a bit more interesting. Let's re-run the tests to see how the output reads for each group, starting with the ./vendor/bin/sail test --group=usage command.

Usage tests

 PASS  Tests\Domain\AskQueryServiceForIdentities
 [usage] Querying an existing user identity for a known service (e.g. GMail) returns username on that service

 PASS  Tests\Infrastructure\YamlFileRepositoryHandlesYamlFiles
 [usage] Given a service name (e.g. GMail) and a username, a matching Identity can be found

 PASS  Tests\Application\A1WhoseNameAPI\QueryWhoseNameEndpoint
 [usage] Querying the WhoseName API returns requested usernames with ('U123456', 'slack', 'jira', 'test@example.org')
 [usage] Querying the WhoseName API returns requested usernames with ('U234567', 'slack', 'jira', 'other@example.org')
 [usage] Querying the WhoseName API returns requested usernames with ('test@example.org', 'jira', 'slack', 'U123456')
 [usage] Querying the WhoseName API returns requested usernames with ('other@example.org', 'jira', 'slack', 'U234567')

 PASS  Tests\Application\A2AuthenticationAPI\RequestAccessToken
 [usage] A valid email, password and title yields an access token (as `token` in response JSON)
 [usage] A whose-name ability on a token allows access to whose-name service

Tests:  8 passed
Time:   0.37s

These tests tell a story. They show us what we can do with the code we're testing. By reading the output, we can deduce the following: the domain query service returns a username on a given service for a known identity; the YAML file repository can find a matching Identity by service name and username; the WhoseName HTTP API answers the same kind of question; and the authentication API issues access tokens, with a whose-name ability on a token granting access to the service.

Now, let's think a little. See how human-readable these tests are? They tell us what the code is supposed to do. Even better, their order is not random, but deliberately chosen to provide a good learning experience for the reader.

There is one more thing about usage tests. If these break, the application stops working, so they are the most important ones for regression testing. If anything else breaks, we can still use the application – but these must always pass.

Edge case tests

Now, let's run the tests again, but this time with the --group=edge flag.

 PASS  Tests\Domain\AskQueryServiceForIdentities
 [edge] Querying an Identity for a not known service returns a null value
 [edge] Querying a non-existent Identity for any service returns a null value
 [edge] Querying an existing Identity for the same known service returns the same username as provided

 PASS  Tests\Infrastructure\YamlFileRepositoryHandlesYamlFiles
 [edge] If there's no matching service/username, an empty Identity is returned
 [edge] If the file does not exist, a query throws an exception

 PASS  Tests\Application\A1WhoseNameAPI\QueryWhoseNameEndpoint
 [edge] Querying the WhoseName API with not known data returns a null value

 PASS  Tests\Application\A2AuthenticationAPI\RequestAccessToken
 [edge] To request a token, email, password and title must be present
 [edge] If e-mail and password does not match database, a 401 error is returned

Tests:  8 passed
Time:   0.50s

Given the context the [usage] tests provided, we can now read into the second important category – edge cases, which explain how the code behaves in unusual situations. These tests are useful when writing code that interacts with this service.

From this section we can learn that querying for an unknown service or a non-existent identity returns a null value, that the repository returns an empty Identity when there's no matching service/username and throws an exception when the file doesn't exist, that the API returns a null value for unknown data, and that requesting a token requires an email, password and title – with a 401 error returned when the credentials don't match the database.

That gives us most of the information we'd need to write a client of this service. The only things missing are the URLs and some knowledge about the required data formats.

There's one more thing here, too. Edge cases are often the result of a design decision where you're not strongly tied to a specific behavior – you just settled on one of a few options. This means that until you gain a lot of clients and users, you can change your mind and alter the behavior, a much more relaxed situation than with the behavior described by the [usage] tests.

That becomes even more fluid if you have just one client, or very few clients and users. You can change the behavior of the code without breaking anything, and sometimes nobody will even notice.

For me, that often means I shouldn't decide too early how the edge cases should be handled. I'd rather wait until further down the line in the development process and add a test when I'm sure how the code should behave – or what the conventions say about the expectations for that specific part of the application.

Usage + Edge cases = Behaviors

The two groups, [usage] and [edge], can be easily found in most projects and their names don't need much debate. Both test the behavior of the implemented service, and not the implementation itself. Notice that we're approaching the code from the outside – I haven't shown you a single line of the implementation, and that's the point.

This aligns nicely with the Open/Closed Principle, which states that software entities should be open for extension, but closed for modification. That works well for libraries, services, compilers and other code we'd like to have a client-service relationship with.

We don't always want that, but that's a tale for a different article.

So why is it that I've created a third category of tests, named [behavior]?

Other behaviors we'd like to test

Often there are more complicated rules we'd like to depend on in our application. Examples include state logic, timing rules, caching, etc.

These are the expected behaviors we'd like to keep a close eye on. That's why it's useful to write them down as tests, so that we can both verify them and show the behavior to others.

When writing this article, I first used the term beneficial behaviors, and I believe most [behavior] tests should indeed cover the additional benefits the code provides. But there might be caveats, integration issues or other reasons to also write tests for the negative cases.

Implementing behavior tests for whose-name's YamlFileRepository

I won't leave you without any code. But before that, we'll just read once more the test output for the [behavior] group:

 PASS  Tests\Infrastructure\YamlFileRepositoryHandlesYamlFiles
 [behavior] Loaded Yaml file persists in the cache
 [behavior] Modifying Yaml file updates the cache

Tests:  2 passed
Time:   0.11s

The additional benefit is that the YamlFileRepository caches the output, so it's not necessary to load the .yml file from disk every time. By default, Laravel stores the cache as serialized PHP files in a separate directory, which is a bit faster than parsing the source file every time.

However, there are two situations we'd need to test: that a loaded YAML file persists in the cache (so the file isn't re-read while it hasn't changed), and that modifying the YAML file updates the cache.

And lo, it was time to think about which approach would be best to test these behaviors.

My first thought was to mock Laravel's cache, or to inspect it directly at certain points. Knowing the implementation details of the repository class, I'd know that a specific cache key is being used, so I could modify the cache to see whether it gets overwritten, or mock the whole class and test whether a method was called.
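For illustration, such an implementation-coupled test might look roughly like this – the cache key below is entirely made up, which is precisely the problem:

test('Loaded Yaml file persists in the cache', function () {
    Cache::flush();

    $repo = new YamlFileRepository($this->file);
    $repo->findByServiceAndUsername('slack', 'U123456');

    // Hypothetical: peek at a cache key we only know from reading the implementation
    expect(Cache::has('whosename.' . md5($this->file)))->toBeTrue();
})->group('behavior');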

But this really ties my options down. It removes my freedom to change the caching logic, binds a key that shouldn't be known to the outside world to the test, and shows off exactly the wrong uses of the code – the ones we should avoid, not promote.

Then I thought about modifying the YamlFileRepository class to return a boolean value indicating whether the file was loaded from disk or taken from the cache. That was also wrong, as it leaked implementation details to the outside world and forced the code to change just to make the tests pass.

In the end, I took the long road and wrote tests that remove the cache internals from the picture – and check the actual behavior of the code. But what is the actual behavior of caching?

That the file is not re-read when it hasn't changed – only when it's modified. Can we even test for that?

On some filesystems, there's something called access time, which is the time the file was last accessed. We could use it to check whether the file was read or not. And that's what I did.

test('Loaded Yaml file persists in the cache', function () {
    Cache::flush();

    $copiedFile = __DIR__ . '/../whosename.ignored.yml';

    // Arrange: Copy the file and set its modified and access time in the past
    copy($this->file, $copiedFile);

    $lastSecond = time() - 1;

    touch($copiedFile, $lastSecond, $lastSecond);

    clearstatcache();

    // Assert the access time and modification time was set
    // It hypothetically could fail on some strange file systems.
    expect(filemtime($copiedFile))
        ->toEqual(fileatime($copiedFile))
        ->toEqual($lastSecond);

    // Act: With cache emptied, first access will read the file
    $repo = new YamlFileRepository($copiedFile);
    $repo->findByServiceAndUsername('slack', 'U123456');

    clearstatcache();

    // Assert: the file was accessed so the times don't match anymore
    expect(filemtime($copiedFile))
        ->toBeLessThan(fileatime($copiedFile));

    // Arrange: set times on the file in the past
    touch($copiedFile, $lastSecond, $lastSecond);

    // Act: With cache set by previous repo call
    // second access doesn't read the file
    $anotherRepo = new YamlFileRepository($copiedFile);
    $anotherRepo->findByServiceAndUsername('slack', 'U123456');

    clearstatcache();

    // Assert: The atime did not change, because
    // the file was not read the second time
    expect(filemtime($copiedFile))
        ->toEqual(fileatime($copiedFile));
})->group('behavior');

It does the following: it flushes the cache, copies the source YAML file, and sets both the modification and access times of the copy one second in the past; it then queries a fresh repository, which reads the file from disk, so the access time moves forward; finally, it resets the timestamps and queries a second repository instance, which is served from the cache, so the access time stays put.

This is a bit longer, but it doesn't resort to any implementation details and leaves our options open in case we'd want to change the caching logic (for example, stop using Laravel's cache).

In reality, testing behaviors can be hard

At the beginning of this article I mentioned that running these tests on a Linux machine results in all of them passing. However, I also work on a MacBook – and the tests failed there.

You see, even if the code above is a good example of a behavior test, it's not always reliable. The reason is that access time isn't available on every filesystem, and it can work differently on different operating systems.

It just so happens that, as a speed-up, APFS (Apple's filesystem) updates the access time only on some file operations. Which brings us to another modification to the above test:

test('Loaded Yaml file persists in the cache', function () {
    // ...
})->group('behavior')->skip(function() {
    $copiedFile = __DIR__ . '/../whosename.ignored.yml';

    copy($this->file, $copiedFile);

    return !hasFileAtimeChangedOnRead($copiedFile);
}, 'Skipped; filesystem does not support updating access time on read.');

/**
 * Check if file's access time (atime) changes on successful read.
 *
 * @param string $filePath
 * @return bool
 */
function hasFileAtimeChangedOnRead($filePath)
{
    touch($filePath, time() - 1, time() - 1);

    // Make sure the stat cache doesn't hide the timestamps we've just set
    clearstatcache();

    // Get the initial access time
    $initialAtime = fileatime($filePath);

    // Read the file to trigger an access
    file_get_contents($filePath);

    // Clear the file status cache to get the updated atime
    clearstatcache();

    // Get the new access time
    $newAtime = fileatime($filePath);

    // Return true if the access time has changed, false otherwise
    return $initialAtime !== $newAtime;
}

We're checking whether we can safely test for the behavior. If we can't, it's not a problem with the code per se, so we skip the test instead of failing it.

This code still works on Linux, and on macOS you'll get this output:

 WARN  Tests\Infrastructure\YamlFileRepositoryHandlesYamlFiles
- [behavior] Loaded Yaml file persists in the cache → Skipped; filesystem does not support updating access time on read.
 [behavior] Modifying Yaml file updates the cache

Tests:  1 skipped, 1 passed
Time:   0.11s

In the end, it's really a balancing act between wanting to test actual behaviors and the facilities we have at our disposal to do so. In my case, I wrote most of the code on the Linux machine and wasn't aware of the test failing on macOS until much later. Had I known about the issue from the beginning, it might have changed the way I wrote the tests. But as it stands, I'm still happy with the result, especially since any CI pipeline would most likely be Linux-based and, if I were to set one up, it should still be able to run the whole test suite.

So, how would I go about it, precisely?

I will leave you with a few rules of thumb to follow when writing tests.

Document usage

Become a client of your own code for a while, forget about the implementation and think about how you'd use it to accomplish your goals. Then write tests for that.

Document full usage paths

Try to cover a real use case in a single test. When testing a booking system, try to actually book a reservation with the tools you have, instead of testing that a change_booking_status_to_booked function changes the status to booked.
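As a hypothetical illustration (the Room and BookingService classes below are made up for the example, not taken from whose-name or any real project), such a full-path test might read like this:

test('A guest can book an available room', function () {
    // Hypothetical setup: a room that is free on the requested dates
    $room = Room::create(['number' => 101]);

    // Act: go through the real booking flow, end to end
    $booking = BookingService::book($room, 'test@example.org', '2024-06-01', '2024-06-02');

    // Assert on the outcome a client actually cares about,
    // not on internal status fields
    expect($booking->isConfirmed())->toBeTrue();
    expect($room->isAvailable('2024-06-01', '2024-06-02'))->toBeFalse();
})->group('usage');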

Document edge cases

Think about the code from the outside, not from the inside. What would happen if we provided a wrong username? A wrong service? A wrong email? Note that these design decisions are not as important as the ones made in the [usage] section, and you still have some leeway to change them.
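Continuing the hypothetical booking example from above, an edge-case test might look like this:

test('Booking an already-booked room yields no booking', function () {
    // Hypothetical classes again – what matters is documenting the edge case
    $room = Room::create(['number' => 101]);
    BookingService::book($room, 'test@example.org', '2024-06-01', '2024-06-02');

    // A second, overlapping booking attempt returns null instead of a confirmation
    $secondBooking = BookingService::book($room, 'other@example.org', '2024-06-01', '2024-06-02');

    expect($secondBooking)->toBeNull();
})->group('edge');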

Freeze as small a protocol as possible

If you're writing a service or a library, you're not writing a specification. You're writing a protocol that will be used by other developers. That means every additional test removes some of your freedom to change the protocol. Thus, you should think about the smallest set of tests that still covers the most important use cases.

Show off your English skills

Instead of dumb test_if_x_works tests, treat this as a story, as prose – and you're the writer of that story. Will you make it fascinating to others while still providing an automated way to verify it? In a sense, it's a lot better than writing technical documentation or code comments. It's a proven story.
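For instance, compare a hypothetical before and after – same test, very different story:

// Before: tells the reader nothing about the purpose of the code
test('test_if_query_works', function () { /* ... */ });

// After: reads as a sentence and doubles as documentation
test('Querying an existing user identity for a known service (e.g. GMail) returns username on that service', function () { /* ... */ })->group('usage');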

Arrange your tests in a way that makes sense

Put the most important pieces of information up top and describe the details in the rest. Make the tests flow, test by test, sentence by sentence.

Come back later

When your application stabilizes, come back to the tests and see if you can refine them. Improve their legibility. Show actual code examples someone could paste into their project. If you're doing it right, you can then point someone else to the tests and say: "This is how you can use this service."

Conclusion

Writing tests is a skill that can be both learned and appreciated. It often isn't. We often don't have the time to write tests, or to do them justice. But if we do, we'll be able to write better code, faster, with fewer bugs.

If you're just learning programming, I hope you now understand how tests can communicate the purpose of the code – that's a precious lesson. If you're already a seasoned developer, show this to your younger colleagues, and thank them from time to time when their tests are well-written and interesting to read.

I'll be writing a bit more about the whose-name project in the future, so if you're interested in how much one can learn from a simple, rock-solid project, come back later or follow me on social media.

If you want to read whose-name's tests, you can find them here.

Thank you for reading!