Avoiding Flaky Tests

Flaky tests are no stranger to the Mattermost code base. Find out what we've done to avoid flaky tests.

September 29, 2019 1418 words

“Your tests are failing.”

This comment on a pull request is crushing. After all the effort you’ve spent writing your code, adding new unit tests, and rebasing your changes to make each commit almost a work of art: how could your tests possibly be failing?

Begrudgingly, you sign into your CI server only to realize the failing tests have nothing to do with your code. You think, “Hmm, maybe the code is more coupled than I expected.”

You run the unit tests locally. PASS. You check out origin/master and run the tests again. PASS.

Flipping back to your CI server, your eye is drawn to the “Rebuild” icon on your CI pipeline. You think to yourself, “Maybe it’s just a flaky test…”

Flaky tests are no stranger to the Mattermost code base. With 6000+ server-side tests, 3500+ client-side tests, and a growing number of Cypress integration tests, we’ve seen our fair share of flaky tests. A few sprints back, we decided to focus some efforts on squashing several of the server-side flaky tests. What follows are some practical observations to anyone reading, writing or running similar unit tests.

Strategies to avoid flaky tests

1) Don’t rebuild until you’ve filed a ticket

With developers often on a deadline, it’s not always realistic to context switch and root cause a flaky test. But don’t waste the opportunity of having observed a flaky test: file a ticket with as much detail as possible. Include the full build log and any artifacts generated by the CI server. If someone else already filed a ticket, add your details to help raise the investigation priority and help the poor soul tasked with debugging it.

2) Avoid using `time.Sleep` to wait for asynchronous results

Consider the following simplified test based on a reported flaky test:

// TestHTTPTimeout verifies that the HTTP service is configured with the requested timeout.
func TestHTTPTimeout(t *testing.T) {
    expectedTimeout := 100 * time.Millisecond

    server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        time.Sleep(101 * time.Millisecond)
    })
    defer server.Close()

    httpService := NewHttpService(HttpServiceOptions{RequestTimeout: expectedTimeout})
    err := httpService.Do()
    require.EqualError(t, err, "timeout")
}

By waiting for 101 milliseconds – 1 millisecond longer than the configured timeout – the test hoped to verify the timeout handling behaviour of the http service.

Unfortunately, the underlying service implementation effectively deferred to http.Client, which was itself relying on a time.Timer. The corresponding documentation for time.Timer makes it clear – with added emphasis:

NewTimer creates a new Timer that will send the current time on its channel after at least duration d.

That is, this particular timeout is guaranteed to wait at least 100 milliseconds, but makes no promise how soon thereafter it will fire. Simply put, waiting just 101 milliseconds sometimes isn’t enough.

The best solution is to avoid using time.Sleep altogether in this case. We ended up rewriting the flaky test as something like:

func TestHTTPTimeout(t *testing.T) {
    expectedTimeout := 100 * time.Millisecond

    releaseHandler := make(chan interface{})
    server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Don't actually handle the response, allowing the service to timeout.
        <-releaseHandler
    })
    defer server.Close()
    defer close(releaseHandler)

    httpService := NewHttpService(HttpServiceOptions{RequestTimeout: expectedTimeout})
    err := httpService.Do()
    require.EqualError(t, err, "timeout")
}

Now the handler is guaranteed never to return until the test closes the channel, which occurs only after a timeout has been observed.

3) Do use `time.Sleep` to advance the clock

Don’t underestimate how fast a computer can execute code. Consider the following test:

func (s *Store) CreateUserRecord(user *model.User) {
    // Keep track of the creation time in milliseconds since epoch.
    user.CreateAt = time.Now().UnixNano() / int64(time.Millisecond)

    s.db.Insert(user)
}

func TestTimeDelta(t *testing.T) {
    store := NewStore()

    user1, err := store.CreateUserRecord(&model.User{Username: "test1"})
    require.NoError(t, user)

    user2, err := store.CreateUserRecord(&model.User{Username: "test2"})
    require.NoError(t, user)

    require.NotEqual(t, user1.CreateAt, user2.CreateAt)
}

The test here is rightly trying to verify that CreateAt is actually getting set. Implicitly, it assumes that at least a millisecond happens between the calls to create a user record. Yet even with the cost of a database write, it’s still sometimes possible to create two user records in the same millisecond.

The simplest solution is to make the assumption explicit and safely rely on time.Sleep to advance the clock:

func TestTimeDelta(t *testing.T) {
    store := NewStore()

    user1, err := store.CreateUserRecord(&model.User{Username: "test1"})
    require.NoError(t, user)

    // Ensure the next user has a different CreateAt timestamp.
    time.Sleep(1 * time.Millisecond)

    user2, err := store.CreateUserRecord(&model.User{Username: "test2"})
    require.NoError(t, user)

    require.NotEqual(t, user1.CreateAt, user2.CreateAt)
}

4) MySQL is surprising

So much could be written about oddities within MySQL. One issue we encountered turned out to have the same underlying cause as above, but manifested in a new way:

s1, err := ss.Scheme().Save(s1)
require.Nil(t, err)
s1, err = ss.Scheme().Save(s1)
require.Nil(t, err)
s2, err = ss.Scheme().Save(s2)
require.Nil(t, err)

Note the duplicate call to ss.Scheme().Save(s1). The implementation of Save would effectively perform an upsert if it was asked to save an object already having a unique Id. Thus, the second call would run through the following block:

    // Keep track of the update time in milliseconds since epoch.
    scheme.UpdateAt = time.Now().UnixNano() / int64(time.Millisecond)

	rowsChanged, err := s.GetMaster().Update(scheme)
	if err != nil {
		return nil, model.NewAppError("SqlSchemeStore.Save", "store.sql_scheme.save.update.app_error", nil, err.Error(), http.StatusInternalServerError)
	}
	if rowsChanged != 1 {
		return nil, model.NewAppError("SqlSchemeStore.Save", "store.sql_scheme.save.update.app_error", nil, "no record to update", http.StatusInternalServerError)
	}

As the fix discovered, MySQL will only return rowsChanged == 1 if the row it found actually changes, not just if it found a row matching the constraints. The test would thus fail with an unexpected error whenever two consecutive Save operations occurred in the same millisecond.

This actually deserved a more holistic fix at the store layer to avoid tripping over MySQL’s unique rowsChanged semantics, but in practice this kind of event was considered unlikely in a production environment.

5) Keep in mind that random sometimes doesn’t seem like it

In addition to a number of length and character checks, the Mattermost server rejects certain prefixes to avoid team names like signup or LOGIN21.

When unit testing, we often rely on model.NewId to generate random, UUID version 4 Guid values for testing:

	t1 := model.Team{}
	t1.DisplayName = "Name"
	t1.Name = model.NewId()
	t1.Email = MakeEmail()
	t1.Type = model.TEAM_OPEN
	_, err := ss.Team().Save(&t1)
	require.Nil(t, err)

Every so often, a test would fail claiming that the team name was invalid. It turned out that, from to time, some of those reserved prefixes would be randomly selected for the start of a model.NewId, e.g. APICE0E9143C.... Random? Yes! Invalid? Yes! Expected? No.

The fix was simple, and replicated the pattern used in most other tests to prefix the team name:

	t1 := model.Team{}
	t1.DisplayName = "Name"
	t1.Name = "zz" + model.NewId()
	t1.Email = MakeEmail()
	t1.Type = model.TEAM_OPEN
	_, err := ss.Team().Save(&t1)
	require.Nil(t, err)

6) Avoid remote dependencies.

We vendor all Go dependencies to ensure a reliable build. This strategy also greatly speeds up git bisect when your Go cache doesn’t already have dependencies from an older branch.

Unfortunately, for some plugin tests, our compilation strategy involved using a temporary directory outside of our vendor scope. This went undiscovered until some upstream resource couldn’t be fetched and the CI build failed accordingly. This became especially problematic when git.apache.org went down and took many Go build pipelines along with it.

We’ve since fixed the compilation issue for plugin tests, and even applied this strategy to other flaky tests to avoid connecting to remote dependencies outside of our control.

7) Don’t assume the test is actually flaky!

One danger of having flaky tests is conditioning developers to believe that any unexpected test failure is likely to be flaky. As an example, we had received multiple reports of a flaky unit test concerning the FileWillBeUploaded plugin hook. Much effort was spent statically analyzing the code and attempting to reproduce locally. Even with the test repeated 100x, it always succeeded locally. Given the ongoing impact to developers with unrelated changes, we were contemplating disabling the test altogether.

Fastforward a few weeks, and a community member reported a mysterious issue: sometimes files processed by a plugin using FileWillBeUploaded wouldn’t successfully update. One of our newest staff members took a look, and immediately spotted a race condition in the related code. The resulting fix addressed both the community member’s issue and our longstanding “flaky” test.

Conclusion

To summarize from our own observations:

File a ticket with as much detail as possible!
Don’t use time.Sleep? Do use time.Sleep? Know when to use time.Sleep responsibly.
Know your database semantics.
Expect anything from the unexpected and plan accordingly.
Assume anything outside of your control will eventually fail.
Every flaky test is a bug. Don’t assume it’s always a bug in the test.

Written by Jesse Hallam - @jesse.hallam on community.mattermost.com and @lieut-data on GitHub

Join us on community.mattermost.com!

Avoiding Flaky Tests

Strategies to avoid flaky tests

1) Don’t rebuild until you’ve filed a ticket

2) Avoid using `time.Sleep` to wait for asynchronous results

3) Do use `time.Sleep` to advance the clock

4) MySQL is surprising

5) Keep in mind that random sometimes doesn’t seem like it

6) Avoid remote dependencies.

7) Don’t assume the test is actually flaky!

Conclusion

Other Posts

Categories

Avoiding Flaky Tests

Strategies to avoid flaky tests

1) Don’t rebuild until you’ve filed a ticket

2) Avoid using time.Sleep to wait for asynchronous results

3) Do use time.Sleep to advance the clock

4) MySQL is surprising

5) Keep in mind that random sometimes doesn’t seem like it

6) Avoid remote dependencies.

7) Don’t assume the test is actually flaky!

Conclusion

Other Posts

Categories

2) Avoid using `time.Sleep` to wait for asynchronous results

3) Do use `time.Sleep` to advance the clock